CudaExtension#

Extension for CUDA operations.

  • UUID: d63a98fa-7882-11eb-a917-b38f664f399c

  • Version: 2.6.0

  • Author: NVIDIA

  • License: LICENSE

Components#

nvidia::gxf::CudaStream#

Holds and provides access to native cudaStream_t.

A nvidia::gxf::CudaStream handle must be allocated by nvidia::gxf::CudaStreamPool. The handle remains valid until it is explicitly recycled through nvidia::gxf::CudaStreamPool.releaseStream(), or implicitly until the nvidia::gxf::CudaStreamPool is deactivated.

Call stream() to get the native cudaStream_t handle and submit GPU operations on it. After submission, the GPU takes ownership of the input tensors/buffers and keeps them in use. To prevent the host from carelessly releasing these in-use buffers, a CUDA codelet should call record(event, input_entity, sync_cb) to extend input_entity's lifecycle until the GPU has fully consumed it. Alternatively, call record(event, event_destroy_cb) for native cudaEvent_t operations and free the in-use resources via event_destroy_cb.

A nvidia::gxf::CudaStreamSync is required in the graph pipeline after all CUDA operations. See nvidia::gxf::CudaStreamSync for details.
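Below is a minimal usage sketch. Only stream(), record(event, input_entity, sync_cb), and the need for a downstream CudaStreamSync are documented above; the Expected<>/Handle<> wrappers, the callback signature, and the kernel launch are illustrative assumptions.

```cpp
// Hedged sketch: GXF return/handle types and the kernel are assumptions;
// stream() and record(event, input_entity, sync_cb) are the documented calls.
#include <cuda_runtime.h>

#include "gxf/core/entity.hpp"
#include "gxf/cuda/cuda_event.hpp"
#include "gxf/cuda/cuda_stream.hpp"

nvidia::gxf::Expected<void> submitOnStream(
    nvidia::gxf::Handle<nvidia::gxf::CudaStream> stream,
    nvidia::gxf::Handle<nvidia::gxf::CudaEvent> event,
    nvidia::gxf::Entity input_entity,
    float* device_buffer, size_t count) {
  // Native handle for asynchronous kernel launches and memcpys
  // (Expected-wrapped return is assumed here).
  cudaStream_t native = stream->stream().value();

  // Launch GPU work asynchronously on this stream (hypothetical kernel).
  // scale_kernel<<<(count + 255) / 256, 256, 0, native>>>(device_buffer, count);

  // Extend input_entity's lifetime until the GPU has consumed it; the optional
  // callback runs once the recorded event completes on the stream.
  return stream->record(event, input_entity,
                        []() { /* host-side cleanup once the GPU is done */ });
}
```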

  • Component ID: 5683d692-7884-11eb-9338-c3be62d576be

  • Defined in: gxf/cuda/cuda_stream.hpp

nvidia::gxf::CudaStreamId#

Holds a CUDA stream id that is used to deduce the nvidia::gxf::CudaStream handle.

stream_cid should be set to the component id of the corresponding nvidia::gxf::CudaStream.
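A hedged sketch of attaching a CudaStreamId to an outgoing message so that downstream components (for example nvidia::gxf::CudaStreamSync) can deduce the stream; add<>(), cid(), and the error-handling helpers follow common GXF patterns and are assumptions here, while stream_cid itself is documented above.

```cpp
// Hedged sketch: add<>(), cid(), ForwardError, and Success are assumed GXF
// helpers; only the stream_cid field is documented here.
#include "gxf/core/entity.hpp"
#include "gxf/cuda/cuda_stream.hpp"
#include "gxf/cuda/cuda_stream_id.hpp"

nvidia::gxf::Expected<void> attachStreamId(
    nvidia::gxf::Entity& message,
    nvidia::gxf::Handle<nvidia::gxf::CudaStream> stream) {
  // Add a CudaStreamId component to the message being published.
  auto stream_id = message.add<nvidia::gxf::CudaStreamId>("stream_id");
  if (!stream_id) { return nvidia::gxf::ForwardError(stream_id); }
  // Point it at the CudaStream component so receivers can look the stream up.
  stream_id.value()->stream_cid = stream.cid();
  return nvidia::gxf::Success;
}
```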

  • Component ID: 7982aeac-37f1-41be-ade8-6f00b4b5d47c

  • Defined in: gxf/cuda/cuda_stream_id.hpp

nvidia::gxf::CudaEvent#

Holds and provides access to native cudaEvent_t handle.

When a nvidia::gxf::CudaEvent is created, you need to initialize a native cudaEvent_t through init(flags, dev_id), or set a third-party event through initWithEvent(event, dev_id, free_fnc). The event remains valid until deinit() is called explicitly; otherwise it is recycled in the destructor.
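A minimal initialization sketch, assuming the Expected<> return type; init(flags, dev_id) and deinit() are documented above, and cudaEventDisableTiming is a standard CUDA runtime flag.

```cpp
// Hedged sketch: the Expected<> return type is assumed; init(flags, dev_id)
// is the documented call and cudaEventDisableTiming is a CUDA runtime flag.
#include <cuda_runtime.h>

#include "gxf/cuda/cuda_event.hpp"

nvidia::gxf::Expected<void> setupEvent(
    nvidia::gxf::Handle<nvidia::gxf::CudaEvent> event, int dev_id) {
  // Create the native cudaEvent_t on the given device; timing is disabled
  // because the event is only used for synchronization.
  return event->init(cudaEventDisableTiming, dev_id);
}
```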

  • Component ID: f5388d5c-a709-47e7-86c4-171779bc64f3

  • Defined in: gxf/cuda/cuda_event.hpp

nvidia::gxf::CudaStreamPool#

CudaStream allocation.

You must explicitly call allocateStream() to get a valid nvidia::gxf::CudaStream handle. This component holds all of its allocated nvidia::gxf::CudaStream entities until releaseStream(stream) is called explicitly or the CudaStreamPool component is deactivated.
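A hedged sketch of borrowing and returning a stream; allocateStream() and releaseStream(stream) are documented above, while the header path, the Expected<>/Handle<> types, and the ForwardError helper are assumptions.

```cpp
// Hedged sketch: header path and GXF return/handle types are assumptions;
// allocateStream() and releaseStream() are the documented calls.
#include "gxf/cuda/cuda_stream_pool.hpp"

nvidia::gxf::Expected<void> useStreamFromPool(
    nvidia::gxf::Handle<nvidia::gxf::CudaStreamPool> pool) {
  // Borrow a stream from the pool.
  auto stream = pool->allocateStream();
  if (!stream) { return nvidia::gxf::ForwardError(stream); }

  // ... submit work via stream.value()->stream() and record() as shown above ...

  // Return the stream to the pool once it is no longer needed; otherwise it is
  // reclaimed only when the pool is deactivated.
  return pool->releaseStream(stream.value());
}
```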

  • Component ID: 6733bf8b-ba5e-4fae-b596-af2d1269d0e7

  • Base Type: nvidia::gxf::Allocator

Parameters#

dev_id

GPU device id.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_INT32

  • Default Value: 0


stream_flags

Flag values to create CUDA streams.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_INT32

  • Default Value: 0


stream_priority

Priority values to create CUDA streams.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_INT32

  • Default Value: 0


reserved_size

Reserved stream size, i.e., the number of CUDA streams initially reserved in the pool.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_INT32

  • Default Value: 1

max_size

Maximum Stream Size.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_INT32

  • Default Value: 0 (no limit)

nvidia::gxf::CudaStreamSync#

Synchronizes all CUDA streams carried by message entities.

This codelet must be connected in the graph pipeline after all CUDA operation codelets. When a message entity is received, it finds every nvidia::gxf::CudaStreamId in that message and extracts the corresponding nvidia::gxf::CudaStream. With each CudaStream handle, it synchronizes all previous nvidia::gxf::CudaStream.record() events, along with all GPU operations submitted before this point.
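At the CUDA runtime level, synchronizing one of these streams amounts to waiting on its native handle, for example with cudaStreamSynchronize; the sketch below is illustrative only, and the Expected<>/Handle<> types and error helpers are assumptions rather than this codelet's actual implementation.

```cpp
// Illustrative only: mirrors what synchronizing a stream means at the CUDA
// runtime level; the codelet's real internals may differ, and the GXF
// return/handle types and error helpers are assumptions.
#include <cuda_runtime.h>

#include "gxf/cuda/cuda_stream.hpp"

nvidia::gxf::Expected<void> waitForStream(
    nvidia::gxf::Handle<nvidia::gxf::CudaStream> stream) {
  // Block the host until every operation submitted to this stream, including
  // the events registered via CudaStream::record(), has completed. The
  // entities held by record() can then be released safely.
  cudaError_t result = cudaStreamSynchronize(stream->stream().value());
  if (result != cudaSuccess) { return nvidia::gxf::Unexpected{GXF_FAILURE}; }
  return nvidia::gxf::Success;
}
```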

Note

CudaStreamSync must be present in the graph whenever nvidia::gxf::CudaStream.record() is used; otherwise it may cause a memory leak.

  • Component ID: 0d1d8142-6648-485d-97d5-277eed00129c

  • Base Type: nvidia::gxf::Codelet

Parameters#

rx

Receiver to receive all messages carrying nvidia::gxf::CudaStreamId.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: nvidia::gxf::Receiver


tx

Transmitter to send messages to downstream.

  • Flags: GXF_PARAMETER_FLAGS_OPTIONAL

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: nvidia::gxf::Transmitter

nvidia::gxf::CudaAllocator#

Base class for RMMAllocator and StreamOrderedAllocator.

  • Component ID: cac15f93-6438-4ed3-bc61-b5dba25b6f91

  • Base Type: nvidia::gxf::Allocator

  • Defined in: gxf/cuda/cuda_allocator.hpp

nvidia::gxf::StreamOrderedAllocator#

Memory allocator that performs stream-ordered memory allocation on the GPU device.

  • Component ID: 63d1d168-13d7-11ef-931a-0be4a6378384

  • Base Type: nvidia::gxf::CudaAllocator

Parameters#

gpu_device

GPU device resource used to allocate CUDA memory.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: nvidia::gxf::GPUDevice

device_memory_initial_size

The initial memory pool size used by device memory resource. The size is specified as a string containing a number and an (optional) unit. If no unit is given the value is assumed to be in bytes. Supported units are: B, KB, MB, GB, TB.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_STRING

  • Default: “16MB” (non-Jetson machine) or “8MB” (Jetson machine)


device_memory_max_size

The maximum memory pool size used by device memory resource. The size is specified as a string containing a number and an (optional) unit. If no unit is given the value is assumed to be in bytes. Supported units are: B, KB, MB, GB, TB.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_STRING

  • Default: “32MB” (non-Jetson machine) or “16MB” (Jetson machine)


release_threshold

The release threshold specifies the maximum amount of memory the pool caches. The size is specified as a string containing a number and an (optional) unit. If no unit is given the value is assumed to be in bytes. Supported units are: B, KB, MB, GB, TB.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_STRING

  • Default: 0