6.15. Stream Ordered Memory Allocator
This section describes the stream ordered memory allocator exposed by the low-level CUDA driver application programming interface.
overview
The asynchronous allocator allows the user to allocate and free in stream order. All asynchronous accesses of the allocation must happen between the stream executions of the allocation and the free. If the memory is accessed outside of the promised stream order, a use before allocation / use after free error will cause undefined behavior.
The allocator is free to reallocate the memory as long as it can guarantee that compliant memory accesses will not overlap temporally. The allocator may refer to internal stream ordering as well as inter-stream dependencies (such as CUDA events and null stream dependencies) when establishing the temporal guarantee. The allocator may also insert inter-stream dependencies to establish the temporal guarantee.
Supported Platforms
Whether or not a device supports the integrated stream ordered memory allocator may be queried by calling cuDeviceGetAttribute() with the device attribute CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED
Functions
- CUresult cuMemAllocAsync ( CUdeviceptr* dptr, size_t bytesize, CUstream hStream )
- Allocates memory with stream ordered semantics.
- CUresult cuMemAllocFromPoolAsync ( CUdeviceptr* dptr, size_t bytesize, CUmemoryPool pool, CUstream hStream )
- Allocates memory from a specified pool with stream ordered semantics.
- CUresult cuMemFreeAsync ( CUdeviceptr dptr, CUstream hStream )
- Frees memory with stream ordered semantics.
- CUresult cuMemPoolCreate ( CUmemoryPool* pool, const CUmemPoolProps* poolProps )
- Creates a memory pool.
- CUresult cuMemPoolDestroy ( CUmemoryPool pool )
- Destroys the specified memory pool.
- CUresult cuMemPoolExportPointer ( CUmemPoolPtrExportData* shareData_out, CUdeviceptr ptr )
- Export data to share a memory pool allocation between processes.
- CUresult cuMemPoolExportToShareableHandle ( void* handle_out, CUmemoryPool pool, CUmemAllocationHandleType handleType, unsigned long long flags )
- Exports a memory pool to the requested handle type.
- CUresult cuMemPoolGetAccess ( CUmemAccess_flags* flags, CUmemoryPool memPool, CUmemLocation* location )
- Returns the accessibility of a pool from a device.
- CUresult cuMemPoolGetAttribute ( CUmemoryPool pool, CUmemPool_attribute attr, void* value )
- Gets attributes of a memory pool.
- CUresult cuMemPoolImportFromShareableHandle ( CUmemoryPool* pool_out, void* handle, CUmemAllocationHandleType handleType, unsigned long long flags )
- imports a memory pool from a shared handle.
- CUresult cuMemPoolImportPointer ( CUdeviceptr* ptr_out, CUmemoryPool pool, CUmemPoolPtrExportData* shareData )
- Import a memory pool allocation from another process.
- CUresult cuMemPoolSetAccess ( CUmemoryPool pool, const CUmemAccessDesc* map, size_t count )
- Controls visibility of pools between devices.
- CUresult cuMemPoolSetAttribute ( CUmemoryPool pool, CUmemPool_attribute attr, void* value )
- Sets attributes of a memory pool.
- CUresult cuMemPoolTrimTo ( CUmemoryPool pool, size_t minBytesToKeep )
- Tries to release memory back to the OS.
Functions
- CUresult cuMemAllocAsync ( CUdeviceptr* dptr, size_t bytesize, CUstream hStream )
-
Allocates memory with stream ordered semantics.
Parameters
- dptr
- - Returned device pointer
- bytesize
- - Number of bytes to allocate
- hStream
- - The stream establishing the stream ordering contract and the memory pool to allocate from
Returns
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT (default stream specified with no current context), CUDA_ERROR_NOT_SUPPORTED, CUDA_ERROR_OUT_OF_MEMORY
Description
Inserts an allocation operation into hStream. A pointer to the allocated memory is returned immediately in *dptr. The allocation must not be accessed until the the allocation operation completes. The allocation comes from the memory pool current to the stream's device.
Note:-
The default memory pool of a device contains device memory from that device.
-
Basic stream ordering allows future work submitted into the same stream to use the allocation. Stream query, stream synchronize, and CUDA events can be used to guarantee that the allocation operation completes before work submitted in a separate stream runs.
-
During stream capture, this function results in the creation of an allocation node. In this case, the allocation is owned by the graph instead of the memory pool. The memory pool's properties are used to set the node's creation parameters.
See also:
cuMemAllocFromPoolAsync, cuMemFreeAsync, cuDeviceSetMemPool, cuDeviceGetDefaultMemPool, cuDeviceGetMemPool, cuMemPoolCreate, cuMemPoolSetAccess, cuMemPoolSetAttribute
- CUresult cuMemAllocFromPoolAsync ( CUdeviceptr* dptr, size_t bytesize, CUmemoryPool pool, CUstream hStream )
-
Allocates memory from a specified pool with stream ordered semantics.
Parameters
- dptr
- - Returned device pointer
- bytesize
- - Number of bytes to allocate
- pool
- - The pool to allocate from
- hStream
- - The stream establishing the stream ordering semantic
Returns
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT (default stream specified with no current context), CUDA_ERROR_NOT_SUPPORTED, CUDA_ERROR_OUT_OF_MEMORY
Description
Inserts an allocation operation into hStream. A pointer to the allocated memory is returned immediately in *dptr. The allocation must not be accessed until the the allocation operation completes. The allocation comes from the specified memory pool.
Note:-
The specified memory pool may be from a device different than that of the specified hStream.
-
Basic stream ordering allows future work submitted into the same stream to use the allocation. Stream query, stream synchronize, and CUDA events can be used to guarantee that the allocation operation completes before work submitted in a separate stream runs.
Note:During stream capture, this function results in the creation of an allocation node. In this case, the allocation is owned by the graph instead of the memory pool. The memory pool's properties are used to set the node's creation parameters.
See also:
cuMemAllocAsync, cuMemFreeAsync, cuDeviceGetDefaultMemPool, cuDeviceGetMemPool, cuMemPoolCreate, cuMemPoolSetAccess, cuMemPoolSetAttribute
- CUresult cuMemFreeAsync ( CUdeviceptr dptr, CUstream hStream )
-
Frees memory with stream ordered semantics.
Parameters
- dptr
- - memory to free
- hStream
- - The stream establishing the stream ordering contract.
Returns
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT (default stream specified with no current context), CUDA_ERROR_NOT_SUPPORTED
Description
Inserts a free operation into hStream. The allocation must not be accessed after stream execution reaches the free. After this API returns, accessing the memory from any subsequent work launched on the GPU or querying its pointer attributes results in undefined behavior.
Note:During stream capture, this function results in the creation of a free node and must therefore be passed the address of a graph allocation.
- CUresult cuMemPoolCreate ( CUmemoryPool* pool, const CUmemPoolProps* poolProps )
-
Creates a memory pool.
Returns
CUDA_SUCCESS, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_NOT_PERMITTEDCUDA_ERROR_NOT_SUPPORTED
Description
Creates a CUDA memory pool and returns the handle in pool. The poolProps determines the properties of the pool such as the backing device and IPC capabilities.
To create a memory pool targeting a specific host NUMA node, applications must set CUmemPoolProps::CUmemLocation::type to CU_MEM_LOCATION_TYPE_HOST_NUMA and CUmemPoolProps::CUmemLocation::id must specify the NUMA ID of the host memory node. Specifying CU_MEM_LOCATION_TYPE_HOST_NUMA_CURRENT or CU_MEM_LOCATION_TYPE_HOST as the CUmemPoolProps::CUmemLocation::type will result in CUDA_ERROR_INVALID_VALUE. By default, the pool's memory will be accessible from the device it is allocated on. In the case of pools created with CU_MEM_LOCATION_TYPE_HOST_NUMA, their default accessibility will be from the host CPU. Applications can control the maximum size of the pool by specifying a non-zero value for CUmemPoolProps::maxSize. If set to 0, the maximum size of the pool will default to a system dependent value.
Applications can set CUmemPoolProps::handleTypes to CU_MEM_HANDLE_TYPE_FABRIC in order to create CUmemoryPool suitable for sharing within an IMEX domain. An IMEX domain is either an OS instance or a group of securely connected OS instances using the NVIDIA IMEX daemon. An IMEX channel is a global resource within the IMEX domain that represents a logical entity that aims to provide fine grained accessibility control for the participating processes. When exporter and importer CUDA processes have been granted access to the same IMEX channel, they can securely share memory. If the allocating process does not have access setup for an IMEX channel, attempting to export a CUmemoryPool with CU_MEM_HANDLE_TYPE_FABRIC will result in CUDA_ERROR_NOT_PERMITTED. The nvidia-modprobe CLI provides more information regarding setting up of IMEX channels.
Note:Specifying CU_MEM_HANDLE_TYPE_NONE creates a memory pool that will not support IPC.
See also:
cuDeviceSetMemPool, cuDeviceGetMemPool, cuDeviceGetDefaultMemPool, cuMemAllocFromPoolAsync, cuMemPoolExportToShareableHandle
- CUresult cuMemPoolDestroy ( CUmemoryPool pool )
-
Destroys the specified memory pool.
Returns
Description
If any pointers obtained from this pool haven't been freed or the pool has free operations that haven't completed when cuMemPoolDestroy is invoked, the function will return immediately and the resources associated with the pool will be released automatically once there are no more outstanding allocations.
Destroying the current mempool of a device sets the default mempool of that device as the current mempool for that device.
Note:A device's default memory pool cannot be destroyed.
See also:
cuMemFreeAsync, cuDeviceSetMemPool, cuDeviceGetMemPool, cuDeviceGetDefaultMemPool, cuMemPoolCreate
- CUresult cuMemPoolExportPointer ( CUmemPoolPtrExportData* shareData_out, CUdeviceptr ptr )
-
Export data to share a memory pool allocation between processes.
Parameters
- shareData_out
- - Returned export data
- ptr
- - pointer to memory being exported
Returns
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_OUT_OF_MEMORY
Description
Constructs shareData_out for sharing a specific allocation from an already shared memory pool. The recipient process can import the allocation with the cuMemPoolImportPointer api. The data is not a handle and may be shared through any IPC mechanism.
See also:
cuMemPoolExportToShareableHandle, cuMemPoolImportFromShareableHandle, cuMemPoolImportPointer
- CUresult cuMemPoolExportToShareableHandle ( void* handle_out, CUmemoryPool pool, CUmemAllocationHandleType handleType, unsigned long long flags )
-
Exports a memory pool to the requested handle type.
Parameters
- handle_out
- - Returned OS handle
- pool
- - pool to export
- handleType
- - the type of handle to create
- flags
- - must be 0
Returns
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_OUT_OF_MEMORY
Description
Given an IPC capable mempool, create an OS handle to share the pool with another process. A recipient process can convert the shareable handle into a mempool with cuMemPoolImportFromShareableHandle. Individual pointers can then be shared with the cuMemPoolExportPointer and cuMemPoolImportPointer APIs. The implementation of what the shareable handle is and how it can be transferred is defined by the requested handle type.
Note:: To create an IPC capable mempool, create a mempool with a CUmemAllocationHandleType other than CU_MEM_HANDLE_TYPE_NONE.
See also:
cuMemPoolImportFromShareableHandle, cuMemPoolExportPointer, cuMemPoolImportPointer, cuMemAllocAsync, cuMemFreeAsync, cuDeviceGetDefaultMemPool, cuDeviceGetMemPool, cuMemPoolCreate, cuMemPoolSetAccess, cuMemPoolSetAttribute
- CUresult cuMemPoolGetAccess ( CUmemAccess_flags* flags, CUmemoryPool memPool, CUmemLocation* location )
-
Returns the accessibility of a pool from a device.
Parameters
- flags
- - the accessibility of the pool from the specified location
- memPool
- - the pool being queried
- location
- - the location accessing the pool
Description
Returns the accessibility of the pool's memory from the specified location.
See also:
cuMemAllocAsync, cuMemFreeAsync, cuDeviceGetDefaultMemPool, cuDeviceGetMemPool, cuMemPoolCreate
- CUresult cuMemPoolGetAttribute ( CUmemoryPool pool, CUmemPool_attribute attr, void* value )
-
Gets attributes of a memory pool.
Parameters
- pool
- - The memory pool to get attributes of
- attr
- - The attribute to get
- value
- - Retrieved value
Description
Supported attributes are:
-
CU_MEMPOOL_ATTR_RELEASE_THRESHOLD: (value type = cuuint64_t) Amount of reserved memory in bytes to hold onto before trying to release memory back to the OS. When more than the release threshold bytes of memory are held by the memory pool, the allocator will try to release memory back to the OS on the next call to stream, event or context synchronize. (default 0)
-
CU_MEMPOOL_ATTR_REUSE_FOLLOW_EVENT_DEPENDENCIES: (value type = int) Allow cuMemAllocAsync to use memory asynchronously freed in another stream as long as a stream ordering dependency of the allocating stream on the free action exists. Cuda events and null stream interactions can create the required stream ordered dependencies. (default enabled)
-
CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC: (value type = int) Allow reuse of already completed frees when there is no dependency between the free and allocation. (default enabled)
-
CU_MEMPOOL_ATTR_REUSE_ALLOW_INTERNAL_DEPENDENCIES: (value type = int) Allow cuMemAllocAsync to insert new stream dependencies in order to establish the stream ordering required to reuse a piece of memory released by cuMemFreeAsync (default enabled).
-
CU_MEMPOOL_ATTR_RESERVED_MEM_CURRENT: (value type = cuuint64_t) Amount of backing memory currently allocated for the mempool
-
CU_MEMPOOL_ATTR_RESERVED_MEM_HIGH: (value type = cuuint64_t) High watermark of backing memory allocated for the mempool since the last time it was reset.
-
CU_MEMPOOL_ATTR_USED_MEM_CURRENT: (value type = cuuint64_t) Amount of memory from the pool that is currently in use by the application.
-
CU_MEMPOOL_ATTR_USED_MEM_HIGH: (value type = cuuint64_t) High watermark of the amount of memory from the pool that was in use by the application.
See also:
cuMemAllocAsync, cuMemFreeAsync, cuDeviceGetDefaultMemPool, cuDeviceGetMemPool, cuMemPoolCreate
- CUresult cuMemPoolImportFromShareableHandle ( CUmemoryPool* pool_out, void* handle, CUmemAllocationHandleType handleType, unsigned long long flags )
-
imports a memory pool from a shared handle.
Parameters
- pool_out
- - Returned memory pool
- handle
- - OS handle of the pool to open
- handleType
- - The type of handle being imported
- flags
- - must be 0
Returns
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_OUT_OF_MEMORY
Description
Specific allocations can be imported from the imported pool with cuMemPoolImportPointer.
If handleType is CU_MEM_HANDLE_TYPE_FABRIC and the importer process has not been granted access to the same IMEX channel as the exporter process, this API will error as CUDA_ERROR_NOT_PERMITTED.
Note:Imported memory pools do not support creating new allocations. As such imported memory pools may not be used in cuDeviceSetMemPool or cuMemAllocFromPoolAsync calls.
See also:
cuMemPoolExportToShareableHandle, cuMemPoolExportPointer, cuMemPoolImportPointer
- CUresult cuMemPoolImportPointer ( CUdeviceptr* ptr_out, CUmemoryPool pool, CUmemPoolPtrExportData* shareData )
-
Import a memory pool allocation from another process.
Parameters
- ptr_out
- - pointer to imported memory
- pool
- - pool from which to import
- shareData
- - data specifying the memory to import
Returns
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_OUT_OF_MEMORY
Description
Returns in ptr_out a pointer to the imported memory. The imported memory must not be accessed before the allocation operation completes in the exporting process. The imported memory must be freed from all importing processes before being freed in the exporting process. The pointer may be freed with cuMemFree or cuMemFreeAsync. If cuMemFreeAsync is used, the free must be completed on the importing process before the free operation on the exporting process.
Note:The cuMemFreeAsync api may be used in the exporting process before the cuMemFreeAsync operation completes in its stream as long as the cuMemFreeAsync in the exporting process specifies a stream with a stream dependency on the importing process's cuMemFreeAsync.
See also:
cuMemPoolExportToShareableHandle, cuMemPoolImportFromShareableHandle, cuMemPoolExportPointer
- CUresult cuMemPoolSetAccess ( CUmemoryPool pool, const CUmemAccessDesc* map, size_t count )
-
Controls visibility of pools between devices.
Parameters
- pool
- - The pool being modified
- map
- - Array of access descriptors. Each descriptor instructs the access to enable for a single gpu.
- count
- - Number of descriptors in the map array.
Description
See also:
cuMemAllocAsync, cuMemFreeAsync, cuDeviceGetDefaultMemPool, cuDeviceGetMemPool, cuMemPoolCreate
- CUresult cuMemPoolSetAttribute ( CUmemoryPool pool, CUmemPool_attribute attr, void* value )
-
Sets attributes of a memory pool.
Parameters
- pool
- - The memory pool to modify
- attr
- - The attribute to modify
- value
- - Pointer to the value to assign
Returns
Description
Supported attributes are:
-
CU_MEMPOOL_ATTR_RELEASE_THRESHOLD: (value type = cuuint64_t) Amount of reserved memory in bytes to hold onto before trying to release memory back to the OS. When more than the release threshold bytes of memory are held by the memory pool, the allocator will try to release memory back to the OS on the next call to stream, event or context synchronize. (default 0)
-
CU_MEMPOOL_ATTR_REUSE_FOLLOW_EVENT_DEPENDENCIES: (value type = int) Allow cuMemAllocAsync to use memory asynchronously freed in another stream as long as a stream ordering dependency of the allocating stream on the free action exists. Cuda events and null stream interactions can create the required stream ordered dependencies. (default enabled)
-
CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC: (value type = int) Allow reuse of already completed frees when there is no dependency between the free and allocation. (default enabled)
-
CU_MEMPOOL_ATTR_REUSE_ALLOW_INTERNAL_DEPENDENCIES: (value type = int) Allow cuMemAllocAsync to insert new stream dependencies in order to establish the stream ordering required to reuse a piece of memory released by cuMemFreeAsync (default enabled).
-
CU_MEMPOOL_ATTR_RESERVED_MEM_HIGH: (value type = cuuint64_t) Reset the high watermark that tracks the amount of backing memory that was allocated for the memory pool. It is illegal to set this attribute to a non-zero value.
-
CU_MEMPOOL_ATTR_USED_MEM_HIGH: (value type = cuuint64_t) Reset the high watermark that tracks the amount of used memory that was allocated for the memory pool.
See also:
cuMemAllocAsync, cuMemFreeAsync, cuDeviceGetDefaultMemPool, cuDeviceGetMemPool, cuMemPoolCreate
- CUresult cuMemPoolTrimTo ( CUmemoryPool pool, size_t minBytesToKeep )
-
Tries to release memory back to the OS.
Parameters
- pool
- - The memory pool to trim
- minBytesToKeep
- - If the pool has less than minBytesToKeep reserved, the TrimTo operation is a no-op. Otherwise the pool will be guaranteed to have at least minBytesToKeep bytes reserved after the operation.
Description
Releases memory back to the OS until the pool contains fewer than minBytesToKeep reserved bytes, or there is no more memory that the allocator can safely release. The allocator cannot release OS allocations that back outstanding asynchronous allocations. The OS allocations may happen at different granularity from the user allocations.
Note:-
: Allocations that have not been freed count as outstanding.
-
: Allocations that have been asynchronously freed but whose completion has not been observed on the host (eg. by a synchronize) can count as outstanding.
See also:
cuMemAllocAsync, cuMemFreeAsync, cuDeviceGetDefaultMemPool, cuDeviceGetMemPool, cuMemPoolCreate