Types
The following types are used by the NCCL library.
ncclComm_t
-
type ncclComm_t
NCCL communicator. Points to an opaque structure inside NCCL.
ncclResult_t
-
type ncclResult_t
Return values for all NCCL functions. Possible values are:
-
ncclSuccess
(0) Function succeeded.
-
ncclUnhandledCudaError
(1) A call to a CUDA function failed.
-
ncclSystemError
(2) A call to the system failed.
-
ncclInternalError
(3) An internal check failed. This is due to either a bug in NCCL or memory corruption.
-
ncclInvalidArgument
(4) An argument has an invalid value.
-
ncclInvalidUsage
(5) The call to NCCL is incorrect. This usually reflects a programming error.
-
ncclRemoteError
(6) A call failed, possibly due to a network error or a remote process exiting prematurely.
-
ncclInProgress
(7) An NCCL operation on the communicator is being enqueued and is progressing in the background.
Whenever a function returns an error (neither ncclSuccess nor ncclInProgress), NCCL should print a more detailed message when the environment variable NCCL_DEBUG is set to “WARN”.
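The rule above (an error is any result that is neither ncclSuccess nor ncclInProgress) can be sketched as a small helper. This is an illustrative sketch only: `demoResult_t` is a hypothetical local stand-in that mirrors the names and numeric values listed above so the snippet compiles without the NCCL headers. Real code would include `nccl.h` and use `ncclResult_t` directly. `demoIsError` and `demoResultName` are likewise hypothetical helper names, not part of NCCL.

```c
#include <stddef.h>

/* Hypothetical local mirror of ncclResult_t (assumption: values copied from
 * the list above; real code should include <nccl.h> instead). */
typedef enum {
  demoSuccess            = 0,
  demoUnhandledCudaError = 1,
  demoSystemError        = 2,
  demoInternalError      = 3,
  demoInvalidArgument    = 4,
  demoInvalidUsage       = 5,
  demoRemoteError        = 6,
  demoInProgress         = 7
} demoResult_t;

/* Returns 1 when the result is an error, i.e. neither success nor
 * in-progress, and 0 otherwise. */
int demoIsError(demoResult_t r) {
  return r != demoSuccess && r != demoInProgress;
}

/* Maps a result code to the NCCL name documented above, for logging. */
const char *demoResultName(demoResult_t r) {
  switch (r) {
    case demoSuccess:            return "ncclSuccess";
    case demoUnhandledCudaError: return "ncclUnhandledCudaError";
    case demoSystemError:        return "ncclSystemError";
    case demoInternalError:      return "ncclInternalError";
    case demoInvalidArgument:    return "ncclInvalidArgument";
    case demoInvalidUsage:       return "ncclInvalidUsage";
    case demoRemoteError:        return "ncclRemoteError";
    case demoInProgress:         return "ncclInProgress";
  }
  return "unknown";
}
```

A caller would typically treat the in-progress value as "check again later" rather than as a failure, which is why it is excluded from `demoIsError`.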
ncclDataType_t
-
type ncclDataType_t
NCCL defines the following integral and floating-point data types.
-
ncclInt8
Signed 8-bit integer
-
ncclChar
Signed 8-bit integer
-
ncclUint8
Unsigned 8-bit integer
-
ncclInt32
Signed 32-bit integer
-
ncclInt
Signed 32-bit integer
-
ncclUint32
Unsigned 32-bit integer
-
ncclInt64
Signed 64-bit integer
-
ncclUint64
Unsigned 64-bit integer
-
ncclFloat16
16-bit floating-point number (half precision)
-
ncclHalf
16-bit floating-point number (half precision)
-
ncclFloat32
32-bit floating-point number (single precision)
-
ncclFloat
32-bit floating-point number (single precision)
-
ncclFloat64
64-bit floating-point number (double precision)
-
ncclDouble
64-bit floating-point number (double precision)
-
ncclBfloat16
16-bit floating-point number (truncated precision in bfloat16 format, CUDA 11 or later)
-
ncclFloat8e4m3
8-bit floating-point number, 4 exponent bits, 3 mantissa bits (CUDA >= 11.8 and SM >= 90)
-
ncclFloat8e5m2
8-bit floating-point number, 5 exponent bits, 2 mantissa bits (CUDA >= 11.8 and SM >= 90)
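Because each data type above has a fixed bit width, the byte size of a buffer follows from the element count and the type. The sketch below illustrates that arithmetic. `demoDataType_t`, `demoTypeSize`, and `demoBufferBytes` are hypothetical names, and the enum is a local stand-in whose numeric values are arbitrary and are not claimed to match `ncclDataType_t`. Aliases such as ncclChar, ncclInt, ncclHalf, ncclFloat, and ncclDouble are omitted because they share the width of the type they duplicate.

```c
#include <stddef.h>

/* Hypothetical local stand-in for ncclDataType_t (assumption: enumerator
 * values here are arbitrary; only the widths come from the list above). */
typedef enum {
  demoInt8, demoUint8, demoInt32, demoUint32, demoInt64, demoUint64,
  demoFloat16, demoFloat32, demoFloat64, demoBfloat16,
  demoFloat8e4m3, demoFloat8e5m2
} demoDataType_t;

/* Size in bytes of one element, derived from the documented bit widths. */
size_t demoTypeSize(demoDataType_t t) {
  switch (t) {
    case demoInt8: case demoUint8:
    case demoFloat8e4m3: case demoFloat8e5m2:
      return 1;
    case demoFloat16: case demoBfloat16:
      return 2;
    case demoInt32: case demoUint32: case demoFloat32:
      return 4;
    case demoInt64: case demoUint64: case demoFloat64:
      return 8;
  }
  return 0; /* unknown type */
}

/* Bytes needed to hold `count` elements of type `t`. */
size_t demoBufferBytes(size_t count, demoDataType_t t) {
  return count * demoTypeSize(t);
}
```

For example, 1024 elements of the 32-bit float type need `demoBufferBytes(1024, demoFloat32)` bytes, which is 4096.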
ncclRedOp_t
ncclScalarResidence_t
-
type ncclScalarResidence_t
Indicates the memory space in which scalar arguments reside and when they can be dereferenced.
-
ncclScalarHostImmediate
The scalar resides in host memory and should be dereferenced in the most immediate way.
-
ncclScalarDevice
The scalar resides in device-visible memory and should be dereferenced once needed.
ncclConfig_t
-
type ncclConfig_t
A structure-based configuration that users can set when initializing a communicator. A newly created configuration must be initialized with NCCL_CONFIG_INITIALIZER.
-
NCCL_CONFIG_INITIALIZER
A configuration macro initializer which must be assigned to a newly created configuration.
-
blocking
This attribute can be set to the integer 0 or 1 to indicate nonblocking or blocking communicator behavior, respectively. Blocking is the default behavior.
-
cgaClusterSize
Set the Cooperative Group Array (CGA) size of kernels launched by NCCL. This attribute can be set between 0 and 8; the default value is 4 on sm90 and later architectures, and 0 on older architectures.
-
minCTAs
Set the minimal number of CTAs NCCL should use for each kernel. Set to a positive integer value, up to 32. The default value is 1.
-
maxCTAs
Set the maximal number of CTAs NCCL should use for each kernel. Set to a positive integer value, up to 32. The default value is 32.
-
netName
Specify the network module name NCCL should use for network communication. The value of netName must match exactly the name of the network module (case-insensitive). NCCL internal network module names are “IB” (generic IB verbs) and “Socket” (TCP/IP sockets). External network plugins define their own names. The default value is undefined, and NCCL will choose the network module automatically.
-
splitShare
Specify whether to share resources with the child communicator during communicator split. Set the value of splitShare to 0 or 1. The default value is 0. When the parent communicator is created with splitShare=1 during ncclCommInitRankConfig, the child communicator can share internal resources of the parent during communicator split. Split communicators are in the same family. When resources are shared, aborting any communicator can result in other communicators in the same family becoming unusable. Whether or not resources are shared, users should always abort or destroy all communicators that are no longer needed to free up resources. Note: when the parent communicator has been revoked, resource sharing during split is disabled regardless of this flag.
-
shrinkShare
Specify whether to share resources with the child communicator during communicator shrink. Set the value of shrinkShare to 0 or 1. The default value is 0. Note: when shrink is used with NCCL_SHRINK_ABORT, the value of shrinkShare is ignored and no resources are shared. When the parent communicator has been revoked, resource sharing is also disabled. The behavior of this flag is otherwise similar to splitShare (see above).
-
trafficClass
Set the traffic class (TC) to use for network operations on the communicator. The meaning of TC is specific to the network plugin in use by the communicator (e.g. IB networks use service level, RoCE networks use type of service). Assigning different TCs to each communicator can benefit workloads which overlap communication. TCs are defined by the system configuration and should be greater than or equal to 0. Note that environment variables, such as NCCL_IB_SL and NCCL_IB_TC, take precedence over user-specified TC values. To utilize user-defined TCs, ensure that these environment variables are unset.
-
collnetEnable
Set 1/0 to enable/disable IB SHARP on the communicator. The default value is 0 (disabled).
-
CTAPolicy
Set the policy for the communicator. The full list of supported policies can be found in NCCL Communicator CTA Policy Flags. The default value is NCCL_CTA_POLICY_DEFAULT.
-
nvlsCTAs
Set the total number of CTAs NCCL should use for NVLS kernels. Set to a positive integer value. By default, NCCL will automatically determine the best number of CTAs based on the system configuration.
-
commName
Specify the user defined name for the communicator. The communicator name can be used by NCCL to enrich logging and profiling.
-
nChannelsPerNetPeer
Set the number of network channels to be used for pairwise communication. The value must be a positive integer and will be rounded up to the next power of 2. The default value is optimized for the AlltoAll communication pattern. Consider increasing the value to increase the bandwidth for send/recv communication.
-
graphUsageMode
Set the graph usage mode for the communicator. It supports three possible values: 0 (no graphs), 1 (one graph) and 2 (either multiple graphs or a mix of graph and non-graph). The default value is 2.
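The value ranges documented for the configuration attributes can be checked before a configuration is handed to NCCL. The sketch below is only an illustration of those documented ranges: `demoConfig_t` is a hypothetical local struct holding a few of the attributes, not the real `ncclConfig_t`, and `demoConfigInRange` and `demoRoundUpPow2` are hypothetical helpers. The rounding helper assumes that a value which is already a power of 2 is left unchanged, which the text above does not state explicitly. Real code would start from `NCCL_CONFIG_INITIALIZER` and set only the attributes it needs.

```c
#include <limits.h>

/* Hypothetical stand-in carrying a subset of the documented attributes
 * (assumption: this is NOT the layout of the real ncclConfig_t). */
typedef struct {
  int blocking;            /* 0 (nonblocking) or 1 (blocking) */
  int cgaClusterSize;      /* 0..8 */
  int minCTAs;             /* positive, up to 32 */
  int maxCTAs;             /* positive, up to 32 */
  int nChannelsPerNetPeer; /* positive */
} demoConfig_t;

/* Returns 1 if every field lies inside the range documented above,
 * 0 otherwise. */
int demoConfigInRange(const demoConfig_t *c) {
  if (c->blocking != 0 && c->blocking != 1) return 0;
  if (c->cgaClusterSize < 0 || c->cgaClusterSize > 8) return 0;
  if (c->minCTAs < 1 || c->minCTAs > 32) return 0;
  if (c->maxCTAs < 1 || c->maxCTAs > 32) return 0;
  if (c->nChannelsPerNetPeer < 1) return 0;
  return 1;
}

/* Rounds n up to a power of 2 (values that already are a power of 2 are
 * returned unchanged). Returns 0 if n is 0 or the result would not fit
 * in an unsigned int. */
unsigned demoRoundUpPow2(unsigned n) {
  unsigned p = 1;
  if (n == 0 || n > (UINT_MAX >> 1) + 1u) return 0;
  while (p < n) p <<= 1;
  return p;
}
```

Under these assumptions, a requested nChannelsPerNetPeer of 3 would correspond to 4 channels, and a request of 8 would stay at 8.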
ncclSimInfo_t
-
type ncclSimInfo_t
This struct will be used by ncclGroupSimulateEnd() to return information about the calls.
-
NCCL_SIM_INFO_INITIALIZER
NCCL_SIM_INFO_INITIALIZER is a configuration macro initializer which must be assigned to a newly created ncclSimInfo_t struct.
-
estimatedTime
Estimated time for the operation(s) in the group call will be returned in this attribute.
ncclWindow_t
-
type ncclWindow_t
NCCL window object for window registration and deregistration.