Data Pointers
In general, NCCL accepts any CUDA pointer that is accessible from the CUDA device associated with the communicator object. This includes:
device memory local to the CUDA device
host memory registered using the CUDA SDK APIs cudaHostRegister or cudaHostGetDevicePointer
managed and unified memory
The only exception is device memory located on another device but accessible from the current device through peer access. NCCL returns an error in that case to guard against programming errors. Since 2.2.12, this check is performed only when NCCL_CHECK_POINTERS=1 is set.