UCX Extension#

Description#

The UCX extension uses the Unified Communication X (UCX) library to disaggregate a graph in the GXF framework. It distributes a graph across multiple hosts so that the graph can use distributed GPU resources. UCX is an open-source library for accelerating data transfer over high-performance networks. It can use GPUDirect RDMA to reduce network latency and increase throughput for traffic between distributed GPUs. As a result, users of this extension can combine the processing power of multiple GPUs across different hosts, which can improve the speed and efficiency of workflows. For more details on UCX, visit https://openucx.org.

For Example

The following diagram illustrates a disaggregated graph composed of two tensor generators and a tensor comparator. The tensor comparator compares the outputs produced by the two tensor generators. The UCX extension allows each entity to run on a separate host.

Figure: Graph Example UCX Extension

Every graph that uses the UCX extension needs a UcxContext component. This component hosts the UCP context, manages all connections and data transfers, and ensures that all operations are closed properly at deinitialization. When setting up your graph, replace each entity’s standard transmitter and receiver with the UcxTransmitter and UcxReceiver components. Be sure to configure their parameters, including the IP address and port, so that the connection can be established.

Currently, UCX only supports sending messages whose data resides in a single type of memory (host or device). This is a limitation of UCX, not of the extension.

  • UUID: 525f8a1a-dfb5-426b-8ddb-00c3ac839994

  • Version: 0.8.0

  • Author: NVIDIA

  • License: LICENSE

Requirements#

Components#

UcxContext#

UcxContext is the central component of the GXF UCX extension. It initializes the UCX context, runs the listeners, and handles connection requests and incoming data for UcxReceivers. It also sets up UcxTransmitter connections and resources. All connections, for both UcxReceivers and UcxTransmitters, are managed within UcxContext. When the graph completes, UcxContext closes all connections and releases all resources.

  • Component ID: 755d20a5-d794-467d-a86c-290eb2c32052

  • Base Type: nvidia::gxf::NetworkContext

  • Defined in: extensions/ucx/ucx_context.hpp

Parameters#

serializer

The entity serializer used by the component. Should be of type UcxEntitySerializer.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: nvidia::gxf::EntitySerializer


reconnect

Try to reconnect if a connection is closed during a run. A UcxReceiver waits for a new connection request in order to establish a new connection. A UcxTransmitter sends a new connection request to the server in order to establish a new connection.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_BOOL

  • Default: true


Optional GPU device resource

Optional resource for GPU device.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: nvidia::gxf::GPUDevice
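
As a minimal sketch of the parameters above, the fragment below declares a UcxContext with reconnection turned off. The component names `ucx_context` and `entity_serializer` are illustrative, and the fragment assumes that a UcxEntitySerializer with that name is declared in the same entity.

```yaml
# Fragment of the component list of the entity that owns the UCX context.
# Assumes a UcxEntitySerializer component named "entity_serializer"
# is declared in the same entity (hypothetical name).
- name: ucx_context
  type: nvidia::gxf::UcxContext
  parameters:
    serializer: entity_serializer
    reconnect: false   # documented default is true
```

Leaving `reconnect` unset keeps the documented default, so a dropped connection is re-established rather than left closed.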

UcxTransmitter#

Transmitter component for the GXF UCX extension. This component is used as the transmitter of an entity. At the initialization stage, it sends a connection request to establish a connection. When the network router executes the SyncOutbox function, it invokes the sync_io method of the UcxTransmitter, which transmits the message using the UCX Active Message Rendezvous protocol.

  • Component ID: 58165d03-78b7-4696-b200-71621f90aee7

  • Base Type: nvidia::gxf::Transmitter

  • Defined in: extensions/ucx/ucx_transmitter.hpp

Parameters#

capacity

Queue’s capacity of the transmitter.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_UINT64


policy

Queue’s policy for handling data. Valid values:

0: pop, 1: reject, 2: fault

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_UINT64


receiver_address

Receiver address to connect to.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_STRING


port

Port of the receiver.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_INT32


buffer

Serialization Buffer to hold serialized data.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: Handle<UcxSerializationBuffer>


maximum_connection_retries

Maximum retries for connection establishment.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_INT32


gpu_device

Optional GPU device resource.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: Handle<GPUDevice>
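
The transmitter parameters above might be combined as in the following sketch. The address is a placeholder, the `capacity`, `policy`, and `maximum_connection_retries` values are illustrative assumptions rather than documented defaults, and the fragment assumes a UcxSerializationBuffer named `serialization_buffer` in the same entity.

```yaml
# Fragment of a sending entity's component list.
# Assumes a UcxSerializationBuffer component named "serialization_buffer"
# is declared in the same entity (hypothetical name).
- name: signal
  type: nvidia::gxf::UcxTransmitter
  parameters:
    capacity: 10                    # illustrative queue capacity
    policy: 2                       # 0: pop, 1: reject, 2: fault
    receiver_address: 192.0.2.10    # placeholder; use the receiving host's address
    port: 13337                     # should match the port the UcxReceiver listens on
    buffer: serialization_buffer
    maximum_connection_retries: 5   # illustrative retry limit
```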

UcxReceiver#

Receiver component for the GXF UCX extension. This component replaces the standard receiver of an entity. When an entity sends a message to this receiver, the UcxContext receives the message header, prompting the router to execute the SyncInbox function. The SyncInbox function then triggers the sync_io method of the UcxReceiver, which uses the UCX Active Message Rendezvous protocol to receive the data content of the message.

  • Component ID: e961132b-45d5-48b8-ac5d-2bb1a4a42279

  • Base Type: nvidia::gxf::Receiver

  • Defined in: extensions/ucx/ucx_receiver.hpp

Parameters#

capacity

Queue’s capacity of the receiver.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_UINT64

  • Default: 10


policy

Queue’s policy for handling data. 0: pop, 1: reject, 2: fault

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_UINT64

  • Default: 2


address

Listener address to receive data.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_STRING

  • Default: “0.0.0.0”


port

Listener’s port for receiving data.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_INT32

  • Default: 13337


buffer

Serialization Buffer to hold serialized data.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: UcxSerializationBuffer


Optional GPU device resource

Optional resource for GPU device.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: nvidia::gxf::GPUDevice
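
For reference, the sketch below spells out the documented receiver defaults explicitly. Only `buffer` has no documented default; the fragment assumes a UcxSerializationBuffer named `serialization_buffer` in the same entity, and the component name `signal` is illustrative.

```yaml
# Fragment of a receiving entity's component list.
# Every value except "buffer" repeats the default documented above,
# so omitting those parameters should give the same configuration.
- name: signal
  type: nvidia::gxf::UcxReceiver
  parameters:
    capacity: 10                   # default
    policy: 2                      # default (fault)
    address: 0.0.0.0               # default listener address
    port: 13337                    # default listener port
    buffer: serialization_buffer   # hypothetical component name
```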

UcxComponentSerializer#

Serializer for the components in the GXF UCX extension. Currently supports serializing Timestamp, Tensor, Video Buffer, Audio Buffer, and integer components. Valid for sharing data between devices with the same endianness.

  • Component ID: 64994305-4260-4f5c-ac5f-69da6dd6cfa5

  • Base Type: nvidia::gxf::ComponentSerializer

  • Defined in: extensions/ucx/ucx_component_serializer.hpp

Parameters#

allocator

Memory allocator for tensor components.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: nvidia::gxf::Allocator

UcxEntitySerializer#

Serializer for the entities in the GXF UCX extension.

  • Component ID: 14997aa4-4a01-4cd4-86ab-687f85a13f10

  • Base Type: nvidia::gxf::EntitySerializer

  • Defined in: extensions/ucx/ucx_entity_serializer.hpp

Parameters#

component_serializers

List of serializers for serializing and deserializing components.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: FixedVector<nvidia::gxf::Handle<nvidia::gxf::ComponentSerializer>, kMaxTempComponents>


verbose_warning

Whether or not to print verbose warning.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_BOOL

  • Default: true
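
The serializer components are typically chained: an allocator feeds the UcxComponentSerializer, which is then listed in the UcxEntitySerializer. The sketch below shows that chain with verbose warnings turned off. The component names are illustrative, and the allocator type is the mock allocator used in the test graphs of the Example section; a real graph would presumably use a different allocator.

```yaml
# Fragment of the component list of the entity that holds the serializers.
- name: allocator
  type: nvidia::gxf::test::MockAllocator   # test allocator, as in the Example section
- name: component_serializer
  type: nvidia::gxf::UcxComponentSerializer
  parameters:
    allocator: allocator
- name: entity_serializer
  type: nvidia::gxf::UcxEntitySerializer
  parameters:
    component_serializers: [ component_serializer ]
    verbose_warning: false   # documented default is true
```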

UcxSerializationBuffer#

Serialization buffer for the GXF UCX extension.

  • Component ID: 1d9fcaf7-1db1-4992-93ec-714979f7d78d

  • Base Type: nvidia::gxf::Endpoint

  • Defined in: extensions/ucx/ucx_serialization_buffer.hpp

Parameters#

allocator

Memory allocator for tensor components.

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_HANDLE

  • Handle Type: nvidia::gxf::Handle<nvidia::gxf::Allocator>


buffer_size

Size of the buffer in bytes (4kB by default).

  • Flags: GXF_PARAMETER_FLAGS_NONE

  • Type: GXF_PARAMETER_TYPE_SIZE

  • Default: 4096 (4kB)
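
A minimal sketch of overriding the buffer size follows. The value 8192 is an arbitrary illustration, not a recommendation, and the allocator type is the mock allocator from the test graphs in the Example section.

```yaml
# Fragment of the component list of an entity with a UcxTransmitter or UcxReceiver.
- name: allocator
  type: nvidia::gxf::test::MockAllocator   # test allocator, as in the Example section
- name: serialization_buffer
  type: nvidia::gxf::UcxSerializationBuffer
  parameters:
    allocator: allocator
    buffer_size: 8192   # bytes; documented default is 4096
```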

Example#

This section shows how to use the UCX extension in a simple graph. The graph comprises two subgraphs connected through the UCX extension. The server-side and client-side configurations are defined in the YAML files below.

Server side - test_ping_rx.yaml file:

name: rx
components:
- name: allocator
  type: nvidia::gxf::test::MockAllocator
- name: serialization_buffer
  type: nvidia::gxf::UcxSerializationBuffer
  parameters:
    allocator: allocator
- name: signal
  type: nvidia::gxf::UcxReceiver
  parameters:
    address: 5.5.5.5
    port: 13337
    buffer: serialization_buffer
- type: nvidia::gxf::MessageAvailableSchedulingTerm
  parameters:
    receiver: signal
    min_size: 1
- type: nvidia::gxf::PingRx
  parameters:
    signal: signal
- type: nvidia::gxf::test::StepCount
  parameters:
    expected_count: 10
- type: nvidia::gxf::CountSchedulingTerm
  parameters:
    count: 10
---
name: ucx
components:
- name: allocator
  type: nvidia::gxf::test::MockAllocator
- name: component_serializer
  type: nvidia::gxf::UcxComponentSerializer
  parameters:
    allocator: allocator
- name: entity_serializer
  type: nvidia::gxf::UcxEntitySerializer
  parameters:
    component_serializers: [ component_serializer ]
- name: ucx_context
  type: nvidia::gxf::UcxContext
  parameters:
    serializer: entity_serializer
---
name: scheduler
components:
- name: clock
  type: nvidia::gxf::RealtimeClock
- type: nvidia::gxf::GreedyScheduler
  parameters:
    max_duration_ms: 1000000
    clock: clock
    stop_on_deadlock: False
---
name: gpu_resource_entity_0
components:
- type: nvidia::gxf::GPUDevice
  name: gpu_resource_0
  parameters:
    dev_id: 0
---
EntityGroups:
- name: entity_group_0
  target:
  - "rx"
  - "ucx"
  - "gpu_resource_entity_0"

Client side - test_ping_tx.yaml file:

name: tx
components:
- name: allocator
  type: nvidia::gxf::test::MockAllocator
- name: serialization_buffer
  type: nvidia::gxf::UcxSerializationBuffer
  parameters:
    allocator: allocator
- name: signal
  type: nvidia::gxf::UcxTransmitter
  parameters:
    receiver_address: 5.5.5.5
    port: 13337
    buffer: serialization_buffer
- type: nvidia::gxf::PingTx
  parameters:
    signal: signal
- type: nvidia::gxf::CountSchedulingTerm
  parameters:
    count: 10
- type: nvidia::gxf::test::StepCount
  parameters:
    expected_count: 10
---
name: ucx
components:
- name: allocator
  type: nvidia::gxf::test::MockAllocator
- name: component_serializer
  type: nvidia::gxf::UcxComponentSerializer
  parameters:
    allocator: allocator
- name: entity_serializer
  type: nvidia::gxf::UcxEntitySerializer
  parameters:
    component_serializers: [ component_serializer ]
- name: ucx_context
  type: nvidia::gxf::UcxContext
  parameters:
    serializer: entity_serializer
---
name: scheduler
components:
- name: clock
  type: nvidia::gxf::RealtimeClock
- type: nvidia::gxf::GreedyScheduler
  parameters:
    stop_on_deadlock: false
    max_duration_ms: 1000000
    clock: clock
---
name: gpu_resource_entity_0
components:
- type: nvidia::gxf::GPUDevice
  name: gpu_resource_0
  parameters:
    dev_id: 0
---
EntityGroups:
- name: entity_group_0
  target:
  - "tx"
  - "ucx"
  - "gpu_resource_entity_0"