============================
UCX Extension
============================

Description
============

The UCX extension uses the Unified Communication X (UCX) library to disaggregate a GXF graph,
distributing its entities across multiple hosts so that the graph can use the GPU resources of all
of them. UCX is an open-source communication library designed to accelerate data movement over
high-performance networks. It can use GPUDirect RDMA to reduce network latency and increase the
throughput of GPU traffic between hosts. As a result, users of this extension can combine the
processing power of GPUs on different hosts, which can substantially improve the speed and
efficiency of their workflows. For more details on UCX, visit https://openucx.org.

**Example**

The following diagram shows a disaggregated graph composed of two tensor generators and a tensor
comparator, which checks the outputs produced by the two generators.
With the UCX extension, each of these entities can run on a different host.

.. image:: /content/Ucx_extension_example.png
   :align: center
   :alt: Graph Example UCX Extension

To make this possible, every graph that uses the UCX extension needs a UcxContext component. This
component hosts the UCP context, manages all connections and the data exchanged over them, and
ensures that all connections are closed properly at deinitialization.
When setting up your graph, replace each entity's standard transmitter and receiver with the
UcxTransmitter and UcxReceiver components, and configure their parameters, such as the address
and port, so that the connection can be established.
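
As a minimal sketch of this replacement, the fragment below shows the ``signal`` transmitter of a
ping entity before and after the change. The two list items are alternatives: an entity would
contain only one of them. ``nvidia::gxf::DoubleBufferTransmitter`` is assumed here as the standard
local transmitter, and ``192.168.1.10`` is a hypothetical IP address of the receiving host. Keeping
the component name unchanged means that components referring to it, such as the ``signal``
parameter of ``PingTx``, still resolve.

.. code-block:: yaml

   # Before: a local transmitter (assumed: DoubleBufferTransmitter from the GXF standard extension)
   - name: signal
     type: nvidia::gxf::DoubleBufferTransmitter
     parameters:
       capacity: 1
   # After: the same slot served over UCX (replaces the item above, it is not an addition)
   - name: signal
     type: nvidia::gxf::UcxTransmitter
     parameters:
       receiver_address: 192.168.1.10   # hypothetical IP of the host running the UcxReceiver
       port: 13337                      # must match the port of the UcxReceiver
       buffer: serialization_buffer     # a UcxSerializationBuffer defined in the same entity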

Currently, UCX only supports sending a message whose data resides entirely in one type of memory
(host or device). This is a limitation of UCX, not of the extension.

* UUID: 525f8a1a-dfb5-426b-8ddb-00c3ac839994
* Version: 0.0.5
* Author: NVIDIA
* License: LICENSE

Requirements
============

* NVIDIA ConnectX-6 Dx NIC or later.

  For more information on installing and configuring NICs, see:
  https://docs.nvidia.com/networking/display/ConnectX6VPI/Introduction

* Mellanox OpenFabrics Enterprise Distribution (MLNX_OFED), version 5.5 or later. See
  https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/

  For installation instructions, see
  https://docs.nvidia.com/networking/display/MLNXOFEDv551032/Installing+MLNX_OFED

* If installing MLNX_OFED within a container:

  * Install the kernel drivers in the host OS by passing the ``--all``
    flag to the ``mlnxofedinstall`` script.
  * Inside the container, install only the user-space libraries by passing the ``--user-space-only``
    flag to the ``mlnxofedinstall`` script.

* UCX, version 1.13 or later, compiled with CUDA support. Alternatively, use the CUDA-enabled UCX
  packages from the GitHub releases page: https://github.com/openucx/ucx/releases

  For installation instructions, follow the release build instructions at
  https://github.com/openucx/ucx#release-builds.
  Note that the UCX library must be compiled with CUDA support, for example:

  .. code-block:: bash

     $ ./contrib/configure-release --prefix=/install/path --enable-examples --with-java=no --with-cuda=/path/to/cuda --enable-mt
     $ make -j8 && make install



Components
==========

UcxContext
^^^^^^^^^^
UcxContext is the central component of the GXF UCX extension. It initializes
the UCX context, runs the listeners, and handles connection requests and incoming data
for UcxReceiver components.
UcxContext also sets up the connections and resources used by UcxTransmitter components, so all
connections, for both receivers and transmitters, are managed by UcxContext.
When the graph completes, UcxContext closes all connections and releases all resources.


* Component ID: 755d20a5-d794-467d-a86c-290eb2c32052
* Base Type: nvidia::gxf::NetworkContext
* Defined in: extensions/ucx/ucx_context.hpp

Parameters
++++++++++++

**serializer**

The entity serializer used by the component.
Should be of type UcxEntitySerializer.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::EntitySerializer

|

**reconnect**

Whether to try to reconnect if a connection is closed during the run.
A UcxReceiver waits for a new connection request to establish a new connection.
A UcxTransmitter sends a new connection request to the server to establish a new connection.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_BOOL
* Default: true

|

**Optional GPU device resource**

Optional resource for GPU device.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::GPUDevice
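
The fragment below is a minimal sketch of an entity that hosts the UcxContext. It assumes that
``entity_serializer`` is a UcxEntitySerializer defined in the same entity (omitted here; the
complete example at the end of this page shows it). Setting ``reconnect`` to ``false`` is only
illustrative. As in the complete example, the GPU device is not set as a parameter; instead, a
``GPUDevice`` component lives in an entity that shares an entity group with the ``ucx`` entity.

.. code-block:: yaml

   name: ucx
   components:
   - name: ucx_context
     type: nvidia::gxf::UcxContext
     parameters:
       serializer: entity_serializer   # a UcxEntitySerializer in this entity (definition omitted)
       reconnect: false                # do not try to re-establish connections closed during the run
   ---
   name: gpu_resource_entity_0
   components:
   - type: nvidia::gxf::GPUDevice
     name: gpu_resource_0
     parameters:
       dev_id: 0
   ---
   # Grouping makes the GPUDevice resource available to the entities in the group
   EntityGroups:
   - name: entity_group_0
     target:
     - "ucx"
     - "gpu_resource_entity_0"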

UcxTransmitter
^^^^^^^^^^^^^^^^^

Transmitter component for the GXF UCX extension.
This component replaces the standard transmitter of an entity.
During initialization, it sends a connection request to establish the connection.
When the Network Router executes the SyncOutbox function, it invokes the sync_io method of the UcxTransmitter,
which transmits the message using the UCX Active Message Rendezvous protocol.

* Component ID: 58165d03-78b7-4696-b200-71621f90aee7
* Base Type: nvidia::gxf::Transmitter
* Defined in: extensions/ucx/ucx_transmitter.hpp

Parameters
++++++++++++

**capacity**

Capacity of the transmitter's queue.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_UINT64

|

**policy**

Queue policy for handling data. Valid values: ``0`` (pop), ``1`` (reject), ``2`` (fault).

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_UINT64

|

**receiver_address**

Receiver address to connect to.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_STRING

|

**port**

Port of the receiver.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_INT32

|

**buffer**

Serialization buffer that holds the serialized data.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::UcxSerializationBuffer

|

**maximum_connection_retries**

Maximum number of retries when establishing the connection.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_INT32

|

**gpu_device**

Optional GPU device resource.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::GPUDevice
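
A sketch of a UcxTransmitter that sets the parameters listed above, together with the
serialization buffer it refers to. The values for ``capacity``, ``policy``, and
``maximum_connection_retries`` are illustrative assumptions, since no defaults are documented for
them here. ``receiver_address`` and ``port`` must match the ``address`` and ``port`` of the
UcxReceiver on the other host.

.. code-block:: yaml

   - name: serialization_buffer
     type: nvidia::gxf::UcxSerializationBuffer
     parameters:
       allocator: allocator              # an Allocator defined in the same entity
   - name: signal
     type: nvidia::gxf::UcxTransmitter
     parameters:
       capacity: 1                       # illustrative queue capacity
       policy: 2                         # 0: pop, 1: reject, 2: fault
       receiver_address: 5.5.5.5         # address the UcxReceiver listens on
       port: 13337                       # port the UcxReceiver listens on
       buffer: serialization_buffer
       maximum_connection_retries: 10    # illustrative value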

UcxReceiver
^^^^^^^^^^^^^^^

Receiver component for the GXF UCX extension.
This component replaces the standard receiver of an entity.
When an entity sends a message to this receiver, the UcxContext receives the message header, prompting
the Network Router to execute the SyncInbox function. SyncInbox then invokes the sync_io
method of the UcxReceiver, which uses the UCX Active Message Rendezvous protocol to receive
the data content of the message.

* Component ID: e961132b-45d5-48b8-ac5d-2bb1a4a42279
* Base Type: nvidia::gxf::Receiver
* Defined in: extensions/ucx/ucx_receiver.hpp

Parameters
++++++++++

**capacity**

Capacity of the receiver's queue.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_UINT64
* Default: 10

|

**policy**

Queue policy for handling data. Valid values: ``0`` (pop), ``1`` (reject), ``2`` (fault).

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_UINT64
* Default: 2

|

**address**

Address on which the listener receives data.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_STRING
* Default: "0.0.0.0"

|

**port**

Listener's port for receiving data.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_INT32
* Default: 13337

|

**buffer**

Serialization buffer that holds the serialized data.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::UcxSerializationBuffer

|

**Optional GPU device resource**

Optional resource for GPU device.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::GPUDevice
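
The following sketch configures a UcxReceiver with its documented defaults written out, paired with
a ``MessageAvailableSchedulingTerm`` (taken from the complete example) so that the receiving
codelet is scheduled only when a message is available. Listening on ``0.0.0.0`` typically accepts
connections on all network interfaces of the host; to restrict it, set ``address`` to the IP of a
specific interface.

.. code-block:: yaml

   - name: signal
     type: nvidia::gxf::UcxReceiver
     parameters:
       address: 0.0.0.0                  # default: listen on all interfaces
       port: 13337                       # default listener port
       capacity: 10                      # default queue capacity
       policy: 2                         # default: 2 (fault)
       buffer: serialization_buffer      # a UcxSerializationBuffer in the same entity
   - type: nvidia::gxf::MessageAvailableSchedulingTerm
     parameters:
       receiver: signal
       min_size: 1                       # schedule once at least one message is queued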

UcxComponentSerializer
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Serializer for the components in the GXF UCX extension.
Currently supports serializing Timestamp, Tensor, VideoBuffer,
AudioBuffer, and integer components.
Only valid for sharing data between devices with the same endianness.

* Component ID: 64994305-4260-4f5c-ac5f-69da6dd6cfa5
* Base Type: nvidia::gxf::ComponentSerializer
* Defined in: extensions/ucx/ucx_component_serializer.hpp

Parameters
++++++++++

**allocator**

Memory allocator for tensor components.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::Allocator
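
The complete example at the end of this page uses ``nvidia::gxf::test::MockAllocator``, which
belongs to GXF's test components. As a sketch, assuming your GXF installation provides the standard
``nvidia::gxf::UnboundedAllocator``, a component serializer outside of tests could be configured as
follows.

.. code-block:: yaml

   - name: allocator
     type: nvidia::gxf::UnboundedAllocator   # assumed available in the GXF standard extension
   - name: component_serializer
     type: nvidia::gxf::UcxComponentSerializer
     parameters:
       allocator: allocator                  # memory allocator for tensor components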


UcxEntitySerializer
^^^^^^^^^^^^^^^^^^^^^^^^^^

Serializer for the entities in the GXF UCX extension.

* Component ID: 14997aa4-4a01-4cd4-86ab-687f85a13f10
* Base Type: nvidia::gxf::EntitySerializer
* Defined in: extensions/ucx/ucx_entity_serializer.hpp

Parameters
++++++++++

**component_serializers**

List of serializers for serializing and deserializing components.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: FixedVector<nvidia::gxf::Handle<nvidia::gxf::ComponentSerializer>, kMaxTempComponents>

|

**verbose_warning**

Whether to print verbose warnings.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_BOOL
* Default: true
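
The entity serializer wraps one or more component serializers, and the UcxContext refers to the
entity serializer through its ``serializer`` parameter. The sketch below assumes a
UcxComponentSerializer named ``component_serializer`` in the same entity; turning off
``verbose_warning`` is only illustrative.

.. code-block:: yaml

   - name: entity_serializer
     type: nvidia::gxf::UcxEntitySerializer
     parameters:
       component_serializers: [ component_serializer ]   # UcxComponentSerializer in this entity
       verbose_warning: false                            # illustrative; the default is true
   - name: ucx_context
     type: nvidia::gxf::UcxContext
     parameters:
       serializer: entity_serializer                     # the entity serializer, not a component serializer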


UcxSerializationBuffer
^^^^^^^^^^^^^^^^^^^^^^^^^^

Serialization buffer for the GXF UCX extension.

* Component ID: 1d9fcaf7-1db1-4992-93ec-714979f7d78d
* Base Type: nvidia::gxf::Endpoint
* Defined in: extensions/ucx/ucx_serialization_buffer.hpp

Parameters
++++++++++

**allocator**

Memory allocator for tensor components.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::Allocator

|

**buffer_size**

Size of the buffer in bytes (4kB by default).

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_SIZE
* Default: 4096 (4kB)
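
A sketch of a UcxSerializationBuffer with a non-default size, referenced by a UcxReceiver through
its ``buffer`` parameter. The value ``65536`` is purely illustrative; choose ``buffer_size`` based
on your own messages.

.. code-block:: yaml

   - name: allocator
     type: nvidia::gxf::test::MockAllocator   # as in the complete example; substitute your allocator
   - name: serialization_buffer
     type: nvidia::gxf::UcxSerializationBuffer
     parameters:
       allocator: allocator
       buffer_size: 65536                     # 64 KiB instead of the 4096-byte default
   - name: signal
     type: nvidia::gxf::UcxReceiver
     parameters:
       buffer: serialization_buffer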



Example
========

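This section shows how to use the UCX extension in a simple ping graph that is split into two
parts connected through UCX: a server side that receives messages and a client side that sends them.
Each side is configured in its own YAML file, shown below. The ``receiver_address`` and ``port`` of
the client's UcxTransmitter must match the ``address`` and ``port`` of the server's UcxReceiver;
in your own setup, replace ``5.5.5.5`` with an IP address of the server host that the client can reach.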

**Server side - test_ping_rx.yaml file:**

.. code-block:: yaml

    name: rx
    components:
    - name: allocator
      type: nvidia::gxf::test::MockAllocator
    - name: serialization_buffer
      type: nvidia::gxf::UcxSerializationBuffer
      parameters:
        allocator: allocator
    - name: signal
      type: nvidia::gxf::UcxReceiver
      parameters:
        address: 5.5.5.5
        port: 13337
        buffer: serialization_buffer
    - type: nvidia::gxf::MessageAvailableSchedulingTerm
      parameters:
        receiver: signal
        min_size: 1
    - type: nvidia::gxf::PingRx
      parameters:
        signal: signal
    - type: nvidia::gxf::test::StepCount
      parameters:
        expected_count: 10
    - type: nvidia::gxf::CountSchedulingTerm
      parameters:
        count: 10
    ---
    name: ucx
    components:
    - name: allocator
      type: nvidia::gxf::test::MockAllocator
    - name: component_serializer
      type: nvidia::gxf::UcxComponentSerializer
      parameters:
        allocator: allocator
    - name: entity_serializer
      type: nvidia::gxf::UcxEntitySerializer
      parameters:
        component_serializers: [ component_serializer ]
    - name: ucx_context
      type: nvidia::gxf::UcxContext
      parameters:
        serializer: entity_serializer
    ---
    name: scheduler
    components:
    - name: clock
      type: nvidia::gxf::RealtimeClock
    - type: nvidia::gxf::GreedyScheduler
      parameters:
        max_duration_ms: 1000000
        clock: clock
        stop_on_deadlock: False
    ---
    name: gpu_resource_entity_0
    components:
    - type: nvidia::gxf::GPUDevice
      name: gpu_resource_0
      parameters:
        dev_id: 0
    ---
    EntityGroups:
    - name: entity_group_0
      target:
      - "rx"
      - "ucx"
      - "gpu_resource_entity_0"


**Client side - test_ping_tx.yaml file:**

.. code-block:: yaml

  name: tx
  components:
  - name: allocator
    type: nvidia::gxf::test::MockAllocator
  - name: serialization_buffer
    type: nvidia::gxf::UcxSerializationBuffer
    parameters:
      allocator: allocator
  - name: signal
    type: nvidia::gxf::UcxTransmitter
    parameters:
      receiver_address: 5.5.5.5
      port: 13337
      buffer: serialization_buffer
  - type: nvidia::gxf::PingTx
    parameters:
      signal: signal
  - type: nvidia::gxf::CountSchedulingTerm
    parameters:
      count: 10
  - type: nvidia::gxf::test::StepCount
    parameters:
      expected_count: 10
  ---
  name: ucx
  components:
  - name: allocator
    type: nvidia::gxf::test::MockAllocator
  - name: component_serializer
    type: nvidia::gxf::UcxComponentSerializer
    parameters:
      allocator: allocator
  - name: entity_serializer
    type: nvidia::gxf::UcxEntitySerializer
    parameters:
      component_serializers: [ component_serializer ]
  - name: ucx_context
    type: nvidia::gxf::UcxContext
    parameters:
      serializer: entity_serializer
  ---
  name: scheduler
  components:
  - name: clock
    type: nvidia::gxf::RealtimeClock
  - type: nvidia::gxf::GreedyScheduler
    parameters:
      stop_on_deadlock: false
      max_duration_ms: 1000000
      clock: clock
  ---
  name: gpu_resource_entity_0
  components:
  - type: nvidia::gxf::GPUDevice
    name: gpu_resource_0
    parameters:
      dev_id: 0
  ---
  EntityGroups:
  - name: entity_group_0
    target:
    - "tx"
    - "ucx"
    - "gpu_resource_entity_0"