NvTritonExt
NVIDIA Triton Inference components. This extension is intended to be used with Triton 2.37.0 (x86_64) and 2.40.0 (JetPack 6.0).
Refer to the official NVIDIA Triton documentation for the support matrix and further details.
UUID: a3c95d1c-c06c-4a4e-a2f9-8d9078ab645c
Version: 0.4.0
Author: NVIDIA
License: Proprietary
Components
nvidia::triton::TritonServer
Triton inference server component using the Triton C API.
Component ID: 26228984-ffc4-4162-9af5-6e3008aa2982
Base Type: nvidia::gxf::Component
Parameters
log_level
Logging level for Triton.
Valid values:
0: Error
1: Warn
2: Info
3+: Verbose
Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
Type: GXF_PARAMETER_TYPE_UINT32
enable_strict_model_config
Enables strict model configuration to enforce the presence of a model configuration file. If disabled, TensorRT, TensorFlow saved-model, and ONNX models do not require a model configuration file, since Triton can derive all the required settings automatically.
Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
Type: GXF_PARAMETER_TYPE_BOOL
min_compute_capability
Minimum Compute Capability for GPU. Refer to https://developer.nvidia.com/cuda-gpus.
Flags: GXF_PARAMETER_FLAGS_NONE (6.0 = default)
Type: GXF_PARAMETER_TYPE_FLOAT64
model_repository_paths
List of Triton model repository paths. Refer to https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
tf_gpu_memory_fraction
The portion of GPU memory to be reserved for TensorFlow models.
Flags: GXF_PARAMETER_FLAGS_NONE (0.0 = default)
Type: GXF_PARAMETER_TYPE_FLOAT64
tf_disable_soft_placement
Allow TensorFlow to use CPU operations when a GPU implementation is not available.
Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
Type: GXF_PARAMETER_TYPE_BOOL
backend_directory_path
Path to Triton backend directory.
Flags: GXF_PARAMETER_FLAGS_NONE ("" = default)
Type: GXF_PARAMETER_TYPE_STRING
model_control_mode
Triton model control mode.
Valid values:
"none": Load all models in the model repository at startup.
"explicit": Load models only when needed.
Flags: GXF_PARAMETER_FLAGS_NONE ("explicit" = default)
Type: GXF_PARAMETER_TYPE_STRING
backend_configs
Triton backend configurations in the format: backend,setting=value.
Refer to the backend-specific documentation: https://github.com/triton-inference-server/tensorflow_backend#command-line-options, https://github.com/triton-inference-server/python_backend#managing-shared-memory.
Flags: GXF_PARAMETER_FLAGS_OPTIONAL
Type: GXF_PARAMETER_TYPE_STRING
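For illustration, a minimal GXF application YAML sketch configuring this component might look as follows; the entity name, repository path, and parameter values are placeholders, not values mandated by the extension.

```yaml
# Hypothetical GXF entity hosting the Triton server.
# Entity name, repository path, and values below are placeholders.
name: triton_server_entity
components:
- name: server
  type: nvidia::triton::TritonServer
  parameters:
    log_level: 2                                     # Info
    enable_strict_model_config: true
    min_compute_capability: 6.0
    model_repository_paths: ["/opt/triton/models"]   # placeholder path
    model_control_mode: explicit                     # load models only when needed
```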
nvidia::triton::TritonInferencerInterface
Helper component that provides an interface for Triton inferencing.
Component ID: 1661c015-6b1c-422d-a6f0-248cdc197b1a
Base Type: nvidia::gxf::Component
nvidia::triton::TritonInferencerImpl
Component that implements the TritonInferencerInterface to obtain inferences from the TritonServer component or from an external Triton instance.
Component ID: b84cf267-b223-4df5-ac82-752d9fae1014
Base Type: nvidia::triton::TritonInferencerInterface
Parameters
server
Triton server. This optional handle must be specified if the inference_mode of this component is Direct.
Flags: GXF_PARAMETER_FLAGS_OPTIONAL
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::triton::TritonServer
model_name
Name of the Triton model on which to run inference.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
model_version
Version of the Triton model on which to run inference.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_INT64
max_batch_size
Max batch size to run inference. This should match the value in the Triton model repository.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_UINT32
num_concurrent_requests
Maximum number of concurrent inference requests for this model version. This is used to define a pool of requests.
Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
Type: GXF_PARAMETER_TYPE_UINT32
async_scheduling_term
Asynchronous scheduling term that determines when a response is ready.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::gxf::AsynchronousSchedulingTerm
inference_mode
Triton inferencing mode.
Valid values:
Direct: This mode requires a TritonServer component handle to be passed to the optional server parameter.
RemoteGrpc: This mode requires the optional server_endpoint parameter to point to an external Triton gRPC server URL.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
server_endpoint
Server endpoint URL for an external Triton instance. This optional string must be specified if the inference_mode of this component is of the Remote variety.
Flags: GXF_PARAMETER_FLAGS_OPTIONAL
Type: GXF_PARAMETER_TYPE_STRING
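As a sketch, a Direct-mode inferencer might be declared in YAML as below, assuming a TritonServer component named triton_server_entity/server as in the earlier example; the model name, version, and sizing values are placeholders.

```yaml
# Hypothetical inferencer entity (Direct mode); all names are placeholders.
name: inference
components:
- name: async_st
  type: nvidia::gxf::AsynchronousSchedulingTerm
- name: impl
  type: nvidia::triton::TritonInferencerImpl
  parameters:
    server: triton_server_entity/server   # required because inference_mode is Direct
    model_name: my_model                  # placeholder model in the repository
    model_version: 1
    max_batch_size: 8                     # should match the Triton model configuration
    num_concurrent_requests: 2            # size of the request pool
    async_scheduling_term: async_st
    inference_mode: Direct
```

For RemoteGrpc mode, the server handle would be omitted and server_endpoint would instead point to the external Triton gRPC server URL.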
nvidia::triton::TritonInferenceRequest
Generic codelet that requests a Triton inference. This will use a handle to an InferencerImpl to interface with Triton.
Component ID: 34395920-232c-446f-b5b7-46f642ce84df
Base Type: nvidia::gxf::Codelet
Parameters
inferencer
Handle to Triton inference implementation. This is used to request an inference.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::triton::TritonInferencerInterface
rx
List of receivers from which to take input tensors.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::gxf::Receiver
input_tensor_names
Names of input tensors that exist in the ordered receivers in rx.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
input_binding_names
Names of input bindings corresponding to Triton's config inputs, in the same order as provided in input_tensor_names.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
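Continuing the sketch above, the request codelet could be wired as follows; receiver, tensor, and binding names are placeholders chosen for illustration.

```yaml
# Hypothetical request codelet wiring; names are placeholders.
- name: request
  type: nvidia::triton::TritonInferenceRequest
  parameters:
    inferencer: inference/impl            # handle to the TritonInferencerImpl above
    rx: [input_tensors]                   # receiver providing the input tensor message
    input_tensor_names: [input_tensor]    # tensor name inside the received message
    input_binding_names: [INPUT0]         # corresponding input name in the model config
```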
nvidia::triton::TritonInferenceResponse
Generic codelet that obtains a response from a Triton inference. This will use a handle to an InferencerImpl to interface with Triton.
Component ID: 4dd957a7-aa55-4117-90d3-9a98e31ee176
Base Type: nvidia::gxf::Codelet
Parameters
inferencer
Handle to Triton inference implementation. This is used to retrieve the inference response.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::triton::TritonInferencerInterface
output_tensor_names
Names of output tensors in the order to be retrieved from the model.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
output_binding_names
Names of output bindings in the model, in the same order as provided in output_tensor_names.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
tx
Single transmitter to publish output tensors.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::gxf::Transmitter
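A matching response codelet, continuing the same hypothetical sketch:

```yaml
# Hypothetical response codelet wiring; names are placeholders.
- name: response
  type: nvidia::triton::TritonInferenceResponse
  parameters:
    inferencer: inference/impl            # same inferencer used by the request codelet
    output_tensor_names: [output_tensor]  # tensor name attached to the published message
    output_binding_names: [OUTPUT0]       # corresponding output name in the model config
    tx: output_tensors                    # transmitter that publishes the output tensors
```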
nvidia::triton::TritonOptions
Generic struct that represents Triton inference options for model control and sequence control.
Component ID: 087696ed-229d-4199-876f-05b92d3887f0
nvidia::triton::TritonRequestReceptiveSchedulingTerm
Triton scheduling term that schedules the request codelet when the inferencer can accept a new request.
Component ID: f8602412-1242-4e43-9dbf-9c559d496b84
Base Type: nvidia::gxf::SchedulingTerm
Parameters
inferencer
Handle to Triton inference implementation. This is used to check whether a new request can be accepted.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::triton::TritonInferencerInterface
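To gate the request codelet on inferencer capacity, this term could be added to the request codelet's entity, reusing the hypothetical inferencer handle from the sketches above:

```yaml
# Hypothetical scheduling-term wiring; names are placeholders.
- type: nvidia::triton::TritonRequestReceptiveSchedulingTerm
  parameters:
    inferencer: inference/impl            # queried for whether a new request can be accepted
```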