NvTritonExt
NVIDIA Triton Inference components. This extension is intended to be used with Triton 2.37.0 (x86_64) and 2.40.0 (JetPack 6.0).
Refer to the official NVIDIA Triton documentation for the support matrix and further details.
UUID: a3c95d1c-c06c-4a4e-a2f9-8d9078ab645c
Version: 0.4.0
Author: NVIDIA
License: Proprietary
Components
nvidia::triton::TritonServer
Triton inference server component using the Triton C API.
Component ID: 26228984-ffc4-4162-9af5-6e3008aa2982
Base Type: nvidia::gxf::Component
Parameters
log_level
Logging level for Triton.
Valid values:
0: Error
1: Warn
2: Info
3+: Verbose
Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
Type: GXF_PARAMETER_TYPE_UINT32
enable_strict_model_config
Enables strict model configuration to enforce the presence of a model configuration file. If disabled, TensorRT, TensorFlow saved-model, and ONNX models do not require a model configuration file, since Triton can derive all the required settings automatically.
Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
Type: GXF_PARAMETER_TYPE_BOOL
min_compute_capability
Minimum Compute Capability for GPU. Refer to https://developer.nvidia.com/cuda-gpus.
Flags: GXF_PARAMETER_FLAGS_NONE (6.0 = default)
Type: GXF_PARAMETER_TYPE_FLOAT64
model_repository_paths
List of Triton model repository paths. Refer to https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
tf_gpu_memory_fraction
The portion of GPU memory to be reserved for TensorFlow models.
Flags: GXF_PARAMETER_FLAGS_NONE (0.0 = default)
Type: GXF_PARAMETER_TYPE_FLOAT64
tf_disable_soft_placement
Allow TensorFlow to use CPU operations when a GPU implementation is not available.
Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
Type: GXF_PARAMETER_TYPE_BOOL
backend_directory_path
Path to Triton backend directory.
Flags: GXF_PARAMETER_FLAGS_NONE ("" = default)
Type: GXF_PARAMETER_TYPE_STRING
model_control_mode
Triton model control mode.
Valid values:
"none": Load all models in the model repository at startup.
"explicit": Load models only when needed.
Flags: GXF_PARAMETER_FLAGS_NONE ("explicit" = default)
Type: GXF_PARAMETER_TYPE_STRING
backend_configs
Triton backend configurations in the format: backend,setting=value.
Refer to the backend-specific documentation: https://github.com/triton-inference-server/tensorflow_backend#command-line-options, https://github.com/triton-inference-server/python_backend#managing-shared-memory.
Flags: GXF_PARAMETER_FLAGS_OPTIONAL
Type: GXF_PARAMETER_TYPE_STRING
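For illustration, a minimal GXF application YAML sketch configuring this component might look as follows; the entity name, repository path, and parameter values are placeholders, not values mandated by the extension.

```yaml
# Hypothetical GXF entity hosting the Triton server.
# Entity name, repository path, and values below are placeholders.
name: triton_server_entity
components:
- name: server
  type: nvidia::triton::TritonServer
  parameters:
    log_level: 2                                     # Info
    enable_strict_model_config: true
    min_compute_capability: 6.0
    model_repository_paths: ["/opt/triton/models"]   # placeholder path
    model_control_mode: explicit                     # load models only when needed
```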
nvidia::triton::TritonInferencerInterface
Helper component that provides an interface for Triton inferencing.
Component ID: 1661c015-6b1c-422d-a6f0-248cdc197b1a
Base Type: nvidia::gxf::Component
nvidia::triton::TritonInferencerImpl
Component that implements the TritonInferencerInterface to obtain inferences from the TritonServer component or from an external Triton instance.
Component ID: b84cf267-b223-4df5-ac82-752d9fae1014
Base Type: nvidia::triton::TritonInferencerInterface
Parameters
server
Triton server. This optional handle must be specified if the inference_mode of this component is Direct.
Flags: GXF_PARAMETER_FLAGS_OPTIONAL
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::triton::TritonServer
model_name
Name of the Triton model on which to run inference.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
model_version
Version of the Triton model on which to run inference.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_INT64
max_batch_size
Max batch size to run inference. This should match the value in the Triton model repository.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_UINT32
num_concurrent_requests
Maximum number of concurrent inference requests for this model version. This is used to define a pool of requests.
Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
Type: GXF_PARAMETER_TYPE_UINT32
async_scheduling_term
Asynchronous scheduling term that determines when a response is ready.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::gxf::AsynchronousSchedulingTerm
inference_mode
Triton inferencing mode.
Valid values:
Direct: This mode requires a TritonServer component handle to be passed to the optional server parameter.
RemoteGrpc: This mode requires the optional server_endpoint parameter to point to an external Triton gRPC server URL.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
server_endpoint
Server endpoint URL for an external Triton instance. This optional string must be specified if the inference_mode of this component is of the Remote variety.
Flags: GXF_PARAMETER_FLAGS_OPTIONAL
Type: GXF_PARAMETER_TYPE_STRING
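As a sketch, a Direct-mode inferencer might be declared in YAML as below, assuming a TritonServer component named triton_server_entity/server as in the earlier example; the model name, version, and sizing values are placeholders.

```yaml
# Hypothetical inferencer entity (Direct mode); all names are placeholders.
name: inference
components:
- name: async_st
  type: nvidia::gxf::AsynchronousSchedulingTerm
- name: impl
  type: nvidia::triton::TritonInferencerImpl
  parameters:
    server: triton_server_entity/server   # required because inference_mode is Direct
    model_name: my_model                  # placeholder model in the repository
    model_version: 1
    max_batch_size: 8                     # should match the Triton model configuration
    num_concurrent_requests: 2            # size of the request pool
    async_scheduling_term: async_st
    inference_mode: Direct
```

For RemoteGrpc mode, the server handle would be omitted and server_endpoint would instead point to the external Triton gRPC server URL.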
nvidia::triton::TritonInferenceRequest
Generic codelet that requests a Triton inference. This will use a handle to an InferencerImpl to interface with Triton.
Component ID: 34395920-232c-446f-b5b7-46f642ce84df
Base Type: nvidia::gxf::Codelet
Parameters
inferencer
Handle to Triton inference implementation. This is used to request an inference.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::triton::TritonInferencerInterface
rx
List of receivers from which to take input tensors.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::gxf::Receiver
input_tensor_names
Names of input tensors that exist in the ordered receivers in rx.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
input_binding_names
Names of input bindings corresponding to Triton's config inputs, in the same order as provided in input_tensor_names.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
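Continuing the sketch above, the request codelet could be wired as follows; receiver, tensor, and binding names are placeholders chosen for illustration.

```yaml
# Hypothetical request codelet wiring; names are placeholders.
- name: request
  type: nvidia::triton::TritonInferenceRequest
  parameters:
    inferencer: inference/impl            # handle to the TritonInferencerImpl above
    rx: [input_tensors]                   # receiver providing the input tensor message
    input_tensor_names: [input_tensor]    # tensor name inside the received message
    input_binding_names: [INPUT0]         # corresponding input name in the model config
```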
nvidia::triton::TritonInferenceResponse
Generic codelet that obtains a response from a Triton inference. This will use a handle to an InferencerImpl to interface with Triton.
Component ID: 4dd957a7-aa55-4117-90d3-9a98e31ee176
Base Type: nvidia::gxf::Codelet
Parameters
inferencer
Handle to Triton inference implementation. This is used to retrieve the inference response.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::triton::TritonInferencerInterface
output_tensor_names
Names of output tensors in the order to be retrieved from the model.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
output_binding_names
Names of output bindings in the model, in the same order as provided in output_tensor_names.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_STRING
tx
Single transmitter to publish output tensors.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::gxf::Transmitter
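A matching response codelet, continuing the same hypothetical sketch:

```yaml
# Hypothetical response codelet wiring; names are placeholders.
- name: response
  type: nvidia::triton::TritonInferenceResponse
  parameters:
    inferencer: inference/impl            # same inferencer used by the request codelet
    output_tensor_names: [output_tensor]  # tensor name attached to the published message
    output_binding_names: [OUTPUT0]       # corresponding output name in the model config
    tx: output_tensors                    # transmitter that publishes the output tensors
```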
nvidia::triton::TritonOptions
Generic struct that represents Triton inference options for model control and sequence control.
Component ID: 087696ed-229d-4199-876f-05b92d3887f0
nvidia::triton::TritonRequestReceptiveSchedulingTerm
Triton scheduling term that schedules the request codelet when the inferencer can accept a new request.
Component ID: f8602412-1242-4e43-9dbf-9c559d496b84
Base Type: nvidia::gxf::SchedulingTerm
Parameters
inferencer
Handle to Triton inference implementation. This is used to check whether a new request can be accepted.
Flags: GXF_PARAMETER_FLAGS_NONE
Type: GXF_PARAMETER_TYPE_HANDLE
Handle Type: nvidia::triton::TritonInferencerInterface
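To gate the request codelet on inferencer capacity, this term could be added to the request codelet's entity, reusing the hypothetical inferencer handle from the sketches above:

```yaml
# Hypothetical scheduling-term wiring; names are placeholders.
- type: nvidia::triton::TritonRequestReceptiveSchedulingTerm
  parameters:
    inferencer: inference/impl            # queried for whether a new request can be accepted
```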