Inference Builder Overview#

Inference Builder is an open-source tool that automates the generation of end-to-end inference pipelines across multiple AI frameworks. It supports seamless integration of custom logic and can package the entire pipeline as a deployable container image.

To generate an inference pipeline, you provide a single YAML configuration file that defines the pipeline: model selection, input/output formats, backend inference engines, and any custom preprocessing or postprocessing code. The output is either a standalone Python application or, when a server type is specified and an OpenAPI spec is provided, a microservice.
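As a rough sketch, a pipeline definition might take a shape like the one below. The key names here are hypothetical and shown only to illustrate the idea of a single declarative configuration; refer to the Inference Builder README for the actual schema.

# Hypothetical pipeline definition (illustrative key names only;
# see the Inference Builder README for the real schema)
name: detection-pipeline        # assumed: a name for the generated app
backend: deepstream             # assumed: the inference engine to target
models:
  - name: my-detection-model    # assumed: model selection
input:
  format: image/jpeg            # assumed: accepted input media type
output:
  format: application/json      # assumed: serialized inference results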

Note

This document offers a quick overview of Inference Builder. For detailed information about how it works, how to use it, and the features it supports, please refer to the Inference Builder README page.

Inference Builder Setup#

The recommended OS for running Inference Builder is Ubuntu Desktop 24.04 with Python 3.12.

1. Install prerequisites.#

sudo apt update
sudo apt install -y protobuf-compiler python3.12-dev python3.12-venv
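To verify the prerequisites are installed, you can check their versions (the exact version strings depend on your package repositories):

protoc --version
python3.12 --version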

2. Clone the Inference Builder repo.#

git clone https://github.com/NVIDIA-AI-IOT/inference_builder

3. Create virtual environment and install dependencies.#

cd inference_builder
git submodule update --init --recursive
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
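As a quick sanity check, confirm that the active interpreter comes from the virtual environment and is Python 3.12:

which python3
python3 --version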

4. Set up Docker environment.#

Before starting with the examples, you need a working Docker environment. The following packages are required to build and run the container images:

  • Docker

  • Docker Compose

  • NVIDIA Container Toolkit

Ensure the nvidia runtime is registered in /etc/docker/daemon.json so that GPU-enabled containers can run:

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
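If the nvidia runtime entry is missing, the NVIDIA Container Toolkit can add it for you; one common approach is shown below (Docker must be restarted for the change to take effect):

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

You can then verify GPU access from a container, for example (any CUDA base image will do):

docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi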

The docker group must exist on your system; you can check whether it has been created with "getent group docker". Your current user must belong to the docker group. If not, run the command below, then log out and back in for the group change to take effect.

sudo usermod -aG docker $USER
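After logging back in, you can verify that Docker works without sudo:

docker run --rm hello-world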

5. Install NGC CLI.#

Please download and install the NGC CLI from the NGC page, then follow the NGC CLI Guide to set up the tool.
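Once installed, the CLI is typically configured interactively (it prompts for your NGC API key, org, and team) and can be verified as follows, assuming the ngc binary is on your PATH:

ngc config set
ngc --version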

Getting Started with the Examples#

Under the builder/samples/ directory, you can find all the examples to start with. Please refer to the README.md file in each individual directory for detailed instructions.

Some of the models used in the examples come from the NVIDIA GPU Cloud (NGC) repository, and certain NGC models require an active subscription.

The examples are organized by the backend they use, and the steps to run them differ for each backend. Please click on a sample name to see the detailed instructions.

Note

While Inference Builder works with the Ampere, Hopper, and Blackwell architectures, each example's model and backend choices set the real hardware requirements. For example, the Qwen2.5-7B-Instruct model with the TensorRT-LLM backend requires very high GPU memory and can only run on H100 and B200.

DeepStream Backend Examples#

| Name | Description | Models | Backend | Output |
|------|-------------|--------|---------|--------|
| ds_app | Examples of building a standalone DeepStream application | TAO Computer Vision models | DeepStream | Command-line interface application |
| tao | Examples of building inference microservices using a DeepStream pipeline and FastAPI | TAO Computer Vision models | DeepStream | Microservice |

Triton Backend Examples#

| Name | Description | Models | Backend | Output |
|------|-------------|--------|---------|--------|
| changenet | Example of building inference microservices with Triton Server | Visual ChangeNet | Triton/TensorRT | Microservice |

TensorRT Backend Examples#

| Name | Description | Models | Backend | Output |
|------|-------------|--------|---------|--------|
| nvclip | Example of building inference microservices with the TensorRT backend | NVCLIP | TensorRT | Microservice |

TensorRT-LLM Backend Examples#

| Name | Description | Models | Backend | Output |
|------|-------------|--------|---------|--------|
| qwen | Example of building inference microservices with the TensorRT-LLM backend for VLM models | Qwen 2.5 VL models | TensorRT-LLM, PyTorch | Microservice |

Multiple Model Examples#

| Name | Description | Models | Backend | Output |
|------|-------------|--------|---------|--------|
| cradio | Two-stage pipeline: PeopleNet Transformer (detection), then C-RADIOv3-H (per-detection embeddings) | PeopleNet Transformer, C-RADIOv3-H | DeepStream/nvinfer, TensorRT (Polygraphy) | Command-line interface application |

vLLM Backend Examples#

| Name | Description | Models | Backend | Output |
|------|-------------|--------|---------|--------|
| vllm | Example of building inference microservices with the vLLM backend and DeepStream MediaExtractor | Cosmos-Reason2-2B, Qwen3-VL-2B-Instruct | vLLM, DeepStream | Microservice |

Metropolis Computer Vision Inference Microservice#

In the tao examples, we demonstrate how to build Metropolis Computer Vision Inference Microservices with Inference Builder for TAO CV models, and how to use them to perform inference on images and videos. Please refer to the HOWTO.md for detailed instructions.