Inference Builder Overview#
Inference Builder is an open-source tool that automates the generation of end-to-end inference pipelines across multiple AI frameworks. It supports seamless integration of custom logic and can package the entire pipeline as a deployable container image.
To generate an inference pipeline, you only need to provide a single YAML configuration file that defines the pipeline: model selection, input/output formats, backend inference engines, and any custom preprocessing or postprocessing code. The output is either a standalone Python application or, when a server type is specified and an OpenAPI spec is provided, a microservice.
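For illustration only, such a configuration might look like the sketch below. All field names here are hypothetical and merely indicate the kind of information the file carries; the actual schema is described in the Inference Builder README.
# Hypothetical pipeline configuration -- field names are illustrative, not the real schema
name: detection-pipeline
model:
  name: my_detection_model        # model to deploy (placeholder)
  backend: tensorrt               # inference engine, e.g. deepstream, triton, tensorrt
input:
  media_type: image/jpeg          # accepted input format
output:
  media_type: application/json    # produced output format
custom:
  preprocess: my_preprocess.py    # optional custom preprocessing code
  postprocess: my_postprocess.py  # optional custom postprocessing code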
Note
This document offers a quick overview of Inference Builder. For detailed information about its usage, how it works, and the features it supports, refer to the Inference Builder README page.
Inference Builder Setup#
The recommended OS for running Inference Builder is Ubuntu Desktop 24.04 with Python 3.12.
Clone the Inference Builder repo.
$ git clone https://github.com/NVIDIA-AI-IOT/inference_builder
$ cd inference_builder
$ git submodule update --init --recursive
Install dependencies.
$ sudo apt update
$ sudo apt install -y protobuf-compiler python3.12-dev python3.12-venv
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip3 install -r requirements.txt
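As an optional sanity check, you can confirm that the toolchain is in place before moving on:
$ protoc --version     # the protobuf compiler should be installed
$ python3 --version    # should report Python 3.12.x
$ pip3 check           # verifies the installed packages have no broken dependencies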
Getting Started with the Examples#
The builder/samples/ directory contains all of the examples. Refer to the README.md file in each example's directory for detailed instructions.
Before starting with the examples, you need a working Docker environment. The following packages are required to build and run the container images:
Docker
Docker Compose
NVIDIA Container Toolkit
Ensure the nvidia runtime is added to /etc/docker/daemon.json so that GPU-enabled containers can run:
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
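After editing /etc/docker/daemon.json, restart the Docker daemon and verify that the nvidia runtime is available. The CUDA image tag below is only an example; any CUDA base image compatible with your driver works:
$ sudo systemctl restart docker
$ docker info | grep -i runtimes    # "nvidia" should appear in the list
$ docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi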
The docker group must exist on your system; you can check whether it has been created with "getent group docker". Your current user must belong to the docker group. If not, run the command below, then log out and back in for the group change to take effect.
sudo usermod -aG docker $USER
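Once the group change has taken effect, you can confirm that your user can reach the Docker daemon without sudo, for example:
$ getent group docker          # the docker group should list your user
$ docker run --rm hello-world  # should run without permission errors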
Some of the models used in the examples are from the NVIDIA GPU Cloud (NGC) repository, and certain models require an active subscription. Please download and install the NGC CLI from the NGC page and follow the NGC CLI Guide to set up the tool.
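After installing the NGC CLI, configure it with your NGC API key so that gated models can be downloaded, for example:
$ ngc config set            # prompts for your API key, org, and team
$ ngc registry model list   # optional: confirms the CLI can reach the registry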
The examples are organized by the backend they use, and the steps to run them differ for each backend. Click on the sample name to see detailed instructions.
Note
While Inference Builder works with Ampere, Hopper, and Blackwell architectures, each example's model and backend choices set the real hardware requirements. For example, the Qwen2.5-7B-Instruct model with the TensorRT-LLM backend requires a large amount of GPU memory and can only run on H100 and B200 GPUs.
| Name | Description | Models | Backend | Output |
|---|---|---|---|---|
|  | Deepstream Applications built with the Inference Builder | TAO Computer Vision models | Deepstream | command line interface application |
|  | Inference Microservices for Nvidia TAO Computer Vision | TAO Computer Vision models | Deepstream | microservice |

| Name | Description | Models | Backend | Output |
|---|---|---|---|---|
|  | An inference microservice using the TAO Visual-Changenet model for segmentation | Visual ChangeNet | Triton/TensorRT | microservice |

| Name | Description | Models | Backend | Output |
|---|---|---|---|---|
|  | A microservice to support visual and text embeddings | NVCLIP | TensorRT | microservice |
Metropolis Computer Vision Inference Microservice#
The tao examples demonstrate how to build Metropolis Computer Vision Inference Microservices for TAO CV models with Inference Builder, and how to use them to perform inference on images and videos. Please refer to the HOWTO.md for detailed instructions.