nvJitLink

The user guide for the nvJitLink library.

1. Introduction

The JIT Link APIs are a set of APIs that can be used at runtime to link together GPU device code.

The APIs accept inputs in multiple formats: host objects, host libraries, fatbins (including those with relocatable PTX), device cubins, PTX, index files, or LTO-IR. The output is a linked cubin that can be loaded by cuModuleLoadData or cuModuleLoadDataEx of the CUDA Driver API.

Link Time Optimization (LTO) can also be performed when the inputs are LTO-IR or higher-level formats that contain LTO-IR.

If an input does not contain GPU assembly code, it is first compiled and then linked.

The functionality in this library is similar to the cuLink* APIs in the CUDA Driver, with the following advantages:

  • The cuLink* APIs have been deprecated for use with LTO-IR

  • Support for Link Time Optimization

  • Runtime linking with the latest Toolkit version, because the library ships as part of the CUDA Toolkit release. This support may not be available in the CUDA Driver APIs if the application runs on a system with an older driver installed. Refer to CUDA Compatibility for more details.

  • Fine-grained control: clients can specify low-level compiler options during linking.

2. Getting Started

2.1. System Requirements

The JIT Link library requires the following system configuration:

  • POSIX threads support on non-Windows platforms.

  • GPU: Any GPU with CUDA Compute Capability 3.5 or higher.

  • CUDA Toolkit and Driver.

2.2. Installation

The JIT Link library is part of the CUDA Toolkit release. Its components are organized as follows in the CUDA Toolkit installation directory:

  • On Windows:

    • include\nvJitLink.h

    • lib\x64\nvJitLink.dll

    • lib\x64\nvJitLink_static.lib

    • doc\pdf\nvJitLink_User_Guide.pdf

  • On Linux:

    • include/nvJitLink.h

    • lib64/libnvJitLink.so

    • lib64/libnvJitLink_static.a

    • doc/pdf/nvJitLink_User_Guide.pdf

3. User Interface

This chapter presents the JIT Link APIs. Basic usage of the API is explained in Basic Usage.

3.1. Error codes

Enumerations

nvJitLinkResult

The enumerated type nvJitLinkResult defines API call result codes.

3.1.1. Enumerations

enum nvJitLinkResult

The enumerated type nvJitLinkResult defines API call result codes.

nvJitLink APIs return nvJitLinkResult codes to indicate the result.

Values:

3.2. Linking

Enumerations

nvJitLinkInputType

The enumerated type nvJitLinkInputType defines the kind of inputs that can be passed to nvJitLinkAdd* APIs.

Functions

nvJitLinkResult nvJitLinkAddData(nvJitLinkHandle handle, nvJitLinkInputType inputType, const void *data, size_t size, const char *name)

nvJitLinkAddData adds data image to the link.

nvJitLinkResult nvJitLinkAddFile(nvJitLinkHandle handle, nvJitLinkInputType inputType, const char *fileName)

nvJitLinkAddFile reads data from file and links it in.

nvJitLinkResult nvJitLinkComplete(nvJitLinkHandle handle)

nvJitLinkComplete does the actual link.

nvJitLinkResult nvJitLinkCreate(nvJitLinkHandle *handle, uint32_t numOptions, const char **options)

nvJitLinkCreate creates an instance of nvJitLinkHandle with the given input options, and sets the output parameter handle.

nvJitLinkResult nvJitLinkDestroy(nvJitLinkHandle *handle)

nvJitLinkDestroy frees the memory associated with the given handle and sets it to NULL.

nvJitLinkResult nvJitLinkGetErrorLog(nvJitLinkHandle handle, char *log)

nvJitLinkGetErrorLog puts any error messages in the log.

nvJitLinkResult nvJitLinkGetErrorLogSize(nvJitLinkHandle handle, size_t *size)

nvJitLinkGetErrorLogSize gets the size of the error log.

nvJitLinkResult nvJitLinkGetInfoLog(nvJitLinkHandle handle, char *log)

nvJitLinkGetInfoLog puts any info messages in the log.

nvJitLinkResult nvJitLinkGetInfoLogSize(nvJitLinkHandle handle, size_t *size)

nvJitLinkGetInfoLogSize gets the size of the info log.

nvJitLinkResult nvJitLinkGetLinkedCubin(nvJitLinkHandle handle, void *cubin)

nvJitLinkGetLinkedCubin gets the linked cubin.

nvJitLinkResult nvJitLinkGetLinkedCubinSize(nvJitLinkHandle handle, size_t *size)

nvJitLinkGetLinkedCubinSize gets the size of the linked cubin.

nvJitLinkResult nvJitLinkGetLinkedPtx(nvJitLinkHandle handle, char *ptx)

nvJitLinkGetLinkedPtx gets the linked ptx.

nvJitLinkResult nvJitLinkGetLinkedPtxSize(nvJitLinkHandle handle, size_t *size)

nvJitLinkGetLinkedPtxSize gets the size of the linked ptx.

nvJitLinkResult nvJitLinkVersion(unsigned int *major, unsigned int *minor)

nvJitLinkVersion returns the current version of nvJitLink.

Typedefs

nvJitLinkHandle

nvJitLinkHandle is the unit of linking, and an opaque handle for a program.

3.2.1. Enumerations

enum nvJitLinkInputType

The enumerated type nvJitLinkInputType defines the kind of inputs that can be passed to nvJitLinkAdd* APIs.

Values:

3.2.2. Functions

static inline nvJitLinkResult nvJitLinkAddData(nvJitLinkHandle handle, nvJitLinkInputType inputType, const void *data, size_t size, const char *name)

nvJitLinkAddData adds data image to the link.

Parameters
  • handle[in] nvJitLink handle.

  • inputType[in] kind of input.

  • data[in] pointer to data image in memory.

  • size[in] size of the data.

  • name[in] name of input object.

Returns

static inline nvJitLinkResult nvJitLinkAddFile(nvJitLinkHandle handle, nvJitLinkInputType inputType, const char *fileName)

nvJitLinkAddFile reads data from file and links it in.

Parameters
  • handle[in] nvJitLink handle.

  • inputType[in] kind of input.

  • fileName[in] name of file.

Returns

static inline nvJitLinkResult nvJitLinkComplete(nvJitLinkHandle handle)

nvJitLinkComplete does the actual link.

Parameters
  • handle[in] nvJitLink handle.

Returns

static inline nvJitLinkResult nvJitLinkCreate(nvJitLinkHandle *handle, uint32_t numOptions, const char **options)

nvJitLinkCreate creates an instance of nvJitLinkHandle with the given input options, and sets the output parameter handle.

It supports options listed in Supported Link Options.

See also

nvJitLinkDestroy

Parameters
  • handle[out] Address of nvJitLink handle.

  • numOptions[in] Number of options passed.

  • options[in] Array of size numOptions of option strings.

Returns

static inline nvJitLinkResult nvJitLinkDestroy(nvJitLinkHandle *handle)

nvJitLinkDestroy frees the memory associated with the given handle and sets it to NULL.

See also

nvJitLinkCreate

Parameters
  • handle[in] Address of nvJitLink handle.

Returns

static inline nvJitLinkResult nvJitLinkGetErrorLog(nvJitLinkHandle handle, char *log)

nvJitLinkGetErrorLog puts any error messages in the log.

The user is responsible for allocating enough space to hold the log; the required size is reported by nvJitLinkGetErrorLogSize.

See also

nvJitLinkGetErrorLogSize

Parameters
  • handle[in] nvJitLink handle.

  • log[out] The error log.

Returns

static inline nvJitLinkResult nvJitLinkGetErrorLogSize(nvJitLinkHandle handle, size_t *size)

nvJitLinkGetErrorLogSize gets the size of the error log.

See also

nvJitLinkGetErrorLog

Parameters
  • handle[in] nvJitLink handle.

  • size[out] Size of the error log.

Returns

static inline nvJitLinkResult nvJitLinkGetInfoLog(nvJitLinkHandle handle, char *log)

nvJitLinkGetInfoLog puts any info messages in the log.

The user is responsible for allocating enough space to hold the log; the required size is reported by nvJitLinkGetInfoLogSize.

See also

nvJitLinkGetInfoLogSize

Parameters
  • handle[in] nvJitLink handle.

  • log[out] The info log.

Returns

static inline nvJitLinkResult nvJitLinkGetInfoLogSize(nvJitLinkHandle handle, size_t *size)

nvJitLinkGetInfoLogSize gets the size of the info log.

See also

nvJitLinkGetInfoLog

Parameters
  • handle[in] nvJitLink handle.

  • size[out] Size of the info log.

Returns

static inline nvJitLinkResult nvJitLinkGetLinkedCubin(nvJitLinkHandle handle, void *cubin)

nvJitLinkGetLinkedCubin gets the linked cubin.

The user is responsible for allocating enough space to hold the cubin; the required size is reported by nvJitLinkGetLinkedCubinSize.

See also

nvJitLinkGetLinkedCubinSize

Parameters
  • handle[in] nvJitLink handle.

  • cubin[out] The linked cubin.

Returns

static inline nvJitLinkResult nvJitLinkGetLinkedCubinSize(nvJitLinkHandle handle, size_t *size)

nvJitLinkGetLinkedCubinSize gets the size of the linked cubin.

See also

nvJitLinkGetLinkedCubin

Parameters
  • handle[in] nvJitLink handle.

  • size[out] Size of the linked cubin.

Returns

static inline nvJitLinkResult nvJitLinkGetLinkedPtx(nvJitLinkHandle handle, char *ptx)

nvJitLinkGetLinkedPtx gets the linked ptx.

Linked PTX is only available when using the -lto option. The user is responsible for allocating enough space to hold the PTX; the required size is reported by nvJitLinkGetLinkedPtxSize.

See also

nvJitLinkGetLinkedPtxSize

Parameters
  • handle[in] nvJitLink handle.

  • ptx[out] The linked PTX.

Returns

static inline nvJitLinkResult nvJitLinkGetLinkedPtxSize(nvJitLinkHandle handle, size_t *size)

nvJitLinkGetLinkedPtxSize gets the size of the linked ptx.

Linked PTX is only available when using the -lto option.

See also

nvJitLinkGetLinkedPtx

Parameters
  • handle[in] nvJitLink handle.

  • size[out] Size of the linked PTX.

Returns

nvJitLinkResult nvJitLinkVersion(unsigned int *major, unsigned int *minor)

nvJitLinkVersion returns the current version of nvJitLink.

Parameters
  • major[out] The major version.

  • minor[out] The minor version.

Returns

3.2.3. Typedefs

typedef struct nvJitLink *nvJitLinkHandle

nvJitLinkHandle is the unit of linking, and an opaque handle for a program.

To link inputs, an instance of nvJitLinkHandle must be created first with nvJitLinkCreate().

4. Basic Usage

This section of the document uses a simple example to explain how to use the JIT Link APIs to link a program. For brevity and readability, error checks on the API return values are not shown.

This example links for sm_80; in practice, use the architecture of the GPU installed on the system. The linker is created, and a handle to it obtained, as shown in Figure 1.

Figure 1. Linker creation and initialization of a program

nvJitLinkHandle linker;
const char* link_options[] = { "-arch=sm_80" };
nvJitLinkCreate(&linker, 1, link_options);
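
Rather than hard-coding sm_80, the option string can be built from the compute capability of the device in use. The following is a minimal sketch under stated assumptions: format_arch_option is a hypothetical helper, not part of nvJitLink, and the major and minor compute capability are assumed to have been queried already (for example with cuDeviceGetAttribute from the CUDA Driver API).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of nvJitLink): writes "-arch=sm_<major><minor>"
 * into buf. Returns 0 on success, or -1 if the arguments are invalid or the
 * buffer is too small to hold the whole option string. */
int format_arch_option(int major, int minor, char *buf, size_t size)
{
    if (buf == NULL || major < 0 || minor < 0 || minor > 9)
        return -1;

    /* snprintf returns the length it wanted to write, so a value >= size
     * means the output was truncated. */
    int n = snprintf(buf, size, "-arch=sm_%d%d", major, minor);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string can be used as an element of link_options in the call to nvJitLinkCreate.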

Assume that we already have two relocatable input files (a.o and b.o), which could be created with the nvcc -dc command. We can add the input files as shown in Figure 2.

Figure 2. Inputs to the linker

nvJitLinkAddFile(linker, NVJITLINK_INPUT_OBJECT, "a.o");
nvJitLinkAddFile(linker, NVJITLINK_INPUT_OBJECT, "b.o");

Now the actual link can be done as shown in Figure 3.

Figure 3. Linking of the program

nvJitLinkComplete(linker);

The linked GPU assembly code can now be obtained. Memory must first be allocated for it, which requires querying the size of the linked image as shown in Figure 4.

Figure 4. Query size of the linked assembly image

size_t cubinSize;
nvJitLinkGetLinkedCubinSize(linker, &cubinSize);

The image of the linked GPU assembly code can now be queried as shown in Figure 5. This image can then be executed on the GPU by passing it to the CUDA Driver APIs.

Figure 5. Query the linked assembly image

char* elf = (char*) malloc(cubinSize);
nvJitLinkGetLinkedCubin(linker, (void*)elf);

When the linker is not needed anymore, it can be destroyed as shown in Figure 6.

Figure 6. Destroy the linker

nvJitLinkDestroy(&linker);

5. Compatibility

The nvJitLink library is compatible across minor versions within a major release, but may not be compatible across major versions. The library version itself must be >= the maximum version of the inputs, and the shared library version used at run time must be >= the version that the application was linked with.

For example, you can link an object created with 12.0 and one with 12.1 if your nvJitLink library is version 12.x where x >= 1. If the application was linked with 12.1, then you can replace the nvJitLink shared library with any version 12.x where x >= 1. Conversely, you cannot use 12.0 to link 12.1 objects, nor use a 12.0 nvJitLink library to run 12.1 code.
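
The minor-version rule can be expressed as a small check. This is only a sketch: nvjitlink_version_satisfies is a hypothetical helper, and it deliberately models the conservative same-major case, treating any major-version mismatch as unsupported.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if an nvJitLink library of version
 * lib_major.lib_minor can be used where version req_major.req_minor is
 * required (the newest input version, or the version the application was
 * linked with). Different major versions are conservatively rejected. */
bool nvjitlink_version_satisfies(unsigned lib_major, unsigned lib_minor,
                                 unsigned req_major, unsigned req_minor)
{
    return lib_major == req_major && lib_minor >= req_minor;
}
```

The library version to pass in can be obtained at run time with nvJitLinkVersion. With this check, a 12.1 library satisfies a 12.0 or 12.1 requirement, while a 12.0 library does not satisfy a 12.1 requirement.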

Linking across major versions (like 11.x with 12.x) works for ELF and PTX inputs, but does not work with LTO-IR inputs. If using LTO, then compatibility is only guaranteed within a major release.

Linking extended ISA sources (like sm_90a) against any other sm version will always fail.

Linking with PTX sources from different architectures (such as compute_89 and compute_90) will work as long as the final link targets an architecture at least as new as every architecture being linked. That is, for any compute_X and compute_Y, the link is valid if the target is sm_N where N >= max(X,Y).

Linking with LTO sources from different architectures (such as lto_89 and lto_90) will work as long as the final link targets an architecture at least as new as every architecture being linked. That is, for any lto_X and lto_Y, the link is valid if the target is sm_N where N >= max(X,Y).
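
The N >= max(X,Y) rule for PTX and LTO inputs can be sketched as follows. target_arch_covers_inputs is a hypothetical helper that takes architectures as plain integers (89 for compute_89 or lto_89); it does not model the extended ISA restriction or the rules for other input kinds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if target sm_N satisfies
 * N >= max(input_archs[0..count-1]), i.e. no PTX or LTO input was built
 * for an architecture newer than the link target. */
bool target_arch_covers_inputs(int target_sm, const int *input_archs, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        if (input_archs[i] > target_sm)
            return false;   /* this input is newer than the target */
    return true;
}
```

For inputs built for 89 and 90, a target of sm_90 passes the check and a target of sm_89 does not.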

Linking with non-PTX, non-LTO sources is limited to link-compatible architectures; for example, sm_70 and sm_75 can link with each other, but neither can link with sm_80.