# ONNX Artefacts
Use ONNX when a deployment needs local NLI scoring without loading the PyTorch backend at request time. The Python-only quickstart remains the default path; ONNX is an opt-in runtime.
## Export Path
Install the runtime stack, add the hash-pinned export wheels from the repo, and write the model into the quickstart path:
pip install "director-ai[nli,onnx]"
pip install --no-deps --require-hashes -r requirements/docker-gpu-export.txt
director-ai export --format onnx --output models/factcg-onnx
The GPU Docker build uses the same export wheel pins during its `model-builder` stage, then copies only the exported model into the runtime image.
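A minimal sketch of that multi-stage shape, assuming stock `python` base images and container paths; the repo's actual Dockerfile is authoritative:

```dockerfile
# Build stage: install the export pins and write the ONNX model.
# Image tags, stage wiring, and paths are illustrative assumptions.
FROM python:3.11 AS model-builder
WORKDIR /build
COPY requirements/docker-gpu-export.txt requirements/
RUN pip install "director-ai[nli,onnx]" \
 && pip install --no-deps --require-hashes -r requirements/docker-gpu-export.txt \
 && director-ai export --format onnx --output /models/factcg-onnx

# Runtime stage: only the exported model crosses over; the export wheels do not.
FROM python:3.11-slim AS runtime
COPY --from=model-builder /models/factcg-onnx /models/factcg-onnx
```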
The expected directory layout is:

```text
models/factcg-onnx/
    model.onnx
    config.json
    tokenizer.json
    tokenizer_config.json
    special_tokens_map.json
```
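A quick sanity check, not part of the CLI, that an export produced every file the runtime expects:

```python
from pathlib import Path

# File list taken from the layout above; the directory is the quickstart
# default and may differ in your deployment.
EXPECTED = [
    "model.onnx",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]

model_dir = Path("models/factcg-onnx")
missing = [name for name in EXPECTED if not (model_dir / name).is_file()]
if missing:
    raise SystemExit(f"incomplete export, missing: {missing}")
print(f"all {len(EXPECTED)} artefact files present in {model_dir}")
```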
For CPU-only serving, install the `[onnx]` extra, which pulls the CPU `onnxruntime` wheel:
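```bash
pip install "director-ai[onnx]"
```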
For NVIDIA CUDA serving, install the `[tensorrt]` extra, which supplies `onnxruntime-gpu`:
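```bash
pip install "director-ai[tensorrt]"
```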
TensorRT execution also requires matching NVIDIA TensorRT libraries on the host or image. The package extra supplies `onnxruntime-gpu`; it does not vendor driver or TensorRT system libraries.
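A minimal sketch of how a deployment can request TensorRT with CUDA and CPU fallback; ONNX Runtime treats the list as a priority order and activates the first provider whose libraries are actually present:

```python
import onnxruntime as ort

# Providers are tried in order; ones whose system libraries are missing
# are skipped with a warning rather than used.
session = ort.InferenceSession(
    "models/factcg-onnx/model.onnx",
    providers=[
        "TensorrtExecutionProvider",  # needs system TensorRT libraries
        "CUDAExecutionProvider",      # needs the NVIDIA driver and CUDA libs
        "CPUExecutionProvider",       # always available
    ],
)
print(session.get_providers())  # the providers that actually activated
```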
## Quickstart Compose
`director-ai quickstart` creates `models/factcg-onnx/` and an opt-in Compose profile. Export or copy the files into that directory, then run:
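The profile name comes from the generated compose file; a typical invocation, assuming it is called `onnx`:

```bash
docker compose --profile onnx up -d
```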
The container mounts the model directory read-only and sets the environment the ONNX scorer needs; see the generated compose file for the exact variables.
## Wheel Target Matrix
`requirements/onnx_wheel_targets.toml` is the tracked source for these targets; `requirements/docker-gpu-export.txt` pins the export-only wheels.
| Target id | Platform | Extra | Runtime package | Execution provider | Status |
|---|---|---|---|---|---|
| `linux-cpu-x86_64` | Linux x86_64 | `[onnx]` | `onnxruntime` | `CPUExecutionProvider` | covered |
| `linux-nvidia-cuda-x86_64` | Linux x86_64 with NVIDIA driver | `[tensorrt]` | `onnxruntime-gpu` | `CUDAExecutionProvider` | covered |
| `linux-nvidia-tensorrt-x86_64` | Linux x86_64 with NVIDIA TensorRT libraries | `[tensorrt]` | `onnxruntime-gpu` | `TensorrtExecutionProvider` | covered with system TensorRT libraries |
| `macos-cpu-arm64` | macOS arm64 | `[onnx]` | `onnxruntime` | `CPUExecutionProvider` | covered |
| `windows-cpu-amd64` | Windows amd64 | `[onnx]` | `onnxruntime` | `CPUExecutionProvider` | covered |
ROCm, DirectML, and other provider wheels are not exposed as project extras until their install path is verified in CI. Use CPU ONNX on those hosts unless a deployment owns the provider wheel and system library stack.
## Runtime Check
After installing the target extra, verify the provider list:
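One way to do that from Python, using the standard ONNX Runtime API:

```python
import onnxruntime as ort

# Providers are listed in priority order for the installed wheel.
print(ort.get_available_providers())
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'] on the CUDA target
```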
For GPU targets, the expected provider should appear before the CPU fallback. If it does not, keep the deployment on `[onnx]` until the driver and runtime libraries match.