ONNX Artefacts

Use ONNX when a deployment needs local NLI scoring without loading the PyTorch backend at request time. The Python-only quickstart remains the default; ONNX is an opt-in runtime path.

Export Path

Install the runtime stack, add the hash-pinned export wheels from the repo, and write the model into the quickstart path:

pip install "director-ai[nli,onnx]"
pip install --no-deps --require-hashes -r requirements/docker-gpu-export.txt
director-ai export --format onnx --output models/factcg-onnx
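
If the director-ai CLI is not available in the export environment, a rough equivalent of the export step can be sketched with Hugging Face Optimum. The checkpoint id below is a placeholder assumption, not the project's real model id; the supported path remains director-ai export.

# Hedged sketch: export an NLI sequence-classification checkpoint to ONNX with Optimum.
# "your-org/factcg-nli" is a placeholder, not the actual FactCG checkpoint id.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

checkpoint = "your-org/factcg-nli"  # assumption: swap in the real NLI checkpoint
model = ORTModelForSequenceClassification.from_pretrained(checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Writes model.onnx plus the config and tokenizer files into the quickstart path.
model.save_pretrained("models/factcg-onnx")
tokenizer.save_pretrained("models/factcg-onnx")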

The GPU Docker build installs the same hash-pinned export requirements during its model-builder stage, then copies only the exported model into the runtime image.

The expected directory layout is:

models/factcg-onnx/
  model.onnx
  config.json
  tokenizer.json
  tokenizer_config.json
  special_tokens_map.json
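
Before wiring the directory into Compose, a short Python check can confirm the export is complete; the file list mirrors the layout above.

# Confirm the exported directory contains every file the runtime expects.
from pathlib import Path

model_dir = Path("models/factcg-onnx")
expected = [
    "model.onnx",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]
missing = [name for name in expected if not (model_dir / name).is_file()]
if missing:
    raise SystemExit(f"export incomplete, missing: {missing}")
print("ONNX export looks complete")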

For CPU-only serving, install the ONNX Runtime extra:

pip install "director-ai[nli,onnx]"

For NVIDIA CUDA serving, install the GPU runtime extra:

pip install "director-ai[nli,tensorrt]"

TensorRT execution also requires matching NVIDIA TensorRT libraries on the host or image. The [tensorrt] extra supplies onnxruntime-gpu; it does not vendor the NVIDIA driver or the TensorRT system libraries.
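
A minimal session-creation sketch shows how the provider fallback plays out; onnxruntime drops a requested provider with a warning when its libraries are missing, so the printed list reflects what actually bound.

# Hedged sketch: open the exported model with an explicit provider priority.
import onnxruntime as ort

session = ort.InferenceSession(
    "models/factcg-onnx/model.onnx",
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)
# The first entry is the provider that will execute the graph.
print(session.get_providers())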

Quickstart Compose

director-ai quickstart creates models/factcg-onnx/ and an opt-in Compose profile. Export or copy the files into that directory, then run:

docker compose --profile onnx up director-proxy-onnx

The container mounts the model directory read-only and sets:

DIRECTOR_SCORER_BACKEND=onnx
DIRECTOR_ONNX_PATH=/models/factcg-onnx
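
The sketch below mimics how a backend could consume those two settings outside Compose: resolve the model path from the environment and score one premise/hypothesis pair. How director-ai maps the logits to an NLI score is not shown here; the label order is whatever the exported config.json defines.

# Hedged sketch: resolve DIRECTOR_ONNX_PATH and run one NLI pair through the model.
import os
import onnxruntime as ort
from transformers import AutoTokenizer

model_dir = os.environ.get("DIRECTOR_ONNX_PATH", "models/factcg-onnx")
tokenizer = AutoTokenizer.from_pretrained(model_dir)
session = ort.InferenceSession(f"{model_dir}/model.onnx", providers=["CPUExecutionProvider"])

encoded = tokenizer(
    "The report was filed on Tuesday.",   # premise
    "The report was filed this week.",    # hypothesis
    return_tensors="np",
)
# Keep only the tensors the exported graph declares as inputs (e.g. no token_type_ids).
input_names = {node.name for node in session.get_inputs()}
feed = {name: encoded[name] for name in encoded if name in input_names}
logits = session.run(None, feed)[0]
print(logits)  # label order comes from the checkpoint's config.json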

Wheel Target Matrix

requirements/onnx_wheel_targets.toml is the tracked source for these targets. requirements/docker-gpu-export.txt pins the export-only wheels.

| Target id | Platform | Extra | Runtime package | Execution provider | Status |
| --- | --- | --- | --- | --- | --- |
| linux-cpu-x86_64 | Linux x86_64 | [onnx] | onnxruntime | CPUExecutionProvider | covered |
| linux-nvidia-cuda-x86_64 | Linux x86_64 with NVIDIA driver | [tensorrt] | onnxruntime-gpu | CUDAExecutionProvider | covered |
| linux-nvidia-tensorrt-x86_64 | Linux x86_64 with NVIDIA TensorRT libraries | [tensorrt] | onnxruntime-gpu | TensorrtExecutionProvider | covered with system TensorRT libraries |
| macos-cpu-arm64 | macOS arm64 | [onnx] | onnxruntime | CPUExecutionProvider | covered |
| windows-cpu-amd64 | Windows amd64 | [onnx] | onnxruntime | CPUExecutionProvider | covered |

ROCm, DirectML, and other provider wheels are not exposed as project extras until their install path is verified in CI. Use CPU ONNX on those hosts unless a deployment owns the provider wheel and system library stack.

Runtime Check

After installing the target extra, verify the provider list:

director-ai doctor
python -c "import onnxruntime as ort; print(ort.get_available_providers())"

For GPU targets, the expected provider should appear ahead of the CPU fallback in that list. If it does not, keep the deployment on [onnx] until the driver and runtime libraries match.
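
On GPU targets the same check can be scripted; the snippet assumes the CUDA target, so swap the provider name when validating a TensorRT host.

# Assert that the GPU provider outranks the CPU fallback on a [tensorrt] install.
import onnxruntime as ort

providers = ort.get_available_providers()
assert "CUDAExecutionProvider" in providers, "onnxruntime-gpu is not active"
assert providers.index("CUDAExecutionProvider") < providers.index("CPUExecutionProvider")
print("GPU provider ranked ahead of CPU:", providers)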