# ONNX Export & Custom Models

## Export FactCG to ONNX
Director-AI ships with `export_onnx()` in `director_ai.core.nli`:

```python
from director_ai.core.nli import export_onnx

export_onnx(
    model_name="yaxili96/FactCG-DeBERTa-v3-Large",
    output_dir="models/factcg_onnx",
)
```
This uses Optimum to convert the PyTorch model and tokenizer to ONNX format. It requires `pip install "optimum[onnxruntime]"`.
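If you prefer driving the export from the shell, Optimum's standard CLI performs the same conversion. This is stock Optimum usage rather than a Director-AI command, and the `--task text-classification` flag assumes the checkpoint is a sequence-classification head:

```bash
optimum-cli export onnx \
  --model yaxili96/FactCG-DeBERTa-v3-Large \
  --task text-classification \
  models/factcg_onnx
```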
## Use the ONNX model

```python
from director_ai.core.nli import NLIScorer

scorer = NLIScorer(
    backend="onnx",
    onnx_path="models/factcg_onnx",
    device="cuda",  # or "cpu"
)
```
The ONNX backend selects execution providers automatically:
| Provider | Env / Condition | Latency |
|---|---|---|
| `TensorrtExecutionProvider` | `DIRECTOR_ENABLE_TRT=1` + libnvinfer | Sub-10 ms target |
| `CUDAExecutionProvider` | `onnxruntime-gpu` installed | 14.6 ms/pair (GTX 1060) |
| `CPUExecutionProvider` | Fallback | 383 ms/pair |
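The selection order in the table is equivalent in spirit to the sketch below. This is an illustration of the fallback chain, not the actual `NLIScorer` internals; only the `DIRECTOR_ENABLE_TRT` variable comes from the table above:

```python
import os
import onnxruntime as ort

def pick_providers() -> list[str]:
    """Build an execution-provider preference list, best first."""
    available = ort.get_available_providers()
    providers = []
    # TensorRT only when explicitly opted in and the provider is present.
    if os.environ.get("DIRECTOR_ENABLE_TRT") == "1" and "TensorrtExecutionProvider" in available:
        providers.append("TensorrtExecutionProvider")
    if "CUDAExecutionProvider" in available:  # requires onnxruntime-gpu
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")  # always-available fallback
    return providers

session = ort.InferenceSession(
    "models/factcg_onnx/model.onnx", providers=pick_providers()
)
```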
## GPU Docker image (pre-exported)

The `Dockerfile.gpu` multi-stage build exports the model at build time:

```bash
docker build -f Dockerfile.gpu -t director-ai:gpu .
docker run --gpus all -p 8080:8080 director-ai:gpu
```

The ONNX model is baked into `/app/models/onnx/`, so there are no HuggingFace downloads at runtime.
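Inside the container, the scorer simply points at that baked-in path (a minimal sketch; the path is the image layout described above):

```python
from director_ai.core.nli import NLIScorer

# The image ships the exported model at /app/models/onnx/,
# so startup needs no network access.
scorer = NLIScorer(backend="onnx", onnx_path="/app/models/onnx")
```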
## Custom models

Any HuggingFace `AutoModelForSequenceClassification` works:

```python
from director_ai.core.nli import export_onnx, NLIScorer

# Export your fine-tuned model
export_onnx(
    model_name="your-org/your-nli-model",
    output_dir="models/custom_onnx",
)

# Use it
scorer = NLIScorer(
    backend="onnx",
    onnx_path="models/custom_onnx",
)
```
The scorer auto-detects 2-class (FactCG-style) vs 3-class (standard NLI) models and adjusts the entailment probability extraction.
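Conceptually, the adjustment comes down to which softmaxed logit counts as entailment. The sketch below is illustrative rather than the scorer's actual code, and both label orderings are assumptions that vary between checkpoints:

```python
import numpy as np

def entailment_prob(logits: np.ndarray) -> float:
    """Map raw classifier logits to P(entailment) for 2- or 3-class heads."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if logits.shape[-1] == 2:
        return float(probs[1])  # assumed 2-class order: [not-entailed, entailed]
    return float(probs[2])      # assumed MNLI-style order: [contradiction, neutral, entailment]
```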
## Graph optimization

Set `ORT_ENABLE_ALL=1` (the default in Director-AI) to enable operator fusion, constant folding, and layout optimization. This is already configured in the `NLIScorer` ONNX path.
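In raw onnxruntime terms, the flag corresponds to the standard `SessionOptions` graph-optimization setting, shown here only to make its effect concrete:

```python
import onnxruntime as ort

opts = ort.SessionOptions()
# Enables operator fusion, constant folding, and layout optimizations.
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session = ort.InferenceSession(
    "models/factcg_onnx/model.onnx", sess_options=opts
)
```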
## Pinned dependencies

For reproducible exports, pin the export toolchain.
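A sketch of a pin set covering the export path; the version numbers are illustrative assumptions, so substitute the ones from your own verified environment:

```text
# illustrative versions - replace with your verified pins
torch==2.3.1
transformers==4.41.2
optimum[onnxruntime]==1.20.0
onnxruntime-gpu==1.18.0
```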