Why Director-AI

For the broader product, application, and market map, start with Product Overview. This page focuses on the technical reason Director-AI exists: generated text can become visible before post-hoc checks run.

The Streaming Problem

Every major LLM provider supports streaming, and chat interfaces typically enable it by default. OpenAI, Anthropic, and Google all send tokens as they are generated, so users see the response appear token by token.

Post-hoc guardrails check after generation completes. By then, the user may already have read the unsupported claim: a wrong medication dosage displayed for 3 seconds, a fabricated legal citation copied into a brief, or an incorrect refund policy quoted to a customer.

The industry standard — generate first, check later — is the wrong UX boundary for fact-critical streams.

What Director-AI Does Differently

Director-AI makes factual coherence a control point before output is accepted, stored, routed, or acted on.

Claim-level streaming halt. The production streaming signal is contradiction-driven: completed streamed claims are checked against retrieved facts and the stream halts when a claim contradicts governed knowledge. The latest local benchmark artifact (benchmarks/results/streaming_contradiction_halt_base.json) reports 2/135 false halts and 2/3 caught contradiction passages on the small streaming suite; broader held-out contradiction evidence is recorded in benchmarks/results/contradiction_holdout_finetuned.json.
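
The halting loop described above can be sketched as follows. This is a minimal illustration, not the `StreamingKernel` interface: `stream_with_halt`, `toy_contradicts`, and `FACTS` are hypothetical names, the keyword lookup stands in for the NLI model, and the sketch assumes that a claim ends at sentence punctuation and is held back until it has been checked.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical governed fact; a real deployment would retrieve facts from the KB.
FACTS = ["Refunds are available within 30 days of purchase."]


def toy_contradicts(claim: str, facts: List[str]) -> Optional[str]:
    """Toy stand-in for an NLI model: return the contradicted fact, or None."""
    if "refund" in claim.lower() and "30 days" not in claim:
        return facts[0]
    return None


def stream_with_halt(
    tokens: Iterable[str],
    facts: List[str],
    contradicts: Callable[[str, List[str]], Optional[str]] = toy_contradicts,
) -> Tuple[str, Optional[str]]:
    """Release text claim by claim and stop at the first contradicted claim.

    Returns (released_text, evidence); evidence is None when nothing was halted.
    """
    released: List[str] = []  # claims that passed the check
    pending = ""              # tokens of the claim still being generated
    for token in tokens:
        pending += token
        if pending.rstrip().endswith((".", "!", "?")):  # claim is complete
            evidence = contradicts(pending, facts)
            if evidence is not None:
                # Halt: the contradicted claim is never released.
                return "".join(released), evidence
            released.append(pending)
            pending = ""
    return "".join(released) + pending, None


tokens = ["Our ", "store ", "is ", "open ", "daily. ",
          "Refunds ", "take ", "90 ", "days. ", "Thanks."]
text, evidence = stream_with_halt(tokens, FACTS)
print(repr(text))  # 'Our store is open daily. '
print(evidence)    # Refunds are available within 30 days of purchase.
```

Returning the contradicted fact alongside the released text is what lets a caller show why the stream stopped rather than only that it stopped.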

Dual-entropy scoring. Two independent signals:

H_logical — NLI contradiction detection via DeBERTa (0.4B params). Catches logical inconsistencies between the response and your facts.
H_factual — RAG retrieval against your knowledge base. Catches claims that have no supporting evidence.

The final score combines both: coherence = 1 - (0.6 * H_logical + 0.4 * H_factual).
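
As a worked example of the formula, here is a small sketch. It assumes both entropy signals are normalised to the range [0, 1]; the function name `coherence` and its keyword arguments are illustrative, not part of the Director-AI API.

```python
def coherence(
    h_logical: float,
    h_factual: float,
    w_logical: float = 0.6,
    w_factual: float = 0.4,
) -> float:
    """coherence = 1 - (w_logical * H_logical + w_factual * H_factual)."""
    # Assumption: both signals are normalised to [0, 1].
    for name, value in (("h_logical", h_logical), ("h_factual", h_factual)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 1.0 - (w_logical * h_logical + w_factual * h_factual)


# Strong contradiction signal, good retrieval support:
print(round(coherence(0.9, 0.1), 2))  # 0.42
# No contradiction, fully supported:
print(coherence(0.0, 0.0))            # 1.0
```

Because H_logical carries the larger weight, a detected contradiction lowers the score more than an equally strong lack of supporting evidence.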

Evidence on rejection. Every halt includes the specific KB chunks that contradicted the response. No black-box "this was flagged" — your users or QA team see exactly why.

0.4B parameters, sub-millisecond per-pair latency. FactCG-DeBERTa-v3-Large runs at 0.5 ms/pair on an L40S (FP16, batch=32); if that figure is amortised over the batch, a full batch of 32 pairs takes roughly 16 ms. No API calls, no metering, no rate limits.

When NOT to Use Director-AI

Director-AI solves one problem: factual coherence — does the LLM output match your ground truth?

It does not handle:

Problem	Use Instead
Toxicity / hate speech	NeMo Guardrails, LLM-Guard
Prompt injection (input-side only)	Rebuff, LLM-Guard — though Director-AI now includes `InjectionDetector` for output-side NLI-based detection
PII leakage	Presidio, LLM-Guard
Content moderation	OpenAI Moderation API, Llama Guard
Code safety	Semgrep, Snyk Code

You can (and should) combine Director-AI with these tools. Director-AI guards facts; the others guard behaviour.

Decision Matrix

Your Situation	Recommendation
RAG chatbot with a knowledge base	Director-AI with `VectorGroundTruthStore` — KB Ingestion guide
Streaming LLM responses to users	Director-AI contradiction-driven `StreamingKernel` — Streaming guide
LLM agent making multi-step decisions	Director-AI `CoherenceAgent` — API reference
Customer support bot with product facts	Director-AI with domain-specific KB — Support cookbook
Medical / legal / finance compliance	Director-AI with curated KB plus a tuned profile; stock regulated profiles are calibration starting points
Toxicity filtering only	NeMo Guardrails or LLM-Guard instead
Prompt injection defence only	Rebuff or LLM-Guard instead

Cost Comparison

System	Cost per 1K calls	Latency	Local/Offline
Director-AI (NLI mode)	$0	0.5 ms (L40S)	Yes
Director-AI (hybrid + GPT-4o-mini)	$0.07	2.3 s	No
Director-AI (hybrid + Claude Sonnet)	$1.40	14.2 s	No
GPT-4o as judge	$1.16	~2 s	No
Claude Haiku 4.5 as judge	$0.37	~1.5 s	No
GuardrailsAI (LLM-as-judge)	LLM cost	2.26 s	No
SelfCheckGPT (multi-call)	3-5x LLM cost	5-10 s	No

NLI-only mode has no per-call cost, is fast, and runs fully offline. Add an LLM judge only if you need the 90.7% hybrid catch rate. Even then, GPT-4o-mini matches Claude Sonnet at 20x lower cost ($0.07 vs $1.40 per 1K calls) and about 6x lower latency (2.3 s vs 14.2 s).
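
To turn the per-1K figures in the table into a budget estimate, a rough sketch follows. It assumes cost scales linearly with call volume and uses a hypothetical volume of 1,000,000 calls per month; `monthly_cost` is an illustrative helper, and the figures appear to cover per-call charges only, so hardware for local inference is not included.

```python
# Cost per 1K calls in USD, copied from the table above.
COST_PER_1K_USD = {
    "Director-AI (NLI mode)": 0.00,
    "Director-AI (hybrid + GPT-4o-mini)": 0.07,
    "Director-AI (hybrid + Claude Sonnet)": 1.40,
    "GPT-4o as judge": 1.16,
    "Claude Haiku 4.5 as judge": 0.37,
}


def monthly_cost(system: str, calls_per_month: int) -> float:
    """Linear estimate: (cost per 1K calls) * (calls / 1000)."""
    return COST_PER_1K_USD[system] * calls_per_month / 1000


CALLS = 1_000_000  # hypothetical monthly volume
for system in COST_PER_1K_USD:
    print(f"{system}: ${monthly_cost(system, CALLS):,.2f}/month")
```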

Next: Quickstart — score your first response in 2 minutes.