Why Director-AI¶

The Streaming Problem¶

Every major LLM provider defaults to streaming. OpenAI, Anthropic, Google — they all send tokens as they're generated. Users see the response character by character.

Post-hoc guardrails check after generation completes. By then, the user already read the hallucination. The damage is done: a wrong medication dosage displayed for 3 seconds, a fabricated legal citation copied into a brief, an incorrect refund policy quoted to an angry customer.

The industry standard — generate first, check later — is a UX failure.

What Director-AI Does Differently¶

Director-AI scores coherence as tokens arrive, not after the full response is assembled.

Token-level halt. StreamingKernel evaluates every N tokens against your knowledge base. If coherence drops below threshold mid-stream, generation stops immediately. The user never sees the hallucinated content.

Dual-entropy scoring. Two independent signals:

H_logical — NLI contradiction detection via DeBERTa (0.4B params). Catches logical inconsistencies between the response and your facts.
H_factual — RAG retrieval against your knowledge base. Catches claims that have no supporting evidence.

The final score combines both: coherence = 1 - (0.6 * H_logical + 0.4 * H_factual).

Evidence on rejection. Every halt includes the specific KB chunks that contradicted the response. No black-box "this was flagged" — your users (or your QA team) see exactly why.

0.4B parameters, sub-millisecond latency. FactCG-DeBERTa-v3-Large runs at 0.5 ms/pair on an L40S (FP16, batch=32). No API calls, no metering, no rate limits.

When NOT to Use Director-AI¶

Director-AI solves one problem: factual coherence — does the LLM output match your ground truth?

It does not handle:

Problem	Use Instead
Toxicity / hate speech	NeMo Guardrails, LLM-Guard
Prompt injection	Rebuff, LLM-Guard
PII leakage	Presidio, LLM-Guard
Content moderation	OpenAI Moderation API, Llama Guard
Code safety	Semgrep, Snyk Code

You can (and should) combine Director-AI with these tools. Director-AI guards facts; the others guard behaviour.

Decision Matrix¶

Your Situation	Recommendation
RAG chatbot with a knowledge base	Director-AI with `VectorGroundTruthStore` — KB Ingestion guide
Streaming LLM responses to users	Director-AI `StreamingKernel` — Streaming guide
LLM agent making multi-step decisions	Director-AI `CoherenceAgent` — API reference
Customer support bot with product facts	Director-AI with domain-specific KB — Support cookbook
Medical / legal / finance compliance	Director-AI with high threshold (0.7+) — domain cookbooks
Toxicity filtering only	NeMo Guardrails or LLM-Guard instead
Prompt injection defence only	Rebuff or LLM-Guard instead

Cost Comparison¶

System	Cost per 1K calls	Latency	Local/Offline
Director-AI (NLI mode)	$0	0.5 ms (L40S)	Yes
Director-AI (hybrid + GPT-4o-mini)	$0.07	2.3 s	No
Director-AI (hybrid + Claude Sonnet)	$1.40	14.2 s	No
GPT-4o as judge	$1.16	~2 s	No
Claude Haiku 4.5 as judge	$0.37	~1.5 s	No
GuardrailsAI (LLM-as-judge)	LLM cost	2.26 s	No
SelfCheckGPT (multi-call)	3-5x LLM cost	5-10 s	No

NLI-only mode is free, fast, and fully offline. Add an LLM judge only if you need the 90.7% hybrid catch rate — and even then, GPT-4o-mini matches Claude at 6x lower cost.

Next: Quickstart — score your first response in 2 minutes.