Why Director-AI

The Streaming Problem

Every major LLM provider defaults to streaming. OpenAI, Anthropic, Google — they all send tokens as they're generated. Users see the response character by character.

Post-hoc guardrails check after generation completes. By then, the user has already read the hallucination. The damage is done: a wrong medication dosage displayed for 3 seconds, a fabricated legal citation copied into a brief, an incorrect refund policy quoted to an angry customer.

The industry standard — generate first, check later — is a UX failure.

What Director-AI Does Differently

Director-AI scores coherence as tokens arrive, not after the full response is assembled.

Token-level halt. StreamingKernel evaluates every N tokens against your knowledge base. If coherence drops below threshold mid-stream, generation stops immediately. The user never sees the hallucinated content.
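The halt pattern can be sketched in a few lines. This is an illustrative sketch only: `stream_with_halt`, `score_window`, and the constants are assumptions for demonstration, not Director-AI's actual StreamingKernel API.

```python
CHECK_EVERY = 16      # evaluate every N tokens (illustrative value)
THRESHOLD = 0.6       # halt when coherence drops below this

def stream_with_halt(token_iter, score_window, threshold=THRESHOLD):
    """Yield tokens, re-scoring the accumulated text every CHECK_EVERY tokens.

    Stops the stream the moment the coherence score falls below threshold,
    so the client never receives the flagged continuation.
    """
    buffer = []
    for i, tok in enumerate(token_iter, start=1):
        buffer.append(tok)
        # Check BEFORE yielding, so the token that crossed the threshold
        # is withheld along with everything after it.
        if i % CHECK_EVERY == 0 and score_window("".join(buffer)) < threshold:
            return  # halt mid-stream
        yield tok
```

The key design point is that the check runs on the accumulated text, not individual tokens, so the scorer always sees enough context for NLI and retrieval to be meaningful.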

Dual-entropy scoring. Two independent signals:

  • H_logical — NLI contradiction detection via DeBERTa (0.4B params). Catches logical inconsistencies between the response and your facts.
  • H_factual — RAG retrieval against your knowledge base. Catches claims that have no supporting evidence.

The final score combines both: coherence = 1 - (0.6 * H_logical + 0.4 * H_factual).
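As a worked example, the published weighting can be expressed directly; the function name and comments are illustrative, but the formula and the 0.6/0.4 weights come from the docs above.

```python
def coherence(h_logical: float, h_factual: float) -> float:
    """Combine the two entropy signals into a single coherence score.

    Both inputs are in [0, 1]; higher entropy means more suspect output.
    H_logical carries the larger 0.6 weight, so a strong logical
    contradiction drags the score down faster than a missing citation.
    """
    return 1 - (0.6 * h_logical + 0.4 * h_factual)

# A fully consistent response (both entropies 0) scores 1.0;
# coherence(0.5, 0.25) works out to 1 - (0.30 + 0.10) ≈ 0.60.
```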

Evidence on rejection. Every halt includes the specific KB chunks that contradicted the response. No black-box "this was flagged" — your users (or your QA team) see exactly why.
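A rejection payload along these lines is what consumers would inspect. The class and field names here are hypothetical, for illustration only, and are not Director-AI's actual schema.

```python
from dataclasses import dataclass

@dataclass
class HaltEvidence:
    coherence: float                 # combined score at the moment of the halt
    threshold: float                 # configured cutoff that was crossed
    contradicting_chunks: list[str]  # KB passages that contradicted the output

    def summary(self) -> str:
        return (f"halted at coherence {self.coherence:.2f} "
                f"(threshold {self.threshold:.2f}); "
                f"{len(self.contradicting_chunks)} contradicting KB chunk(s)")
```

Attaching the chunks themselves, rather than just a flag, is what lets a QA team audit each halt without re-running retrieval.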

0.4B parameters, sub-millisecond latency. FactCG-DeBERTa-v3-Large runs at 0.5 ms/pair on an L40S (FP16, batch=32). No API calls, no metering, no rate limits.

When NOT to Use Director-AI

Director-AI solves one problem: factual coherence — does the LLM output match your ground truth?

It does not handle:

| Problem | Use Instead |
| --- | --- |
| Toxicity / hate speech | NeMo Guardrails, LLM-Guard |
| Prompt injection | Rebuff, LLM-Guard |
| PII leakage | Presidio, LLM-Guard |
| Content moderation | OpenAI Moderation API, Llama Guard |
| Code safety | Semgrep, Snyk Code |

You can (and should) combine Director-AI with these tools. Director-AI guards facts; the others guard behaviour.

Decision Matrix

| Your Situation | Recommendation |
| --- | --- |
| RAG chatbot with a knowledge base | Director-AI with VectorGroundTruthStore — KB Ingestion guide |
| Streaming LLM responses to users | Director-AI StreamingKernel — Streaming guide |
| LLM agent making multi-step decisions | Director-AI CoherenceAgent — API reference |
| Customer support bot with product facts | Director-AI with domain-specific KB — Support cookbook |
| Medical / legal / finance compliance | Director-AI with high threshold (0.7+) — domain cookbooks |
| Toxicity filtering only | NeMo Guardrails or LLM-Guard instead |
| Prompt injection defence only | Rebuff or LLM-Guard instead |

Cost Comparison

| System | Cost per 1K calls | Latency | Local/Offline |
| --- | --- | --- | --- |
| Director-AI (NLI mode) | $0 | 0.5 ms (L40S) | Yes |
| Director-AI (hybrid + GPT-4o-mini) | $0.07 | 2.3 s | No |
| Director-AI (hybrid + Claude Sonnet) | $1.40 | 14.2 s | No |
| GPT-4o as judge | $1.16 | ~2 s | No |
| Claude Haiku 4.5 as judge | $0.37 | ~1.5 s | No |
| GuardrailsAI (LLM-as-judge) | LLM cost | 2.26 s | No |
| SelfCheckGPT (multi-call) | 3-5x LLM cost | 5-10 s | No |

NLI-only mode is free, fast, and fully offline. Add an LLM judge only if you need the 90.7% hybrid catch rate — and even then, GPT-4o-mini matches Claude Sonnet at roughly 20x lower cost.


Next: Quickstart — score your first response in 2 minutes.