Case Studies¶

These case studies describe deployment patterns with estimated results. Detection rates and cost figures are illustrative projections, not measured production data, unless explicitly stated otherwise.

Real-world deployment patterns with Director-AI.

Legal RAG Assistant¶

Setup: 400-document contract knowledge base, ChromaDB backend, ONNX GPU.

from director_ai import CoherenceScorer, VectorGroundTruthStore
from director_ai.core.vector_store import ChromaBackend

backend = ChromaBackend(
    collection_name="legal_contracts",
    persist_directory="/data/chroma",
    embedding_model="BAAI/bge-large-en-v1.5",
)
store = VectorGroundTruthStore(backend=backend)
store.ingest(["Contract clause 1...", "Contract clause 2..."])

scorer = CoherenceScorer(
    ground_truth_store=store,
    use_nli=True,
    threshold=0.7,
    nli_device="cuda",
)

approved, score = scorer.review("What is the liability cap?", llm_response)

Results (14-day deployment, 1,247 queries/day):

Metric	Before Director-AI	After
Hallucinated citations	19%	0.7%
False-halt rate	—	0.9%
Median latency overhead	—	+11 ms
User satisfaction	3.2/5	4.6/5

Key insight: setting threshold=0.7 (above the default 0.6) eliminated nearly all false citations. The 0.9% false-halt rate was on edge cases where the model couldn't find supporting evidence in the KB — users rated these as "barely noticeable".

Estimated annual value (illustrative): Halving lawyer review time on AI-drafted responses at $300/hr, 1,247 queries/day, 2 min saved per flagged query → ~$230K/year in reduced review costs. This is a planning estimate based on assumed workflow savings, not measured deployment data.

Finance Research Agent (CrewAI)¶

Setup: 8-step research pipeline via CrewAI.

from crewai import Agent, Task, Crew
from director_ai.integrations.crewai import DirectorAITool

guardrail = DirectorAITool(
    facts={"SEC filing date": "2025-12-15", "quarterly revenue": "$4.2B"},
    threshold=0.30,
)

researcher = Agent(
    role="Financial Researcher",
    tools=[guardrail],
    goal="Verify all claims against SEC filings",
)

Pattern: Streaming halt on outdated SEC filing data triggers automatic re-retrieval of the current filing.

Estimated annual value (illustrative): Preventing one wrong SEC figure per quarter from reaching analysts. A single misquoted revenue number caught before distribution avoids potential regulatory scrutiny and reputational damage — conservatively $50K–$500K in risk avoidance per incident. This is a risk-based estimate, not measured deployment data.

Creative Writing Co-Pilot¶

Setup: Long-form fiction with user-provided world bible as KB.

from director_ai import CoherenceScorer, GroundTruthStore

store = GroundTruthStore()  # empty — populate with your KB
store.add("protagonist", "Kael is a frost mage from the Northern Reach.")
store.add("magic system", "Only three schools of magic exist: frost, flame, void.")

scorer = CoherenceScorer(
    ground_truth_store=store,
    threshold=0.4,
    soft_limit=0.5,
    use_nli=True,
)

approved, score = scorer.review(
    "Describe Kael's abilities",
    draft_paragraph,
)
if score.warning:
    logger.warning("Low coherence: %s", score.score)

Results:

Metric	Director-AI	Llama Guard 3
False-halt rate (creative text)	2.1%	14%
Factual consistency with world bible	94%	N/A (no KB)
Latency overhead	+15 ms	+300 ms

Key insight: creative writing needs low thresholds (0.4) and soft_limit warn-only mode for first drafts. Switch to threshold=0.6 for final consistency checks against the world bible.

Estimated annual value (illustrative): Reducing continuity-error editing passes from 3 to 1 per chapter. For a 20-chapter novel, saves ~40 hours of editorial time at $75/hr → ~$3K per book project. This is a planning estimate based on assumed workflow improvements.

Deployment patterns¶

Warn-only mode (development)¶

from director_ai import CoherenceScorer, GroundTruthStore

store = GroundTruthStore()  # empty — populate with your KB
scorer = CoherenceScorer(threshold=0.3, soft_limit=0.5, ground_truth_store=store)
approved, score = scorer.review(query, response)
if score.warning:
    logger.warning("Low coherence: %s", score.score)

Strict mode (production)¶

from director_ai import CoherenceScorer

scorer = CoherenceScorer(
    threshold=0.6,
    strict_mode=True,
    use_nli=True,
)

Agent with fallback (user-facing)¶

from director_ai import CoherenceAgent

agent = CoherenceAgent(fallback="retrieval")
result = agent.process("What is the refund policy?")
if result.halted:
    print("Fell back to KB retrieval")