Product Overview
Director-AI is a real-time factual-coherence guard for LLM applications. It checks generated answers against governed facts, retrieved evidence, and structured verification rules before those answers become user-visible decisions.
The core product question is simple:
Can this model answer be trusted enough to show, stream, store, route, or act on?
Director-AI answers that question with an auditable verdict, a score, and the evidence that drove the decision.
Product Thesis
LLM adoption is moving from chat experiments into workflows that affect customers, research, operations, and regulated decisions. The failure mode is not only toxic content; it is confident unsupported content that looks useful until it becomes an incident.
Director-AI creates a control point between generation and consequence:
- Application teams get a small API that can approve, reject, annotate, or halt output.
- Platform teams get shared REST/gRPC infrastructure, auth, rate limits, metrics, and deployment runbooks.
- Evaluation teams get repeatable scoring, batch processing, threshold tuning, and benchmark evidence.
- Governance teams get tenant-safe audit trails, compliance reports, and human-review escalation paths.
The open repository contains the public core and evidence surface. Commercial work can add customer-specific sector packs, deployment recipes, tuning data, and acceptance evidence under a separate agreement.
What It Is For
Director-AI is built for teams that already use LLMs in workflows where a wrong claim has operational cost:
| Use case | What Director-AI protects | Primary value |
|---|---|---|
| Customer support | Policy, refund, warranty, and account answers | Fewer unsupported claims reaching customers |
| Regulated research | Scientific, medical, legal, or financial summaries | Evidence-linked rejection of unsupported claims |
| RAG assistants | Answers grounded in a private knowledge base | Traceable retrieval plus coherence scoring |
| Streaming chat | Token streams shown as they are generated | Mid-stream halt before a bad answer completes |
| Agent workflows | Tool outputs, handoffs, and multi-step traces | Per-step checks before downstream action |
| Evaluation pipelines | Batch scoring of prompt/response datasets | Regression gates and labelled feedback loops |
| Enterprise governance | Tenant-safe audit trails and compliance reports | Reviewable evidence for risk, quality, and controls |
It does not replace moderation, access control, data governance, or domain expert review. It adds a factual-coherence control plane that can be combined with those systems.
How It Works
Director-AI can run in-process, behind an HTTP proxy, as FastAPI middleware, as a REST/gRPC service, or inside integration adapters.
```mermaid
graph LR
    A["Application"] --> B["LLM provider or local model"]
    B --> C["Director-AI guard"]
    C --> D["Knowledge base"]
    C --> E["Scorers and verifiers"]
    E --> F{"Approve?"}
    F -->|yes| G["Return answer"]
    F -->|no| H["Halt, reject, or route to review"]
    H --> I["Evidence and audit event"]
```
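For streamed output, the halt branch of this flow can be illustrated with a rough sketch that re-scores the growing token prefix and stops the stream once the score drops below a threshold. Everything in it is hypothetical: `guarded_stream`, `toy_score`, the `[HALTED]` marker, and the 0.5 threshold are invented for illustration and are not the Director-AI API. The scorer is a trivial word-overlap stand-in for real NLI or retrieval-based scoring.

```python
from typing import Callable, Iterable, Iterator


def toy_score(text: str, facts: set[str]) -> float:
    """Hypothetical scorer: fraction of words so far that appear in the fact vocabulary."""
    words = [w.strip(".,").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 1.0
    return sum(w in facts for w in words) / len(words)


def guarded_stream(
    tokens: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # assumed value, not a recommended default
    min_tokens: int = 3,     # avoid halting on very short prefixes
) -> Iterator[str]:
    """Yield tokens until the running score falls below the threshold, then halt."""
    seen: list[str] = []
    for token in tokens:
        seen.append(token)
        if len(seen) >= min_tokens and score(" ".join(seen)) < threshold:
            # The offending token is withheld; a marker is emitted instead.
            yield "[HALTED]"
            return
        yield token


facts = {"refunds", "are", "available", "within", "30", "days"}
stream = "Refunds are available forever and ever always".split()
shown = list(guarded_stream(stream, lambda text: toy_score(text, facts)))
```

In this sketch the consumer sees the tokens emitted before the halt, so a real deployment would still need to decide how to retract or annotate text that was already displayed.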
The default production pattern has four layers:
- Grounding: bring governed facts through a key-value store, vector store, document ingestion pipeline, or customer runtime package.
- Scoring: combine logical contradiction, retrieval evidence, rules, embeddings, NLI models, and optional structured verification.
- Control: choose what happens on failure: raise, log, attach metadata, halt a stream, route to a review queue, or reject an HTTP response.
- Evidence: emit scores, retrieved facts, halt reasons, audit records, and optional compliance reports.
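The four layers above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the Director-AI implementation: `Verdict`, `ground`, `score`, `check`, and the 0.6 threshold are hypothetical names and values, the knowledge base is a plain dictionary, and the scorer is naive word overlap that cannot detect contradictions the way an NLI model would.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    """Evidence layer: what a caller or auditor would inspect afterwards."""
    approved: bool
    score: float
    evidence: list[str] = field(default_factory=list)
    reason: str = ""


def ground(question: str, kb: dict[str, str]) -> list[str]:
    """Grounding layer: toy key-value lookup of facts whose key occurs in the question."""
    q = question.lower()
    return [fact for key, fact in kb.items() if key in q]


def score(answer: str, facts: list[str]) -> float:
    """Scoring layer: share of answer words that also occur in the retrieved facts."""
    answer_words = set(answer.lower().split())
    fact_words = set(" ".join(facts).lower().split())
    if not answer_words or not fact_words:
        return 0.0
    return len(answer_words & fact_words) / len(answer_words)


def check(question: str, answer: str, kb: dict[str, str], threshold: float = 0.6) -> Verdict:
    """Control layer: approve or reject, and record why."""
    facts = ground(question, kb)
    s = score(answer, facts)
    approved = s >= threshold
    reason = "" if approved else f"score {s:.2f} below threshold {threshold:.2f}"
    return Verdict(approved=approved, score=s, evidence=facts, reason=reason)


kb = {"refund": "refunds are available within 30 days"}
good = check("What is the refund policy?", "refunds are available within 30 days", kb)
bad = check("What is the refund policy?", "we offer lifetime refunds with no questions asked", kb)
```

Here `good.approved` is true and `bad.approved` is false, with `bad.reason` and `bad.evidence` carrying the material a reviewer would need. What the caller does with a rejected verdict (raise, log, halt, or route to review) is left out of the sketch.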
Application Lanes
Pick one lane first. Mixing all lanes in the first week usually creates noisy results because each lane has a different success signal.
Builder Lane
Use this when adding protection to an existing app.
- Start with Quickstart.
- Wrap an SDK client with `guard()`.
- Add facts inline or through KB ingestion.
- Pick a failure mode: raise, metadata, log, reject, or review.
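To make the idea of wrapping a client and choosing a failure mode concrete, here is a hedged sketch of the pattern. It does not reproduce the real `guard()` signature, which is documented in the Quickstart. `guard_sketch`, `UnsupportedAnswer`, and the `on_fail` parameter are hypothetical, and only three of the listed failure modes (raise, metadata, log) are shown.

```python
import logging
from typing import Callable

logger = logging.getLogger("guard_sketch")


class UnsupportedAnswer(Exception):
    """Raised by this sketch when an answer fails the check in 'raise' mode."""


def guard_sketch(
    generate: Callable[[str], str],        # stand-in for an SDK client call
    is_supported: Callable[[str], bool],   # stand-in for the real scorer
    on_fail: str = "raise",
) -> Callable[[str], dict]:
    """Wrap a generation callable so each answer is checked before it is returned."""
    if on_fail not in {"raise", "metadata", "log"}:
        raise ValueError(f"unknown failure mode: {on_fail}")

    def wrapped(prompt: str) -> dict:
        answer = generate(prompt)
        supported = is_supported(answer)
        if not supported:
            if on_fail == "raise":
                raise UnsupportedAnswer(answer)
            if on_fail == "log":
                logger.warning("unsupported answer for prompt %r", prompt)
        # In 'metadata' and 'log' modes the answer is returned with a flag attached.
        return {"answer": answer, "supported": supported}

    return wrapped


def fake_llm(prompt: str) -> str:
    return "Refunds last forever"


def mentions_policy(answer: str) -> bool:
    return "30 days" in answer


annotated = guard_sketch(fake_llm, mentions_policy, on_fail="metadata")("refund policy?")
```

Starting with the metadata mode lets a team observe how often answers would be rejected before switching to a mode that blocks output.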
Platform Lane
Use this when exposing the guard as shared infrastructure.
- Deploy the REST server or gRPC server.
- Put the proxy in front of compatible clients.
- Add API keys, rate limits, metrics, audit logs, and deployment runbooks.
- Use Runtime Boundaries to decide which optional runtimes belong in production.
Evaluation Lane
Use this when proving that a configuration is good enough for a domain.
- Build labelled prompt/response sets.
- Run batch processing and threshold sweeps.
- Use online calibration for human feedback loops.
- Store benchmark evidence before making domain-specific performance claims.
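A threshold sweep over a labelled set can be as simple as the sketch below. It assumes each example has already been scored and labelled as supported or unsupported. `sweep_thresholds`, `pick_threshold`, the sample data, and the selection rule (fewest false approvals first, then fewest false rejections) are illustrative assumptions, not the project's batch tooling.

```python
def sweep_thresholds(
    labelled: list[tuple[float, bool]],
    thresholds: list[float],
) -> list[dict]:
    """For each threshold, count wrongly approved and wrongly rejected answers.

    Each labelled item is (score, is_supported); an answer is approved
    when its score is greater than or equal to the threshold.
    """
    if not labelled:
        raise ValueError("labelled set is empty")
    rows = []
    for t in thresholds:
        false_approvals = sum(1 for s, ok in labelled if s >= t and not ok)
        false_rejections = sum(1 for s, ok in labelled if s < t and ok)
        rows.append(
            {
                "threshold": t,
                "false_approvals": false_approvals,
                "false_rejections": false_rejections,
            }
        )
    return rows


def pick_threshold(rows: list[dict]) -> float:
    """Prefer the fewest false approvals, then the fewest false rejections."""
    best = min(rows, key=lambda r: (r["false_approvals"], r["false_rejections"]))
    return best["threshold"]


labelled = [(0.9, True), (0.8, True), (0.55, True), (0.6, False), (0.3, False), (0.2, False)]
rows = sweep_thresholds(labelled, [0.4, 0.5, 0.7])
chosen = pick_threshold(rows)
```

With this sample data, the 0.7 threshold removes the one false approval at the cost of one false rejection. Whether that trade is acceptable depends on the domain, which is why the sweep output is worth storing alongside the benchmark evidence.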
Enterprise Lane
Use this when selling, piloting, or operating in a governed organisation.
- Use the Enterprise, Compliance Reporting, and Production Checklist pages.
- Keep customer-specific claims scoped to customer data and acceptance criteria.
- Use the Customer Model Factory public core to package evidence, deployment manifests, rollback data, and runtime configuration.
Market Value
Director-AI is not another chatbot wrapper. Its practical value is that it gives organisations a controllable guard layer between model output and business consequence.
Director-AI can reduce:
- unsupported customer-facing claims;
- manual review load for routine factual checks;
- failed RAG evaluations caused by stale or missing evidence;
- risk from streamed hallucinations shown before post-hoc moderation runs;
- integration work required to reuse one guard policy across multiple LLM providers, frameworks, and deployment targets.
It can increase:
- evidence quality in regulated AI workflows;
- confidence that a new model, prompt, or KB version did not regress;
- portability across local, cloud, proxy, REST, and SDK integration modes;
- customer trust by making rejection reasons inspectable.
The repository is the open core and public evidence surface. Commercial deployments can add customer-specific data mappings, tuning packages, deployment recipes, sector playbooks, and acceptance evidence under a separate agreement.
If you need this framed for leadership review, start with Market Value and Positioning. It translates the same technical surface into commercial outcomes, pilot evidence, and budget-justification language.
Evidence Boundaries
Director-AI documentation is intentionally conservative about performance and market claims:
- public benchmark numbers must point to committed artefacts or published benchmark methodology;
- customer-specific claims require customer-specific data and approval criteria;
- optional GPU, Rust, ONNX, gRPC, Go, Julia, Lean, and WASM paths are additive, not required for the Python quickstart;
- 100% line coverage is not treated as proof of quality by itself. The project prioritises high, honestly measured coverage, meaningful module-specific tests, and evidence-bearing integration checks.
Start Here
| Reader | First page | Outcome |
|---|---|---|
| Evaluator | Quickstart | Score and guard a response in minutes |
| Buyer | Why Director-AI | Understand the business problem and alternatives |
| Market reviewer | Guardrail Landscape | Compare factuality, safety, streaming, and audit guardrail categories |
| Developer | API Reference | Choose the right API surface |
| RAG engineer | KB Ingestion | Ground responses in private facts |
| Operator | Production Guide | Deploy, monitor, and audit the service |
| Enterprise reviewer | Evaluation Onboarding | Plan a scoped pilot with evidence gates |