How Director-AI compares¶
Director-AI is unusual: it is a response-level runtime guardrail and a CI eval gate in one tool, with its hallucination accuracy benchmarked on LLM-AggreFact. It also ships an experimental token-level streaming halt that re-scores output during generation — a mechanism we deposited early, but which on our own false-halt benchmark cannot yet separate hallucinated from correct streaming text without a high false-halt rate. It is opt-in and under calibration; treat the rows below for it as experimental, not a production claim.
About this page
Competitor entries are compiled from public vendor materials and third-party reviews (as of 2026-06) and are indicative, not independently benchmarked by us. Director-AI entries are from this repository. Corrections welcome.
What's free vs commercial¶
Director-AI is open core. The table below is what ships in the free Apache-2.0 package vs the commercial BUSL-1.1 advanced tier.
| Capability | Free (Apache-2.0 core) | Advanced (BUSL-1.1) |
|---|---|---|
| Token-level streaming halt (experimental) | 🧪 | |
| 5-tier scoring (rules → embeddings → NLI) | ✅ | |
| RAG grounding + vector store | ✅ | |
| Prompt-injection detection (regex + NLI) | ✅ | |
| PII + toxicity moderation | ✅ | |
| Unified firewall decision | ✅ | |
| Rate limiting, multi-tenant isolation | ✅ | |
| Tamper-evident audit chain + evidence packets | ✅ | |
| CI quality gate + GitHub Action | ✅ | |
| REST / gRPC server, Rust acceleration | ✅ | |
| Reasoning-chain + structured-output verification | ✅ | |
| Streaming repair (corrective halt) | ✅ | |
| Multimodal guard (image / audio / video) | ✅ | |
| Temporal-consistency, swarm coherence | ✅ | |
| Voice guardrail, config UI | ✅ | |
| Customer model factory, threat intel | ✅ |
The free core is free for any use, including production and closed-source. The advanced tier is source-available and free to evaluate; production use needs a commercial licence. See Pricing and Licensing.
vs real-time guardrails¶
| Director-AI | Galileo | GA Guard | NeMo Guardrails | Llama Guard 4 | Future AGI | |
|---|---|---|---|---|---|---|
| Token-level streaming halt (experimental) | 🧪 | post-hoc | — | — | — | token-prefix |
| Self-host / open weights | ✅ | — | partial | ✅ | ✅ | hosted |
| Offline / air-gapped | ✅ | — | partial | partial | ✅ | — |
| Injection (semantic NLI) | ✅ | ✅ | ✅ | partial | ✅ | ✅ |
| PII / toxicity | ✅ | ✅ | ✅ | partial | ✅ | ✅ |
| Multimodal | ✅ | ✅ | ✅ | — | partial | ✅ |
| Tamper-evident audit | ✅ | partial | partial | — | — | partial |
| Multi-tenant (OSS tier) | ✅ | partial | partial | — | — | partial |
| Swarm / multi-agent guarding | ✅ | partial | — | — | — | — |
| Cloud SaaS | roadmap | ✅ | ✅ | ✅ | n/a | ✅ |
| Licence | Apache-2.0 + BUSL-1.1 | proprietary | proprietary | Apache-2.0 | open weights | proprietary |
vs eval / observability / red-teaming tools¶
These are mostly evaluation, observability, or testing tools rather than runtime guards. Director-AI spans both — runtime guard and CI eval.
| Director-AI | Braintrust | Patronus | Arize | Promptfoo | Giskard | Guardrails AI | |
|---|---|---|---|---|---|---|---|
| Real-time runtime guard | ✅ | — | partial | — | — | — | ✅ |
| Token-level streaming halt (experimental) | 🧪 | — | — | — | — | — | — |
| CI eval gate | ✅ | ✅ | partial | partial | ✅ | partial | partial |
| Automated red-teaming | ✅ | — | partial | — | ✅ | ✅ | partial |
| Observability / tracing | ✅ | ✅ | partial | ✅ | partial | partial | partial |
| Hallucination / RAG eval | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | partial |
| Self-host / OSS | ✅ | partial | partial | ✅ | ✅ | ✅ | ✅ |
Adversarial-benchmark numbers (HarmBench + JailbreakBench)¶
Measured by benchmarks/jailbreak_detection.py --with-model over the public
HarmBench (400 behaviours),
JailbreakBench (100 harmful +
500 benign incl. an Alpaca sample), and the real published attack artifacts
(PAIR, GCG, DSN, random-search; the prompts an independent tester would use).
The guard measured is LayeredPromptGuard: the pattern InputSanitizer plus a
model stage (ProtectAI deberta-v3-base-prompt-injection-v2, Apache-2.0,
enabled with prompt_guard_model_enabled). A prompt is blocked if either fires.
We report every family separately — including the ones we are weak on — so the numbers reproduce under an independent re-run rather than flattering the product.
| Attack family | What it is | Detection |
|---|---|---|
| Canonical templates | prefix / refusal-suppression / DAN / AIM / base64 | 100.0% (2500/2500) |
| Real artifacts (aggregate) | published PAIR/GCG/DSN/random-search prompts | 74.9% (286/382) |
| └ random-search | optimised black-box | 100% |
| └ DSN | 89% | |
| └ GCG | gradient-optimised suffix | 64% |
| └ PAIR | LLM-crafted persuasion | 40% |
| Held-out evasion (aggregate) | families never used to tune a pattern | 57.2% (1145/2000) |
| └ many-shot / leetspeak | 100% / 87% | |
| └ ROT13 / payload-split | weak spots, disclosed | 31% / 11% |
| Raw harmful goals (baseline) | plain harmful requests — not injections | 0.0% (0/500) |
| Toxicity moderation — raw harmful | detoxify; targets toxic language, not intent | 2.0% |
| False positives — benign | 500 benign (JailbreakBench + Alpaca) | 0.2% (1/500) |
Without the model stage the pattern guard alone scores 0% on every real artifact and held-out family — patterns only catch the vocabulary they were written for. The model stage is what makes the guard hold up against attacks it has not seen, at a 0.2% benign false-positive rate. ROT13 and payload-splitting remain weak and are tracked as open work; we publish them rather than rounding the aggregate up.
This stage is optional, off by default, and still being improved. The default classifier is chosen for a near-zero benign false-positive rate: other public models reach higher recall only by flagging 17-58% of legitimate traffic, which is unusable. A higher-recall, low-FPR option (Meta Prompt Guard 2) is gated and non-permissive but configurable as an opt-in. See the prompt-injection guard guide for the model bake-off and roadmap.
Where we're honest about the roadmap¶
We publish what we don't have yet, too: a cloud SaaS offering and long-context moderation beyond the 512-token model window are on the roadmap. Everything in the tables above is in the repository today.