How Director-AI compares¶

Director-AI is unusual: it is a response-level runtime guardrail and a CI eval gate in one tool, with its hallucination accuracy benchmarked on LLM-AggreFact. It also ships an experimental token-level streaming halt that re-scores output during generation — a mechanism we deposited early, but which on our own false-halt benchmark cannot yet separate hallucinated from correct streaming text without a high false-halt rate. It is opt-in and under calibration; treat the rows below for it as experimental, not a production claim.

About this page

Competitor entries are compiled from public vendor materials and third-party reviews (as of 2026-06) and are indicative, not independently benchmarked by us. Director-AI entries are from this repository. Corrections welcome.

What's free vs commercial¶

Director-AI is open core. The table below is what ships in the free Apache-2.0 package vs the commercial BUSL-1.1 advanced tier.

Capability	Free (Apache-2.0 core)	Advanced (BUSL-1.1)
Token-level streaming halt (experimental)	🧪
5-tier scoring (rules → embeddings → NLI)	✅
RAG grounding + vector store	✅
Prompt-injection detection (regex + NLI)	✅
PII + toxicity moderation	✅
Unified firewall decision	✅
Rate limiting, multi-tenant isolation	✅
Tamper-evident audit chain + evidence packets	✅
CI quality gate + GitHub Action	✅
REST / gRPC server, Rust acceleration	✅
Reasoning-chain + structured-output verification	✅
Streaming repair (corrective halt)		✅
Multimodal guard (image / audio / video)		✅
Temporal-consistency, swarm coherence		✅
Voice guardrail, config UI		✅
Customer model factory, threat intel		✅

The free core is free for any use, including production and closed-source. The advanced tier is source-available and free to evaluate; production use needs a commercial licence. See Pricing and Licensing.

vs real-time guardrails¶

	Director-AI	Galileo	GA Guard	NeMo Guardrails	Llama Guard 4	Future AGI
Token-level streaming halt (experimental)	🧪	post-hoc	—	—	—	token-prefix
Self-host / open weights	✅	—	partial	✅	✅	hosted
Offline / air-gapped	✅	—	partial	partial	✅	—
Injection (semantic NLI)	✅	✅	✅	partial	✅	✅
PII / toxicity	✅	✅	✅	partial	✅	✅
Multimodal	✅	✅	✅	—	partial	✅
Tamper-evident audit	✅	partial	partial	—	—	partial
Multi-tenant (OSS tier)	✅	partial	partial	—	—	partial
Swarm / multi-agent guarding	✅	partial	—	—	—	—
Cloud SaaS	roadmap	✅	✅	✅	n/a	✅
Licence	Apache-2.0 + BUSL-1.1	proprietary	proprietary	Apache-2.0	open weights	proprietary

vs eval / observability / red-teaming tools¶

These are mostly evaluation, observability, or testing tools rather than runtime guards. Director-AI spans both — runtime guard and CI eval.

	Director-AI	Braintrust	Patronus	Arize	Promptfoo	Giskard	Guardrails AI
Real-time runtime guard	✅	—	partial	—	—	—	✅
Token-level streaming halt (experimental)	🧪	—	—	—	—	—	—
CI eval gate	✅	✅	partial	partial	✅	partial	partial
Automated red-teaming	✅	—	partial	—	✅	✅	partial
Observability / tracing	✅	✅	partial	✅	partial	partial	partial
Hallucination / RAG eval	✅	✅	✅	✅	✅	✅	partial
Self-host / OSS	✅	partial	partial	✅	✅	✅	✅

Adversarial-benchmark numbers (HarmBench + JailbreakBench)¶

Measured by benchmarks/jailbreak_detection.py --with-model over the public HarmBench (400 behaviours), JailbreakBench (100 harmful + 500 benign incl. an Alpaca sample), and the real published attack artifacts (PAIR, GCG, DSN, random-search; the prompts an independent tester would use). The guard measured is LayeredPromptGuard: the pattern InputSanitizer plus a model stage (ProtectAI deberta-v3-base-prompt-injection-v2, Apache-2.0, enabled with prompt_guard_model_enabled). A prompt is blocked if either fires.

We report every family separately — including the ones we are weak on — so the numbers reproduce under an independent re-run rather than flattering the product.

Attack family	What it is	Detection
Canonical templates	prefix / refusal-suppression / DAN / AIM / base64	100.0% (2500/2500)
Real artifacts (aggregate)	published PAIR/GCG/DSN/random-search prompts	74.9% (286/382)
└ random-search	optimised black-box	100%
└ DSN		89%
└ GCG	gradient-optimised suffix	64%
└ PAIR	LLM-crafted persuasion	40%
Held-out evasion (aggregate)	families never used to tune a pattern	57.2% (1145/2000)
└ many-shot / leetspeak		100% / 87%
└ ROT13 / payload-split	weak spots, disclosed	31% / 11%
Raw harmful goals (baseline)	plain harmful requests — not injections	0.0% (0/500)
Toxicity moderation — raw harmful	detoxify; targets toxic language, not intent	2.0%
False positives — benign	500 benign (JailbreakBench + Alpaca)	0.2% (1/500)

Without the model stage the pattern guard alone scores 0% on every real artifact and held-out family — patterns only catch the vocabulary they were written for. The model stage is what makes the guard hold up against attacks it has not seen, at a 0.2% benign false-positive rate. ROT13 and payload-splitting remain weak and are tracked as open work; we publish them rather than rounding the aggregate up.

This stage is optional, off by default, and still being improved. The default classifier is chosen for a near-zero benign false-positive rate: other public models reach higher recall only by flagging 17-58% of legitimate traffic, which is unusable. A higher-recall, low-FPR option (Meta Prompt Guard 2) is gated and non-permissive but configurable as an opt-in. See the prompt-injection guard guide for the model bake-off and roadmap.

Where we're honest about the roadmap¶

We publish what we don't have yet, too: a cloud SaaS offering and long-context moderation beyond the 512-token model window are on the roadmap. Everything in the tables above is in the repository today.