Cross-model consensus¶
Status: panel-level agreement scoring. The consensus engine compares several models' answers to one prompt, quantifies how much they agree, and names the claims where they diverge. It is a routing and review aid — it tells you when one model's answer should not be trusted alone — while the streaming halt and NLI scorer remain the response-side ground truth.
Every other guard in Director-AI scores a single response. CrossModelConsensus
scores a panel: give it the same question answered by GPT, Claude, Gemini, a
local Llama, or any other set of models and it returns how much the answers agree,
the specific claims where they conflict (with evidence), and whether to accept the
majority answer, send it for review, or escalate to a stronger model or a human.
The two agreement signals¶
| Signal | When it is used | What it measures |
|---|---|---|
| semantic agreement | an NLI contradiction scorer is supplied | 1 - max(P(contradiction) a→b, P(contradiction) b→a) for each answer pair — the directional maximum, because either direction is enough to break agreement |
| lexical agreement | no scorer supplied (or as a cheap pre-filter) | Jaccard word overlap of the two answers via the Rust rust_word_overlap kernel, with a bit-exact Python fallback |
The panel consensus is the mean pairwise agreement. Below escalate_threshold
(default 0.45) the recommendation is escalate; below accept_threshold
(default 0.7) it is review; at or above it is accept. The result also
carries the full pairwise agreement matrix and a short rationale.
Divergence with evidence¶
When an NLI scorer is supplied, divergences are reported at the claim level:
each answer is split into claims and every cross-model claim pair whose
contradiction probability clears the flag threshold (the scorer's own threshold
by default) is returned as a Divergence naming the two models, the two
conflicting claims, and the contradiction strength — so "the models disagree"
always comes with the sentence that caused it. Without a scorer the engine still
surfaces the single least-agreeing answer pair so the caller sees who diverged
most.
Usage¶
from director_ai.guard import ProductionGuard
from director_ai.core.consensus import ModelResponse
guard = ProductionGuard()
panel = [
ModelResponse("gpt", "The treaty was signed in 1920."),
ModelResponse("claude", "The treaty was signed in 1919."),
ModelResponse("gemini", "The treaty was signed in 1920."),
]
# Lexical-only (dependency-free, offline):
result = guard.cross_model_consensus(panel)
print(result.consensus, result.recommendation, result.rationale)
# Semantic, claim-level divergence with evidence:
from director_ai.core.scoring.contradiction import ContradictionScorer
nli = ContradictionScorer.from_pretrained()
result = guard.cross_model_consensus(panel, nli=nli)
for d in result.divergences:
print(f"{d.model_a} vs {d.model_b}: {d.contradiction:.2f}")
print(f" {d.claim_a!r} ⟂ {d.claim_b!r}")
The engine (and any supplied scorer) persists for the life of the guard.
Polyglot backend¶
The lexical agreement runs through the Rust backfire_kernel.rust_word_overlap
extension when it is installed, and a pure-Python Jaccard fallback otherwise. The
two are bit-for-bit identical — both case-fold, split on whitespace, and
retain punctuation on the token — so the dispatch is purely a speed choice and the
engine runs everywhere. See Rust Acceleration.
Measured behaviour¶
On the bundled labelled panels (python -m benchmarks.cross_model_consensus):
| Metric | Value |
|---|---|
| Lexical recommendation accuracy (accept vs escalate panels) | 1.00 |
| Rust ↔ Python parity | exact |
| Throughput (lexical consensus) | ~114,000 panels/s |
These numbers come from the committed benchmark and
benchmarks/results/cross_model_consensus.json; reproduce them with the command
above. The benchmark exercises the offline lexical path; the semantic NLI path is
validated by the unit tests with a deterministic scorer.