Cross-model consensus¶

Status: panel-level agreement scoring. The consensus engine compares several models' answers to one prompt, quantifies how much they agree, and names the claims where they diverge. It is a routing and review aid — it tells you when one model's answer should not be trusted alone — while the streaming halt and NLI scorer remain the response-side ground truth.

Every other guard in Director-AI scores a single response. CrossModelConsensus scores a panel: give it the same question answered by GPT, Claude, Gemini, a local Llama, or any other set of models and it returns how much the answers agree, the specific claims where they conflict (with evidence), and whether to accept the majority answer, send it for review, or escalate to a stronger model or a human.

The two agreement signals¶

Signal	When it is used	What it measures
semantic agreement	an NLI contradiction scorer is supplied	`1 - max(P(contradiction) a→b, P(contradiction) b→a)` for each answer pair — the directional maximum, because either direction is enough to break agreement
lexical agreement	no scorer supplied (or as a cheap pre-filter)	Jaccard word overlap of the two answers via the Rust `rust_word_overlap` kernel, with a bit-exact Python fallback

The panel consensus is the mean pairwise agreement. Below escalate_threshold (default 0.45) the recommendation is escalate; below accept_threshold (default 0.7) it is review; at or above it is accept. The result also carries the full pairwise agreement matrix and a short rationale.

Divergence with evidence¶

When an NLI scorer is supplied, divergences are reported at the claim level: each answer is split into claims and every cross-model claim pair whose contradiction probability clears the flag threshold (the scorer's own threshold by default) is returned as a Divergence naming the two models, the two conflicting claims, and the contradiction strength — so "the models disagree" always comes with the sentence that caused it. Without a scorer the engine still surfaces the single least-agreeing answer pair so the caller sees who diverged most.

Usage¶

from director_ai.guard import ProductionGuard
from director_ai.core.consensus import ModelResponse

guard = ProductionGuard()
panel = [
    ModelResponse("gpt", "The treaty was signed in 1920."),
    ModelResponse("claude", "The treaty was signed in 1919."),
    ModelResponse("gemini", "The treaty was signed in 1920."),
]

# Lexical-only (dependency-free, offline):
result = guard.cross_model_consensus(panel)
print(result.consensus, result.recommendation, result.rationale)

# Semantic, claim-level divergence with evidence:
from director_ai.core.scoring.contradiction import ContradictionScorer

nli = ContradictionScorer.from_pretrained()
result = guard.cross_model_consensus(panel, nli=nli)
for d in result.divergences:
    print(f"{d.model_a} vs {d.model_b}: {d.contradiction:.2f}")
    print(f"  {d.claim_a!r}  ⟂  {d.claim_b!r}")

The engine (and any supplied scorer) persists for the life of the guard.

Polyglot backend¶

The lexical agreement runs through the Rust backfire_kernel.rust_word_overlap extension when it is installed, and a pure-Python Jaccard fallback otherwise. The two are bit-for-bit identical — both case-fold, split on whitespace, and retain punctuation on the token — so the dispatch is purely a speed choice and the engine runs everywhere. See Rust Acceleration.

Measured behaviour¶

On the bundled labelled panels (python -m benchmarks.cross_model_consensus):

Metric	Value
Lexical recommendation accuracy (accept vs escalate panels)	1.00
Rust ↔ Python parity	exact
Throughput (lexical consensus)	~114,000 panels/s

These numbers come from the committed benchmark and benchmarks/results/cross_model_consensus.json; reproduce them with the command above. The benchmark exercises the offline lexical path; the semantic NLI path is validated by the unit tests with a deterministic scorer.