Evidence Firewall¶

The evidence firewall screens every retrieved chunk before the model sees it. Grounding only helps if the chunks reaching the model are the ones the tenant is allowed to read, came from a verified write, are still fresh, and carry facts rather than injected instructions. The firewall enforces that as an admission gate in front of retrieval: each chunk runs ten checks and is either admitted into the grounding context or quarantined with a stable, tenant-safe reason code.

It is side-effect-free apart from counter metrics, deterministic for a given clock, and opt-in — retrieval behaves exactly as before unless a deployment enables it.

The ten admission checks¶

Check	Fails when	Reason code
Tenant authorisation	the chunk is owned by a tenant the request may not read	`tenant_mismatch`
Provenance present	no content digest, version, or signature marker exists	`provenance_missing`
Signature verified	the write was not signature-verified at ingest	`signature_unverified`
Content-hash match	a recorded text digest no longer matches the text	`content_hash_mismatch`
Expiry	the recorded `expires_at` has passed	`expired`
Age	the chunk is older than the policy's max age	`too_old`
Source owner	no source owner/key is recorded	`source_owner_unknown`
Sensitivity	the label is not in the allowed set	`sensitivity_blocked`
Allowed use case	the chunk's use-case list excludes the request	`use_case_not_allowed`
Poisoning scan	the text reads as an injected instruction	`poisoning_detected`

A chunk owned by an empty tenant denotes a shared, non-tenant corpus and passes the tenant check. A chunk with no recorded text digest passes the content-hash check — absence of a digest is the provenance check's job, not the integrity check's. A chunk with no allowed-use-case list is unrestricted.

The reason codes are drawn from a closed vocabulary and never contain chunk text, so a quarantine decision can be logged and shipped to a customer audit trail without leaking another tenant's data.

Policy¶

FirewallPolicy selects which checks are enforced and with what bounds. Defaults are fail-closed on the integrity-critical checks (tenant, provenance, signature, content hash, expiry, poisoning) and opt-in on the corpus-shape checks (sensitivity labels, declared use case, source owner) that depend on a customer taxonomy.

from director_ai.core.evidence_firewall import (
    EvidenceFirewall,
    FirewallContext,
    FirewallPolicy,
)

firewall = EvidenceFirewall(FirewallPolicy(max_age_seconds=90 * 86_400))

report = firewall.screen(results, FirewallContext(tenant_id="acme", now_unix=now))

for chunk in report.admitted:        # safe to hand to the model
    ...
for verdict in report.quarantined:   # held back, with reasons
    print(verdict.chunk.chunk_id, verdict.failed_reasons)

FirewallPolicy.permissive() is an explicit, named "firewall disabled" posture for non-tenant development corpora; production should never use it.

The poisoning scan is dependency-free and runs on the hot path. It scores how strongly a chunk's text reads as an instruction aimed at the model — "ignore the previous instructions", "you are now …", a leaked system-prompt fragment, or an embedded tool-call literal — rather than a fact aimed at the user. The model-backed InjectionDetector can be injected in its place:

firewall = EvidenceFirewall(policy, poison_scan=detector.injection_score)

Wiring into retrieval¶

VectorGroundTruthStore takes an optional evidence_firewall. When supplied, every retrieval batch is screened inside the active-results filter, so quarantined chunks never reach retrieve_context or retrieve_context_with_chunks. When omitted, retrieval is unchanged.

from director_ai.core.evidence_firewall import EvidenceFirewall
from director_ai.core.retrieval.vector_store import VectorGroundTruthStore

store = VectorGroundTruthStore(evidence_firewall=EvidenceFirewall())

DirectorConfig builds and attaches the firewall automatically from evidence_firewall_* settings. The firewall is off by default; set evidence_firewall_enabled=True to turn it on:

from director_ai.core.config import DirectorConfig

config = DirectorConfig(
    evidence_firewall_enabled=True,
    evidence_firewall_max_age_seconds=90 * 86_400,
    evidence_firewall_enforce_sensitivity=True,
    evidence_firewall_allowed_sensitivity=("public", "internal"),
)
store = config.build_store()         # firewall attached

Report shape¶

FirewallReport.to_dict() and ChunkVerdict.to_dict() serialise to JSON-safe, tenant-safe dicts — chunk ids, per-check outcomes, and reason codes, never raw text — suitable for an audit record:

{
  "admitted_count": 1,
  "quarantined_count": 1,
  "verdicts": [
    {"chunk_id": "doc1", "admitted": true, "checks": [...], "failed_reasons": []},
    {"chunk_id": "doc2", "admitted": false,
     "checks": [...], "failed_reasons": ["signature_unverified"]}
  ]
}

Metrics¶

evidence_firewall_chunks_screened_total — every chunk the firewall saw.
evidence_firewall_chunks_quarantined_total{reason} — quarantines, labelled by reason code, so a dashboard can show which check is dropping chunks.

Performance¶

Screening is branching plus a native hashlib SHA-256 recompute, so there is no Rust path to add — the digest is already a C path and the rest is control flow. benchmarks/evidence_firewall.py reports per-chunk screening latency and the poison-scan share of it; on the reference workstation a chunk screens in roughly 12 µs, of which the poison scan is about a third.