Evidence Packet (one-command demo)

The evidence packet is the single artefact a buyer or auditor runs to see the whole guard loop work and to keep a verifiable record of it. It runs the narrow grounding demo and seals the result so it can be checked later without re-running the guard.

director-ai evidence --emit evidence/        # run the demo, write a sealed packet
director-ai verify-evidence evidence/        # re-check integrity + outcomes

What it does

director-ai evidence runs the seven-step demo on a ProductionGuard:

1. Load a small policy knowledge base (DEMO_FACTS).
2. Ask a policy question.
3. Score a grounded answer (expected: approved).
4. Score a hallucinated answer (expected: blocked).
5. (Streaming halt is exercised by the streaming kernel demo.)
6. Emit, per decision, the Answer Bill of Materials and the OpenTelemetry eval record.
7. Record the decisions in the packet (and, in a server deployment, the audit log).

The packet is written to evidence_packet.json:

{
  "content": {
    "schema_version": "director.evidence_packet.v1",
    "knowledge_base_size": 5,
    "question": "What is the refund window?",
    "grounded": {"approved": true, "score": 0.92, "answer_bom": {...}, "eval_trace": {...}},
    "hallucinated": {"approved": false, "score": 0.2, "answer_bom": {...}, "eval_trace": {...}},
    "checks": {"grounded_approved": true, "hallucinated_blocked": true}
  },
  "integrity": {"algorithm": "sha256", "digest": "…"}
}
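As a rough illustration of how a reviewer might read the outcome fields, the sketch below parses a packet with the layout shown above and lists each decision. The helper names `summarise` and `outcomes_consistent` are hypothetical and not part of director-ai. The `answer_bom` and `eval_trace` objects are elided in the sample above, so they are left out here, and the digest stays a placeholder. The sketch also assumes that the `checks` flags mirror the per-answer `approved` flags, which the field names suggest but the text does not state.

```python
import json

# Sample mirroring the documented layout. answer_bom and eval_trace are
# omitted because they are elided in the documentation; the digest is a
# placeholder, not a real hash.
sample = json.loads("""
{
  "content": {
    "schema_version": "director.evidence_packet.v1",
    "knowledge_base_size": 5,
    "question": "What is the refund window?",
    "grounded": {"approved": true, "score": 0.92},
    "hallucinated": {"approved": false, "score": 0.2},
    "checks": {"grounded_approved": true, "hallucinated_blocked": true}
  },
  "integrity": {"algorithm": "sha256", "digest": "…"}
}
""")


def summarise(packet):
    """Return (label, approved, score) for both scored answers (hypothetical helper)."""
    content = packet["content"]
    return [
        (label, content[label]["approved"], content[label]["score"])
        for label in ("grounded", "hallucinated")
    ]


def outcomes_consistent(packet):
    """Assumption: each check flag mirrors the matching decision's approved flag."""
    content = packet["content"]
    checks = content["checks"]
    return (
        checks["grounded_approved"] == content["grounded"]["approved"]
        and checks["hallucinated_blocked"] == (not content["hallucinated"]["approved"])
    )


for label, approved, score in summarise(sample):
    print(f"{label}: approved={approved} score={score}")
print("checks consistent:", outcomes_consistent(sample))
```

To inspect a real packet, the inline sample could be replaced with `json.load` on the emitted evidence_packet.json. This only reads the recorded outcomes; it does not replace the integrity check performed by director-ai verify-evidence.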

Verification

director-ai verify-evidence (or, from Python, verify_evidence_packet) recomputes the SHA-256 digest over the canonical content, compares it with the digest stored in the packet, and confirms the demo expectations: the grounded answer approved and the hallucinated answer blocked. Any edit to the content changes the digest, so tampering is caught:

from director_ai.core.evidence_packet import (
    build_evidence_packet,
    verify_evidence_packet,
)
from director_ai.guard import ProductionGuard

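# Run the demo on a guard built from the "fast" profile and seal the result.
packet = build_evidence_packet(ProductionGuard.from_profile("fast"))
# Re-check the digest and the demo expectations; reason accompanies the ok flag.
ok, reason = verify_evidence_packet(packet)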
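To make the tamper check concrete, here is a minimal standalone sketch of the same idea. It is not director-ai's implementation. The canonical form is assumed to be JSON with sorted keys and compact separators, and the names `sketch_digest`, `sketch_seal`, `sketch_verify` and the reason strings are hypothetical; the library's actual canonicalisation is not described here.

```python
import copy
import hashlib
import json
from typing import Tuple


def sketch_digest(content: dict) -> str:
    """SHA-256 over an assumed canonical form: sorted keys, compact JSON, UTF-8."""
    canonical = json.dumps(
        content, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sketch_seal(content: dict) -> dict:
    """Wrap content with an integrity block in the documented packet shape."""
    return {
        "content": content,
        "integrity": {"algorithm": "sha256", "digest": sketch_digest(content)},
    }


def sketch_verify(packet: dict) -> Tuple[bool, str]:
    """Recompute the digest, then confirm the two demo expectations."""
    if packet["integrity"]["digest"] != sketch_digest(packet["content"]):
        return False, "digest mismatch"
    checks = packet["content"]["checks"]
    if not (checks["grounded_approved"] and checks["hallucinated_blocked"]):
        return False, "demo expectations not met"
    return True, "ok"


content = {
    "schema_version": "director.evidence_packet.v1",
    "question": "What is the refund window?",
    "grounded": {"approved": True, "score": 0.92},
    "hallucinated": {"approved": False, "score": 0.2},
    "checks": {"grounded_approved": True, "hallucinated_blocked": True},
}
packet = sketch_seal(content)

# Flip one decision after sealing: the stored digest no longer matches.
tampered = copy.deepcopy(packet)
tampered["content"]["hallucinated"]["approved"] = True

print(sketch_verify(packet))
print(sketch_verify(tampered))
```

Sorting keys before hashing means two packets with the same content produce the same digest regardless of key order, which is why a canonical form is needed before any comparison.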

Clear grounded-vs-hallucinated separation requires the model-backed scorer from the director-ai[nli] extra. Without it, both answers receive the heuristic fallback score and the demo expectations will not be met.