
Guardrail Forensics

The KPI layer says whether the guardrail is healthy. The forensics layer explains reviewed misses without exposing raw prompt, response, or evidence text.

build_forensics_report() consumes tenant-safe eval records, usually from eval_trace, joined with reviewer labels:

from director_ai.core.observability import build_forensics_report

# One tenant-safe eval record joined with its reviewer label.
records = [
    {
        "director.eval.answer_id": "case-1",
        "director.eval.approved": True,  # the guard allowed this answer
        "director.eval.score": 0.82,
        "director.eval.threshold": 0.60,
        "director.eval.scorer": "nli",
        "director.eval.model": "customer-model",
        "director.eval.evidence_count": 0,
        "label": "hallucination",  # reviewer label (plain alias key)
    }
]

report = build_forensics_report(records)

The report classifies each reviewed case as:

Outcome                             Meaning
false_negative                      Reviewer labelled a hallucination that the guard allowed.
false_positive                      Reviewer labelled a grounded answer that the guard halted.
correct_halt                        Reviewer confirmed a halted hallucination.
correct_allow                       Reviewer confirmed an allowed grounded answer.
unlabelled_allow / unlabelled_halt  Eval record has no reviewer label yet.
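
The table above amounts to a small decision rule over the guard's approval flag and the reviewer label. The sketch below illustrates that mapping only; it is not the library's implementation. classify_outcome is a hypothetical helper, and the "grounded" label string is an assumption (only "hallucination" appears in the example record).

```python
from typing import Optional

# Assumed label vocabulary: "hallucination" comes from the example record;
# "grounded" is a placeholder for whatever the reviewer tooling emits.
HALLUCINATION = "hallucination"
GROUNDED = "grounded"


def classify_outcome(approved: bool, label: Optional[str]) -> str:
    """Map a guard decision plus an optional reviewer label to an outcome name."""
    if label is None:
        # No reviewer verdict yet: only the guard decision is known.
        return "unlabelled_allow" if approved else "unlabelled_halt"
    if label == HALLUCINATION:
        return "false_negative" if approved else "correct_halt"
    if label == GROUNDED:
        return "correct_allow" if approved else "false_positive"
    raise ValueError(f"unknown reviewer label: {label!r}")


print(classify_outcome(True, "hallucination"))  # false_negative
```

Under this rule, the example record (approved, labelled as a hallucination) is a false_negative.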

For every case it records the scorer, model, model revision when supplied, domain, threshold margin, knowledge-state summary, reason, and recommended operator action. Examples include refresh_or_add_governed_facts, add_counterexample_and_recalibrate_scorer, and review_retrieval_source_mapping.
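
To make the threshold margin concrete, the sketch below computes it for the example record. It assumes the margin is the signed difference score minus threshold; the library's actual sign convention and rounding are not stated here, and threshold_margin is a hypothetical helper.

```python
def threshold_margin(score: float, threshold: float) -> float:
    """Signed distance of the score from the threshold (assumed: score - threshold)."""
    # Rounding only hides floating-point noise such as 0.21999999999999997.
    return round(score - threshold, 6)


# Values taken from the example record: score 0.82 against threshold 0.60.
margin = threshold_margin(0.82, 0.60)
print(margin)
```

Read this way, a margin near zero points to a borderline decision, while a large margin on a miss means the scorer was confidently wrong.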

CLI

director-ai forensics reads either a JSON array of records or an object with a records array:

director-ai forensics --input eval_records.json --format markdown
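
The two accepted input shapes can be normalised with a few lines of standard-library code. This is a minimal sketch of that idea, not the CLI's own loader; load_records is a hypothetical name.

```python
import json
from typing import Any


def load_records(text: str) -> list[dict[str, Any]]:
    """Parse a bare JSON array, or an object that carries a "records" array."""
    payload = json.loads(text)
    if isinstance(payload, dict):
        # Object form: the records live under the "records" key.
        payload = payload.get("records")
    if not isinstance(payload, list):
        raise ValueError('expected a JSON array or an object with a "records" array')
    return payload


as_array = '[{"approved": true, "label": "hallucination"}]'
as_object = '{"records": [{"approved": true, "label": "hallucination"}]}'

# Both shapes yield the same list of records.
assert load_records(as_array) == load_records(as_object)
```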

The JSON output includes:

  • top-level miss counts;
  • misses grouped by scorer, model, and domain;
  • per-case action recommendations;
  • a privacy block confirming that raw prompt, response, and evidence text are not included.

The forensics report is the core file/export surface. The richer safety dashboard remains the UI/operations packet covering halt rates, drift alerts, controls, and compliance exports.
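
The top-level miss counts and per-scorer, per-model, and per-domain groupings in the JSON output can be pictured as a plain aggregation over classified cases. The sketch below is illustrative only: the case dictionaries are made-up data, the domain values are hypothetical, and the key names simply mirror the ForensicsReport fields documented under API.

```python
from collections import Counter

# Hypothetical pre-classified cases; values are invented for illustration.
cases = [
    {"outcome": "false_negative", "scorer": "nli", "model": "customer-model", "domain": "billing"},
    {"outcome": "false_positive", "scorer": "nli", "model": "customer-model", "domain": "support"},
    {"outcome": "correct_halt", "scorer": "nli", "model": "customer-model", "domain": "billing"},
]

# Only false negatives and false positives count as misses.
MISS_OUTCOMES = {"false_negative", "false_positive"}
misses = [case for case in cases if case["outcome"] in MISS_OUTCOMES]

summary = {
    "misses_total": len(misses),
    "false_negatives": sum(c["outcome"] == "false_negative" for c in misses),
    "false_positives": sum(c["outcome"] == "false_positive" for c in misses),
    "missed_by_scorer": dict(Counter(c["scorer"] for c in misses)),
    "missed_by_model": dict(Counter(c["model"] for c in misses)),
    "missed_by_domain": dict(Counter(c["domain"] for c in misses)),
}
```

Nothing in the summary carries prompt, response, or evidence text; it holds only counts and metadata keys.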

API

director_ai.core.observability.forensics.ForensicsCase dataclass

ForensicsCase(case_id: str, outcome: str, approved: bool, expected_label: str, score: float, threshold: float, margin: float, scorer: str, model: str, model_revision: str, domain: str, knowledge_state: str, evidence_count: int, unsupported_claims: int, reason: str, recommended_action: str)

One tenant-safe reviewed guard decision for operator forensics.

to_dict

to_dict() -> dict[str, str | int | float | bool]

Return a JSON-compatible tenant-safe case payload.

director_ai.core.observability.forensics.ForensicsReport dataclass

ForensicsReport(total_records: int, labelled_records: int, misses_total: int, false_negatives: int, false_positives: int, missed_by_scorer: dict[str, int], missed_by_model: dict[str, int], missed_by_domain: dict[str, int], cases: tuple[ForensicsCase, ...])

Tenant-safe scorer-miss report for a reviewed decision window.

to_dict

to_dict() -> dict[str, Any]

Return a JSON-compatible report payload.

director_ai.core.observability.forensics.build_forensics_report

build_forensics_report(records: Sequence[Mapping[str, object]]) -> ForensicsReport

Build a scorer-miss report from tenant-safe eval/reviewer records.

Parameters:

Name: records
Type: Sequence[Mapping[str, object]]
Default: required
Description: Eval-trace records or JSON objects containing at least approval, score, threshold, scorer/model metadata, and optionally a reviewer label. The function accepts both director.eval.* keys and plain aliases such as approved or label so exports can be joined without rewriting.
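
Accepting both director.eval.* keys and plain aliases can be sketched as a two-step lookup. The helper below is hypothetical (get_field is not part of the library), and the precedence it uses, prefixed key first and plain alias second, is an assumption.

```python
from typing import Any, Mapping

_MISSING = object()  # sentinel so that stored None/False values are still returned
PREFIX = "director.eval."


def get_field(record: Mapping[str, Any], name: str, default: Any = None) -> Any:
    """Return record["director.eval.<name>"] if present, else record["<name>"]."""
    for key in (PREFIX + name, name):
        value = record.get(key, _MISSING)
        if value is not _MISSING:
            return value
    return default


trace_record = {"director.eval.approved": True, "label": "hallucination"}
plain_record = {"approved": True, "label": "hallucination"}

# Either key style resolves to the same value.
assert get_field(trace_record, "approved") is True
assert get_field(plain_record, "approved") is True
```

A lookup of this kind is what lets an eval_trace export be joined with a reviewer spreadsheet without renaming columns first.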

director_ai.core.observability.forensics.render_forensics_markdown

render_forensics_markdown(report: ForensicsReport) -> str

Render a Markdown scorer-miss report for operator review.

director_ai.core.observability.forensics.render_forensics_text

render_forensics_text(report: ForensicsReport) -> str

Render a plain-text scorer-miss report for CLI output.