
EU AI Act Compliance Reporting

Automated Article 15 documentation — accuracy metrics, drift detection, and audit trails from production data.

Why Compliance Reporting?

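Article 15 of the EU AI Act requires high-risk AI systems to document accuracy metrics, maintain audit trails, and demonstrate continuous monitoring. Enforcement of most high-risk obligations begins August 2, 2026. Fines for breaching these obligations reach up to €15M or 3% of worldwide annual turnover; the higher ceiling of €35M or 7% applies to prohibited AI practices.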

Director-AI generates this documentation automatically from production scoring data. No manual effort. No consultants. Self-hosted, so your data never leaves your infrastructure.

Quick Start

from director_ai import (
    Article15TemplateContext,
    AuditLog,
    AuditEntry,
    ComplianceReporter,
)
import time

# 1. Log every scored LLM interaction
log = AuditLog("production_audit.db")

log.log(AuditEntry(
    prompt="What is our refund policy?",
    response="We offer a 30-day refund policy on all products.",
    model="gpt-4o",
    provider="openai",
    score=0.85,
    approved=True,
    verdict_confidence=0.92,
    task_type="qa",
    domain="customer_support",
    latency_ms=18.5,
    timestamp=time.time(),
))

# 2. Generate Article 15 report
reporter = ComplianceReporter(log)
report = reporter.generate_report()

# 3. Export as Markdown
print(report.to_markdown())

# 4. Produce regulator-facing Article 15 technical documentation
context = Article15TemplateContext(
    system_name="Director-AI customer-support guard",
    intended_purpose="Score generated answers against approved support facts.",
    deployment_context="EU customer-support assistant gateway.",
    risk_management_summary="Low-score answers are blocked and routed to review.",
    data_governance_summary="Audit rows are tenant-scoped and PII redaction is enabled.",
    robustness_summary="NLI scoring, streaming halt, drift checks, and red-team tests run.",
    cybersecurity_summary="API-key tenant binding, rate limits, and signed KB entries are enabled.",
    human_oversight_summary="Reviewers can approve, reject, or request regeneration.",
    post_market_monitoring_summary="Operations reviews drift, incidents, and overrides weekly.",
    known_limitations=("Does not replace human approval for regulated advice.",),
    residual_risks=("Knowledge-base facts can become stale between reviews.",),
    evidence_refs=("docs/PRODUCTION_CHECKLIST.md#compliance", "SECURITY.md"),
)
print(report.to_article15_markdown(context))

SOC 2 / ISO 27001 Readiness

build_soc2_iso_readiness_report() generates a tenant-safe readiness crosswalk for customer security reviews. It maps Director-AI evidence references to SOC 2 Trust Services Criteria categories and ISO/IEC 27001:2022 Annex A-style control references, then produces JSON, Markdown, and Trust Console control rows. This is readiness evidence only; it is not a SOC 2 report, ISO/IEC 27001 certification, or auditor opinion.

from director_ai.compliance import (
    ReadinessStatus,
    Soc2IsoControl,
    build_soc2_iso_readiness_report,
)

report = build_soc2_iso_readiness_report(
    controls=[
        Soc2IsoControl(
            control_id="SEC-01",
            title="Tenant authentication and access isolation",
            soc2_criteria=("security", "confidentiality"),
            iso27001_refs=("A.5.15", "A.8.3"),
            status=ReadinessStatus.PASS,
            evidence_refs=("tests/test_server_auth.py", "tests/test_enterprise.py"),
            owner="security",
            updated_at="2026-05-17",
        ),
    ],
)

payload = report.to_dict()
markdown = report.to_markdown()
trust_controls = report.to_trust_controls()

assert payload["privacy"] == {
    "payload_classification": "tenant_safe",
    "raw_security_evidence_included": False,
    "certification_claimed": False,
}

The default catalogue covers tenant isolation, PII redaction, monitoring, incident review, vulnerability evidence, and change management. Controls can be overridden per deployment so operators can add auditor-owned evidence references without serialising raw evidence or customer payloads.

What the Report Contains

1. Accuracy Metrics (Article 15(1))

| Metric | Description |
| --- | --- |
| Overall hallucination rate | Fraction of responses rejected, with 95% Wilson CI |
| Average coherence score | Mean NLI-based coherence across all interactions |
| Average verdict confidence | Mean guardrail self-confidence |
| Average scoring latency | Time to score each response |
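
The 95% Wilson interval attached to the hallucination rate can be reproduced by hand when checking a report. The following is a minimal sketch of the standard Wilson score formula, not Director-AI's internal implementation; the function name `wilson_interval` and its parameters are illustrative assumptions.

```python
import math


def wilson_interval(rejected: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Hypothetical helper for illustration; z=1.96 corresponds to ~95% coverage.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= rejected <= total:
        raise ValueError("rejected must be between 0 and total")

    p = rejected / total                      # observed rejection (hallucination) rate
    z2 = z * z
    denom = 1 + z2 / total
    centre = (p + z2 / (2 * total)) / denom   # interval midpoint, pulled towards 0.5
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Example: 10 rejected responses out of 100 scored interactions.
low, high = wilson_interval(rejected=10, total=100)
print(f"rate=10.0%, 95% CI=[{low:.1%}, {high:.1%}]")
```

Unlike the simpler normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when the rejection count is zero or the sample is small, which is common for per-tenant or per-domain slices.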

2. Human Oversight (Article 14)

| Metric | Description |
| --- | --- |
| Human overrides recorded | How often humans disagreed with the guardrail |
| Human override rate | Override fraction; indicates calibration quality |

3. Per-Model Breakdown

Each LLM model used gets its own accuracy stats:

- Hallucination rate with confidence intervals
- Average score and confidence
- Latency comparison
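
To make the per-model breakdown concrete, the sketch below groups audit rows by model and computes the same kinds of figures. It is an illustration under stated assumptions, not the reporter's actual code: `Row` is a hypothetical stand-in holding a subset of the `AuditEntry` fields, `per_model_stats` is an invented helper, and the sample rows are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Row:
    """Hypothetical stand-in for the AuditEntry fields used in this sketch."""
    model: str
    score: float
    approved: bool
    verdict_confidence: float
    latency_ms: float


def per_model_stats(rows):
    """Group rows by model and summarise each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.model].append(row)

    stats = {}
    for model, items in grouped.items():
        stats[model] = {
            "interactions": len(items),
            # A rejected (not approved) response counts as a hallucination.
            "hallucination_rate": sum(not r.approved for r in items) / len(items),
            "avg_score": mean(r.score for r in items),
            "avg_confidence": mean(r.verdict_confidence for r in items),
            "avg_latency_ms": mean(r.latency_ms for r in items),
        }
    return stats


# Illustrative rows only; real data would come from the audit log.
rows = [
    Row("gpt-4o", 0.85, True, 0.92, 18.5),
    Row("gpt-4o", 0.30, False, 0.80, 21.5),
    Row("claude-4", 0.90, True, 0.95, 16.0),
]
stats = per_model_stats(rows)
for model, summary in stats.items():
    print(model, summary)
```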

4. Drift Detection (Article 15(3))

The reporter splits the time range into weekly windows and compares hallucination rates across periods. If the rate increases by more than the drift threshold (default: 5 percentage points), an alert fires.

reporter = ComplianceReporter(
    log,
    drift_window_days=7,
    drift_threshold=0.05,  # 5pp increase triggers alert
)
report = reporter.generate_report(
    since=time.time() - 30 * 86400,  # last 30 days
)

if report.drift_detected:
    print(f"Drift severity: {report.drift_severity:.2%}")
    # Action: retrain, recalibrate, or switch models
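
For readers who want to see the windowing idea in isolation, here is a minimal sketch. It assumes that drift is measured as the largest increase in rejection rate between consecutive non-empty windows; how Director-AI actually compares periods is not specified here, and `window_rates` and `detect_drift` are hypothetical names.

```python
import math

DAY = 86400  # seconds


def window_rates(events, since, until, window_days=7):
    """Rejection rate per window; None marks a window with no events.

    events: iterable of (timestamp, approved) pairs (assumed input shape).
    """
    width = window_days * DAY
    n_windows = max(1, math.ceil((until - since) / width))
    counts = [[0, 0] for _ in range(n_windows)]  # [rejected, total] per window
    for ts, approved in events:
        if since <= ts < until:
            idx = int((ts - since) // width)
            counts[idx][1] += 1
            if not approved:
                counts[idx][0] += 1
    return [rej / tot if tot else None for rej, tot in counts]


def detect_drift(rates, threshold=0.05):
    """Return (drift_detected, severity) from a list of per-window rates."""
    observed = [r for r in rates if r is not None]
    # Largest rise between consecutive windows; decreases do not count as drift.
    rises = (later - earlier for earlier, later in zip(observed, observed[1:]))
    severity = max(max(rises, default=0.0), 0.0)
    return severity > threshold, severity


# Illustrative data: 1 of 20 rejected in week 1, 3 of 20 rejected in week 2.
week1 = [(1 * DAY + i, i != 0) for i in range(20)]
week2 = [(8 * DAY + i, i >= 3) for i in range(20)]
rates = window_rates(week1 + week2, since=0, until=14 * DAY)
drifted, severity = detect_drift(rates)
print(rates, drifted, f"{severity:.2%}")
```

With sparse traffic, a single window can swing by several percentage points on a handful of rejections, so it is worth checking the window sizes before acting on an alert.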

5. Incident Summary

Total rejections (potential hallucinations blocked) during the reporting period.

6. Article 15 Technical Documentation Template

Article15TemplateContext adds the operator-controlled evidence that cannot be derived from metrics alone: intended purpose, deployment context, risk management, data governance, robustness controls, cybersecurity controls, human oversight, post-market monitoring, known limitations, residual risks, and evidence references. Article15Report.to_article15_template(context) returns a tenant-safe dictionary with privacy.raw_interaction_text_included = false. Article15Report.to_article15_markdown(context) renders the same structure as a reviewable technical-documentation draft.

Integration with Gateway

When Director-AI runs as a proxy/gateway, every LLM call is automatically scored and logged. The compliance reporter reads from the same audit database.

# In your gateway setup:
from datetime import date

from director_ai import AuditLog, ComplianceReporter

log = AuditLog("/var/lib/director-ai/audit.db")
reporter = ComplianceReporter(log)

# Weekly cron job: write a dated report (article15_YYYY-MM-DD.md)
report = reporter.generate_report()
report_path = f"/reports/article15_{date.today().isoformat()}.md"
with open(report_path, "w", encoding="utf-8") as f:
    f.write(report.to_markdown())

Filtering

Reports can be filtered by model, domain, tenant, and time range:

# Medical domain only, last 7 days
report = reporter.generate_report(
    since=time.time() - 7 * 86400,
    domain="medical",
)

# Specific model comparison
gpt_report = reporter.generate_report(model="gpt-4o")
claude_report = reporter.generate_report(model="claude-4")
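
Once two filtered reports exist, comparing them is a matter of reading the same fields from each. The sketch below is an assumption-laden illustration: `rate_gap_pp` is a hypothetical helper, the `SimpleNamespace` objects stand in for real reports, and the numbers are invented placeholders. Only the attribute names `overall_hallucination_rate` and `total_interactions` come from the `Article15Report` data type.

```python
from types import SimpleNamespace


def rate_gap_pp(report_a, report_b, min_interactions=100):
    """Hallucination-rate gap (a minus b) in percentage points.

    Returns None when either report has too few interactions to compare;
    the min_interactions cutoff is an arbitrary illustrative choice.
    """
    if min(report_a.total_interactions, report_b.total_interactions) < min_interactions:
        return None
    gap = report_a.overall_hallucination_rate - report_b.overall_hallucination_rate
    return gap * 100


# Placeholder objects standing in for reporter.generate_report(model=...) results.
gpt_report = SimpleNamespace(total_interactions=1200, overall_hallucination_rate=0.040)
claude_report = SimpleNamespace(total_interactions=900, overall_hallucination_rate=0.025)

gap = rate_gap_pp(gpt_report, claude_report)
print("insufficient data" if gap is None else f"gap: {gap:+.1f} pp")
```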

Data Types

@dataclass
class AuditEntry:
    prompt: str
    response: str
    model: str
    provider: str
    score: float
    approved: bool
    verdict_confidence: float
    task_type: str
    domain: str
    latency_ms: float
    timestamp: float
    tenant_id: str = ""
    human_override: bool | None = None

@dataclass
class Article15Report:
    total_interactions: int
    overall_hallucination_rate: float  # with CI
    model_metrics: list[ModelMetrics]
    drift_detected: bool
    drift_severity: float
    incident_count: int
    # ... full fields in API reference