Skip to content

EU AI Act Compliance Reporting

Automated Article 15 documentation — accuracy metrics, drift detection, and audit trails from production data.

Why Compliance Reporting?

The EU AI Act Article 15 requires high-risk AI systems to document accuracy metrics, maintain audit trails, and demonstrate continuous monitoring. Enforcement begins August 2, 2026. Fines reach up to €35M or 7% of global turnover.

Director-AI generates this documentation automatically from production scoring data. No manual effort. No consultants. Self-hosted, so your data never leaves your infrastructure.

Quick Start

from director_ai import AuditLog, AuditEntry, ComplianceReporter
import time

# 1. Log every scored LLM interaction
log = AuditLog("production_audit.db")

log.log(AuditEntry(
    prompt="What is our refund policy?",
    response="We offer a 30-day refund policy on all products.",
    model="gpt-4o",
    provider="openai",
    score=0.85,
    approved=True,
    verdict_confidence=0.92,
    task_type="qa",
    domain="customer_support",
    latency_ms=18.5,
    timestamp=time.time(),
))

# 2. Generate Article 15 report
reporter = ComplianceReporter(log)
report = reporter.generate_report()

# 3. Export as Markdown
print(report.to_markdown())

What the Report Contains

1. Accuracy Metrics (Article 15(1))

Metric Description
Overall hallucination rate Fraction of responses rejected, with 95% Wilson CI
Average coherence score Mean NLI-based coherence across all interactions
Average verdict confidence Mean guardrail self-confidence
Average scoring latency Time to score each response

2. Human Oversight (Article 14)

Metric Description
Human overrides recorded How often humans disagreed with the guardrail
Human override rate Override fraction — indicates calibration quality

3. Per-Model Breakdown

Each LLM model used gets its own accuracy stats: - Hallucination rate with confidence intervals - Average score and confidence - Latency comparison

4. Drift Detection (Article 15(3))

The reporter splits the time range into weekly windows and compares hallucination rates across periods. If the rate increases by more than the drift threshold (default 5pp), an alert fires.

reporter = ComplianceReporter(
    log,
    drift_window_days=7,
    drift_threshold=0.05,  # 5pp increase triggers alert
)
report = reporter.generate_report(
    since=time.time() - 30 * 86400,  # last 30 days
)

if report.drift_detected:
    print(f"Drift severity: {report.drift_severity:.2%}")
    # Action: retrain, recalibrate, or switch models

5. Incident Summary

Total rejections (potential hallucinations blocked) during the reporting period.

Integration with Gateway

When director-ai runs as a proxy/gateway, every LLM call gets automatically scored and logged. The compliance reporter reads from the same audit database.

# In your gateway setup:
from director_ai import AuditLog, ComplianceReporter

log = AuditLog("/var/lib/director-ai/audit.db")
reporter = ComplianceReporter(log)

# Weekly cron job:
report = reporter.generate_report()
with open(f"/reports/article15_{date}.md", "w") as f:
    f.write(report.to_markdown())

Filtering

Reports can be filtered by model, domain, tenant, and time range:

# Medical domain only, last 7 days
report = reporter.generate_report(
    since=time.time() - 7 * 86400,
    domain="medical",
)

# Specific model comparison
gpt_report = reporter.generate_report(model="gpt-4o")
claude_report = reporter.generate_report(model="claude-4")

Data Types

@dataclass
class AuditEntry:
    prompt: str
    response: str
    model: str
    provider: str
    score: float
    approved: bool
    verdict_confidence: float
    task_type: str
    domain: str
    latency_ms: float
    timestamp: float
    tenant_id: str = ""
    human_override: bool | None = None

@dataclass
class Article15Report:
    total_interactions: int
    overall_hallucination_rate: float  # with CI
    model_metrics: list[ModelMetrics]
    drift_detected: bool
    drift_severity: float
    incident_count: int
    # ... full fields in API reference