EU AI Act Compliance Reporting
Automated Article 15 documentation: accuracy metrics, drift detection, and audit trails generated from production data.
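Why Compliance Reporting?
Article 15 of the EU AI Act requires high-risk AI systems to achieve an appropriate level of accuracy, robustness, and cybersecurity, and to declare their accuracy levels and metrics. Related provisions cover automatic logging (Article 12) and post-market monitoring (Article 72). Under the Act's published timeline, most obligations for high-risk systems apply from August 2, 2026. Penalties under the Act reach up to €35M or 7% of global annual turnover for the most serious infringements; breaches of high-risk obligations are capped at €15M or 3%.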
Director-AI generates this documentation automatically from production scoring data. No manual effort. No consultants. Self-hosted, so your data never leaves your infrastructure.
Quick Start
import time

from director_ai import AuditLog, AuditEntry, ComplianceReporter

# 1. Log every scored LLM interaction
log = AuditLog("production_audit.db")
log.log(AuditEntry(
    prompt="What is our refund policy?",
    response="We offer a 30-day refund policy on all products.",
    model="gpt-4o",
    provider="openai",
    score=0.85,
    approved=True,
    verdict_confidence=0.92,
    task_type="qa",
    domain="customer_support",
    latency_ms=18.5,
    timestamp=time.time(),
))

# 2. Generate Article 15 report
reporter = ComplianceReporter(log)
report = reporter.generate_report()

# 3. Export as Markdown
print(report.to_markdown())
What the Report Contains
1. Accuracy Metrics (Article 15(1))
| Metric | Description |
|---|---|
| Overall hallucination rate | Fraction of responses rejected, with 95% Wilson CI |
| Average coherence score | Mean NLI-based coherence across all interactions |
| Average verdict confidence | Mean guardrail self-confidence |
| Average scoring latency | Time to score each response |
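
The 95% Wilson interval reported alongside the hallucination rate can be reproduced by hand. The sketch below is a standalone illustration of the standard Wilson score formula, not Director-AI's internal code; the function name `wilson_interval` and the sample counts are hypothetical.

```python
import math


def wilson_interval(rejected: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = rejected / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical sample: 12 rejected responses out of 400 scored interactions.
low, high = wilson_interval(rejected=12, total=400)
print(f"hallucination rate {12 / 400:.2%}, 95% CI [{low:.2%}, {high:.2%}]")
```

Unlike the simpler normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly when the rejection count is zero or very small, which is common for a well-calibrated guardrail.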
2. Human Oversight (Article 14)
| Metric | Description |
|---|---|
| Human overrides recorded | How often humans disagreed with the guardrail |
| Human override rate | Override fraction — indicates calibration quality |
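
As a rough illustration of how these two numbers relate, the sketch below counts overrides in a list of records. It rests on assumptions not confirmed by this page: that `human_override` is `True` when a reviewer overturned the guardrail's verdict, `False` when the reviewer agreed, and `None` when nobody reviewed the interaction, and that the rate is taken over all interactions. The `Reviewed` class and `override_stats` function are hypothetical stand-ins, not part of the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reviewed:
    """Hypothetical stand-in carrying only the fields used here."""
    approved: bool
    human_override: Optional[bool] = None  # assumed: True = reviewer overturned the verdict


def override_stats(entries: list[Reviewed]) -> tuple[int, float]:
    """Return (override count, overrides divided by total interactions)."""
    overrides = sum(1 for e in entries if e.human_override is True)
    rate = overrides / len(entries) if entries else 0.0
    return overrides, rate


entries = [
    Reviewed(approved=True),                         # not reviewed
    Reviewed(approved=False, human_override=True),   # reviewer disagreed
    Reviewed(approved=True, human_override=False),   # reviewer agreed
    Reviewed(approved=True),                         # not reviewed
]
count, rate = override_stats(entries)
print(f"{count} override(s), rate {rate:.1%}")
```

If the rate is instead defined over reviewed interactions only, the denominator would be the number of entries whose `human_override` is not `None`.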
3. Per-Model Breakdown
Each LLM model used gets its own accuracy stats:
- Hallucination rate with confidence intervals
- Average score and confidence
- Latency comparison
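
The grouping behind a per-model breakdown can be sketched in a few lines. This is an illustration over hypothetical sample rows, not the reporter's actual implementation; the dictionary keys in `summary` are made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (model, approved, score, latency_ms)
rows = [
    ("gpt-4o", True, 0.85, 18.5),
    ("gpt-4o", False, 0.40, 21.0),
    ("claude-4", True, 0.90, 16.0),
    ("claude-4", True, 0.80, 17.0),
]

# Group rows by model name.
by_model = defaultdict(list)
for model, approved, score, latency_ms in rows:
    by_model[model].append((approved, score, latency_ms))

# Compute one set of statistics per model.
summary = {}
for model, items in by_model.items():
    rejected = sum(1 for approved, _, _ in items if not approved)
    summary[model] = {
        "hallucination_rate": rejected / len(items),
        "avg_score": mean(score for _, score, _ in items),
        "avg_latency_ms": mean(lat for _, _, lat in items),
    }

for model, stats in sorted(summary.items()):
    print(model, stats)
```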
4. Drift Detection (Article 15(3))
The reporter splits the reporting range into fixed-length windows (weekly, with drift_window_days=7) and compares hallucination rates across windows. If the rate increases by more than the drift threshold (default 0.05, that is, 5 percentage points), an alert fires.
reporter = ComplianceReporter(
    log,
    drift_window_days=7,
    drift_threshold=0.05,  # 5pp increase triggers alert
)
report = reporter.generate_report(
    since=time.time() - 30 * 86400,  # last 30 days
)
if report.drift_detected:
    print(f"Drift severity: {report.drift_severity:.2%}")
    # Action: retrain, recalibrate, or switch models
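
To make the windowing idea concrete, here is a minimal standalone sketch. It assumes drift is measured as the latest window's rejection rate minus the first window's; the comparison Director-AI actually uses may differ, and `window_rates` and the synthetic events are hypothetical.

```python
DAY = 86400


def window_rates(events, start, window_days=7):
    """Rejection rate per window, oldest first (empty windows are skipped).

    events: iterable of (timestamp, approved) pairs.
    """
    window_s = window_days * DAY
    buckets = {}  # window index -> [total, rejected]
    for ts, approved in events:
        if ts < start:
            continue  # ignore events before the reporting range
        counts = buckets.setdefault(int((ts - start) // window_s), [0, 0])
        counts[0] += 1
        counts[1] += 0 if approved else 1
    return [buckets[i][1] / buckets[i][0] for i in sorted(buckets)]


# Synthetic data: 2 of 100 rejected in week 0, 9 of 100 rejected in week 1.
events = [(i, i >= 2) for i in range(100)]
events += [(7 * DAY + i, i >= 9) for i in range(100)]

rates = window_rates(events, start=0)
severity = rates[-1] - rates[0]  # assumed comparison: latest window vs first
drifted = severity > 0.05        # same 5pp threshold as drift_threshold above
print(rates, drifted)
```

With these synthetic counts the rate rises by 7 percentage points between the two weeks, which exceeds the 5pp threshold.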
5. Incident Summary
Total rejections (potential hallucinations blocked) during the reporting period.
Integration with Gateway
When Director-AI runs as a proxy or gateway, every LLM call is scored and logged automatically. The compliance reporter reads from the same audit database.
# In your gateway setup:
from datetime import date

from director_ai import AuditLog, ComplianceReporter

log = AuditLog("/var/lib/director-ai/audit.db")
reporter = ComplianceReporter(log)

# Weekly cron job:
report = reporter.generate_report()
stamp = date.today().isoformat()  # ISO date for the report filename
with open(f"/reports/article15_{stamp}.md", "w") as f:
    f.write(report.to_markdown())
Filtering
Reports can be filtered by model, domain, tenant, and time range:
# Medical domain only, last 7 days
report = reporter.generate_report(
    since=time.time() - 7 * 86400,
    domain="medical",
)

# Specific model comparison
gpt_report = reporter.generate_report(model="gpt-4o")
claude_report = reporter.generate_report(model="claude-4")
Data Types
@dataclass
class AuditEntry:
    prompt: str
    response: str
    model: str
    provider: str
    score: float
    approved: bool
    verdict_confidence: float
    task_type: str
    domain: str
    latency_ms: float
    timestamp: float
    tenant_id: str = ""
    human_override: bool | None = None


@dataclass
class Article15Report:
    total_interactions: int
    overall_hallucination_rate: float  # with CI
    model_metrics: list[ModelMetrics]
    drift_detected: bool
    drift_severity: float
    incident_count: int
    # ... full fields in API reference