EU AI Act Compliance with Director-AI¶
A technical guide for engineering teams preparing high-risk AI systems for EU AI Act enforcement (August 2, 2026).
Scope¶
The EU AI Act (Regulation 2024/1689) classifies AI systems by risk tier. High-risk systems — those used in employment, creditworthiness, education, critical infrastructure, law enforcement, and healthcare — face mandatory requirements under Articles 9–15. General-purpose AI (GPAI) models have separate obligations under Articles 51–56.
This guide covers what Director-AI provides out of the box, what it helps with, and what it does not address. Director-AI is a hallucination guardrail, not a full compliance platform. It covers a specific and critical slice: factual accuracy measurement, continuous monitoring, and audit documentation.
Which Articles Director-AI Addresses¶
Article 9 — Risk Management System¶
High-risk AI systems must implement a risk management system that identifies, evaluates, and mitigates foreseeable risks.
What Director-AI provides:
- Measurable risk metric. Every LLM response gets a coherence score (0.0–1.0) combining NLI contradiction detection (H_logical) and RAG factual divergence (H_factual). This is a quantified risk signal, not a qualitative assessment.
- Threshold-based mitigation. Responses below the configured threshold are blocked automatically. The threshold is configurable per domain (`DirectorConfig.from_profile("medical")` uses 0.3 by default).
- Streaming halt. `StreamingKernel` stops token generation mid-stream when coherence drops, preventing hallucinated content from reaching users.
What Director-AI does not provide:
- Identification of all foreseeable risks (toxicity, bias, privacy). Director-AI only measures factual coherence.
- Risk management documentation templates. You still need to write your risk management plan; Director-AI supplies the accuracy evidence that goes into it.
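The threshold-based mitigation pattern can be sketched at the application layer. This is a conceptual illustration of the gating logic, not the library's internals; the function and field names here are assumptions:

```python
def gate_response(response: str, coherence_score: float, threshold: float = 0.3) -> dict:
    """Block responses whose coherence score falls below the domain threshold.

    Conceptual sketch of Article 9 threshold-based mitigation: the score is a
    quantified risk signal, and the threshold is the configured mitigation.
    """
    if coherence_score < threshold:
        # Blocked: potentially hallucinated content never reaches the user
        return {"allowed": False,
                "reason": f"coherence {coherence_score:.2f} below threshold {threshold}"}
    return {"allowed": True, "response": response}

# A low-coherence response is blocked; a high-coherence one passes through.
print(gate_response("Drug X cures Y.", 0.12)["allowed"])   # False
print(gate_response("Drug X treats Y.", 0.87)["allowed"])  # True
```

The same gate can sit behind a streaming loop: check the score per chunk and stop emitting tokens the moment it dips below the threshold.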
Article 10 — Data and Data Governance¶
Training data quality requirements apply to the AI system's training data, not to the guardrail's knowledge base. However:
What Director-AI provides:
- Fine-tuning data validation. `validate_finetune_data()` and `director-ai validate-data <file.jsonl>` check domain-tuning JSONL for duplicates, class imbalance, parse errors, and missing fields before fine-tuning.
- Threshold-selection support. `DatasetTypeClassifier` helps choose dataset-specific thresholds for scoring, but it is not a data-governance validator.
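The four checks named above (parse errors, missing fields, duplicates, class imbalance) can be approximated in a few lines of stdlib Python. This is an illustration of what such validation looks like, not the `validate_finetune_data()` implementation, and the field names `text`/`label` are assumptions:

```python
import json
from collections import Counter

def validate_jsonl(lines, required=("text", "label"), imbalance_ratio=10.0):
    """Illustrative pre-fine-tuning checks: parse errors, missing fields,
    duplicates, and class imbalance. Not the library's implementation."""
    issues, seen, labels = [], set(), Counter()
    for i, line in enumerate(lines, 1):
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            issues.append(f"line {i}: parse error")
            continue
        missing = [f for f in required if f not in row]
        if missing:
            issues.append(f"line {i}: missing fields {missing}")
            continue
        if row["text"] in seen:
            issues.append(f"line {i}: duplicate text")
        seen.add(row["text"])
        labels[row["label"]] += 1
    # Flag imbalance when the majority class dwarfs the minority class
    if labels and max(labels.values()) > imbalance_ratio * min(labels.values()):
        issues.append("class imbalance exceeds ratio")
    return issues

sample = ['{"text": "a", "label": 1}', '{"text": "a", "label": 1}',
          'not json', '{"text": "b"}']
print(validate_jsonl(sample))
```

Running the real validator before every fine-tune gives you a written data-quality record to cite under Article 10.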
Article 11 — Technical Documentation¶
Systems must be accompanied by technical documentation including performance metrics and known limitations.
What Director-AI provides:
- Automated performance reports. `ComplianceReporter.generate_report()` produces structured reports with accuracy metrics, confidence intervals, per-model breakdowns, and drift analysis.
- Per-claim evidence. `VerifiedScorer` produces per-claim verdicts (supported/contradicted/fabricated/unverifiable) with matched source sentences and traceability scores.
- Markdown and structured export. Reports export to Markdown via `report.to_markdown()` for inclusion in technical documentation.
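The per-claim evidence described above can be pictured as a small record type. The field names below are illustrative assumptions, not the `VerifiedScorer` output schema; only the four verdict labels come from the source:

```python
from dataclasses import dataclass
from typing import Optional

VERDICTS = {"supported", "contradicted", "fabricated", "unverifiable"}

@dataclass
class ClaimVerdict:
    """Illustrative shape of one per-claim verdict (field names are assumptions)."""
    claim: str
    verdict: str                     # one of VERDICTS
    source_sentence: Optional[str]   # matched source sentence, if any
    traceability: float              # score tying the claim back to the source

    def __post_init__(self):
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict}")

v = ClaimVerdict("Aspirin thins blood.", "supported",
                 "Aspirin inhibits platelet aggregation.", 0.92)
print(v.verdict)  # supported
```

A list of such records per response is exactly the kind of traceable evidence Article 11 documentation expects.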
Article 12 — Record-Keeping¶
High-risk systems must automatically log events during operation.
What Director-AI provides:
- Audit log. `AuditLog` records every scored interaction: prompt, response, model, provider, score, verdict confidence, latency, timestamp, task type, domain, and tenant.
- Immutable records. Audit entries are append-only. No deletion API.
- Query and export. `AuditLog.query()` supports time-range, model, domain, tenant, and limit filters.
```python
from director_ai import AuditLog, ComplianceReporter

log = AuditLog("production_audit.db")
reporter = ComplianceReporter(log)

# Generate report for the last 30 days (default window)
report = reporter.generate_report()
print(report.to_markdown())
```
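The append-only record-keeping pattern itself is easy to reproduce with stdlib `sqlite3`. This is a sketch of the idea behind Article 12 logging, not the `AuditLog` schema; column names are assumptions:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE audit (
    ts REAL, model TEXT, domain TEXT, tenant TEXT,
    prompt TEXT, response TEXT, score REAL)""")

def log_interaction(model, domain, tenant, prompt, response, score):
    # Append-only: the application exposes INSERT and SELECT, never DELETE or UPDATE
    con.execute("INSERT INTO audit VALUES (?, ?, ?, ?, ?, ?, ?)",
                (time.time(), model, domain, tenant, prompt, response, score))

log_interaction("model-a", "medical", "acme", "Q1", "A1", 0.91)
log_interaction("model-a", "legal", "acme", "Q2", "A2", 0.42)

# Filtered retrieval, analogous in spirit to AuditLog.query()
rows = con.execute("SELECT model, score FROM audit WHERE domain = ?",
                   ("medical",)).fetchall()
print(rows)  # [('model-a', 0.91)]
```

The key design choice is the same one the package makes: immutability comes from never exposing a deletion path, which is what makes the log usable as audit evidence.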
Article 13 — Transparency¶
Users must be informed about the AI system's capabilities and limitations.
What Director-AI provides:
- Score transparency. Every response includes a coherence score. Users (or the application layer) can see exactly how confident the system is.
- Evidence on rejection. When a response is blocked, the specific KB chunks that contradicted it are returned. No black-box verdicts.
- Claim-level detail. `VerifiedScorer` breaks responses into individual claims and labels each with a verdict and source match.
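The "evidence on rejection" behaviour can be sketched as a transparency envelope returned by the application layer. This is illustrative, assuming hypothetical field names; the point is that the verdict ships with its evidence rather than as a black-box refusal:

```python
def rejection_payload(response: str, score: float, contradicting_chunks: list) -> dict:
    """Illustrative Article 13 envelope for a blocked response:
    the score and the contradicting KB chunks travel with the verdict."""
    return {
        "blocked": True,
        "coherence_score": score,
        "evidence": contradicting_chunks,  # the KB chunks that contradicted the claim
    }

payload = rejection_payload(
    "The drug is approved for children under 2.",
    0.18,
    ["Label: not approved for patients under 12 years of age."],
)
print(payload["blocked"], len(payload["evidence"]))  # True 1
```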
Article 14 — Human Oversight¶
High-risk systems must allow human oversight and intervention.
What Director-AI provides:
- Review artifacts for operators. Scores, evidence, and compliance reports give humans concrete material to inspect before adjusting thresholds or approving exceptions.
- Override tracking hook. `AuditLog` can store a `human_override` value when your application records a reviewer decision.
- Configurable thresholds. Operators control the coherence threshold, trading off between false positives and missed hallucinations. This is a human decision, not an automated one.
- What Director-AI does not provide here. The package does not ship a reviewer inbox or approval queue; `ReviewQueue` is a continuous batching helper, not a human-review workflow.
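Recording a reviewer decision alongside the automated verdict can be sketched as follows. This is a conceptual illustration of the oversight hook, not library code; the record structure is an assumption:

```python
def record_review(entry: dict, reviewer: str, approve: bool, note: str = "") -> dict:
    """Attach a reviewer decision to a scored interaction.

    Conceptual sketch of the human-oversight hook: the application, not the
    guardrail, decides when a human verdict overrides the automated one.
    """
    entry = dict(entry)  # copy: never mutate the original audit record
    entry["human_override"] = {"reviewer": reviewer, "approve": approve, "note": note}
    return entry

blocked = {"response": "Drug X cures Y.", "score": 0.21, "verdict": "blocked"}
reviewed = record_review(blocked, "reviewer-17", approve=False, note="claim unsupported")
print(reviewed["human_override"]["approve"])  # False
```

Whatever shape your workflow takes, the reviewed record should flow back into the audit log so Article 14 oversight leaves an Article 12 trail.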
Article 15 — Accuracy, Robustness, Cybersecurity¶
This is the article that names accuracy explicitly. Systems must achieve and maintain declared accuracy levels.
What Director-AI provides:
- Declared accuracy with confidence intervals. `ComplianceReporter` computes hallucination rate with a Wilson score 95% CI. You declare "hallucination rate < X%" and the system monitors it continuously.
- Drift detection. `DriftDetector` compares current-period metrics against historical baselines. If accuracy degrades beyond a configured threshold, alerts fire.
- Feedback loop detection. `FeedbackLoopDetector` catches self-reinforcing error patterns where the system's own outputs contaminate future scoring.
- Regression benchmarks. `director-ai bench` runs benchmark and regression suites so you can compare accuracy across releases and configuration changes.
```python
from director_ai.compliance import DriftDetector

detector = DriftDetector(
    baseline_hallucination_rate=0.05,
    alert_threshold=0.02,  # alert if rate increases by >2 percentage points
)
detector.check(current_period_metrics)
```
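The Wilson score interval behind the declared-accuracy figure is a standard construction, so it can be shown self-contained. This is a sketch of the statistic, not the `ComplianceReporter` code:

```python
from math import sqrt

def wilson_ci(failures: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a proportion (e.g. hallucination rate).

    More reliable than the naive normal interval at small counts and
    rates near 0, which is exactly the regime a guardrail operates in.
    """
    if n == 0:
        return (0.0, 1.0)
    p = failures / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

# Example: 12 hallucinations observed in 1,000 scored responses
lo, hi = wilson_ci(12, 1000)
print(f"hallucination rate 1.2%, 95% CI [{lo:.3%}, {hi:.3%}]")
```

Declaring the upper bound of the interval, rather than the point estimate, is the conservative way to state "hallucination rate < X%" under Article 15.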
What Director-AI Does NOT Cover¶
| EU AI Act Requirement | Status | What You Need |
|---|---|---|
| Bias and fairness (Article 10) | Not covered | Fairness testing tools (Aequitas, Fairlearn) |
| Toxicity and harmful content | Not covered | Content moderation (Llama Guard, NeMo Guardrails) |
| Prompt injection defence | Not covered | Input validation (Rebuff, LLM-Guard) |
| PII and data protection | Not covered | PII detection (Presidio) |
| Conformity assessment | Not covered | Notified body or self-assessment per Annex VI |
| CE marking | Not covered | Legal/regulatory process |
| Post-market surveillance plan | Partially — drift detection feeds into it | Broader monitoring framework |
| Incident reporting (Article 62) | Partially — audit log provides evidence | Reporting process and channels |
Director-AI solves one problem well: factual accuracy of LLM outputs. It generates the accuracy evidence that feeds into your broader compliance documentation.
Implementation Checklist¶
For teams deploying Director-AI as part of EU AI Act compliance:
- Deploy `AuditLog` in production — log every scored interaction to the compliance database.
- Set domain-appropriate thresholds — use `DirectorConfig.from_profile()` or tune with `director-ai tune <labeled.jsonl>`.
- Schedule compliance reports — run `ComplianceReporter.generate_report()` monthly, or pass explicit `since`/`until` timestamps. Include the output in Article 11 technical documentation.
- Enable drift detection — configure `DriftDetector` with your baseline accuracy. Wire alerts to your incident management system.
- Implement human review — build an application-side reviewer workflow for low-confidence responses; do not treat `ReviewQueue` as a human-review system.
- Run regression benchmarks — after every KB update or model change, run `director-ai bench` to verify accuracy hasn't degraded.
- Document limitations — state clearly that Director-AI covers factual coherence only. Combine with other tools for toxicity, bias, and PII.
Timeline¶
| Date | Milestone |
|---|---|
| August 1, 2024 | EU AI Act entered into force |
| February 2, 2025 | Prohibited AI practices take effect |
| August 2, 2025 | GPAI model obligations take effect |
| August 2, 2026 | High-risk AI system obligations take effect |
| August 2, 2027 | Obligations for Annex I high-risk systems |
The high-risk deadline is four months away at the time of writing. Systems placed on the market on or after August 2, 2026 must comply from day one; systems already on the market before that date are caught only once they are substantially modified.
Further Reading¶
- EU AI Act full text (EUR-Lex)
- Compliance Reporting guide — detailed API reference for `ComplianceReporter`
- Architecture overview — how the scoring pipeline works
- Threshold Tuning — finding the right coherence threshold for your domain
Director-AI provides the accuracy measurement and audit evidence layer. It is one component of a complete EU AI Act compliance programme, not a substitute for legal review or conformity assessment.