Medical Domain Cookbook¶
Complete Working Example¶
from director_ai.core import CoherenceScorer, GroundTruthStore
store = GroundTruthStore() # empty — populate with your KB
store.add("aspirin children", "Aspirin should not be given to children under 16 due to Reye's syndrome risk.")
store.add("blood pressure", "Normal blood pressure is below 120/80 mmHg.")
store.add("diabetes diagnosis", "Type 2 diabetes is diagnosed when fasting glucose exceeds 126 mg/dL.")
scorer = CoherenceScorer(threshold=0.30, ground_truth_store=store)
# Correct claim → approved
approved, score = scorer.review("Is aspirin safe for children?",
"Aspirin should not be given to children under 16 due to Reye's syndrome risk.")
print(f"Correct: approved={approved}, score={score.score:.2f}")
# Incorrect claim → rejected
approved, score = scorer.review("Is aspirin safe for children?",
"Aspirin is perfectly safe for children of all ages.")
print(f"Wrong: approved={approved}, score={score.score:.2f}")
if score.evidence:
for chunk in score.evidence.chunks:
print(f" Evidence: {chunk.text}")
Configuration¶
scorer = CoherenceScorer(
threshold=0.30, # Measured on PubMedQA (F1=59.9% at this threshold)
soft_limit=0.35, # Conservative warning zone
use_nli=True,
nli_model="lytang/MiniCheck-DeBERTa-L",
ground_truth_store=store,
)
Knowledge Base Setup¶
store = VectorGroundTruthStore()
store.ingest([
"Aspirin should not be given to children under 16 due to Reye's syndrome risk.",
"Normal blood pressure is below 120/80 mmHg.",
"Type 2 diabetes is diagnosed when fasting glucose exceeds 126 mg/dL.",
])
Safety Pattern¶
from director_ai import CoherenceAgent
agent = CoherenceAgent(
use_nli=True,
fallback="retrieval", # Always fall back to verified sources
)
# Add disclaimer for all medical content
agent.disclaimer_prefix = "[Medical information — verify with a healthcare provider] "
Risk Reduction¶
| Metric | Without Director-AI | With Director-AI (threshold=0.30) |
|---|---|---|
| Hallucinated dosage/contraindication rate | 8–15% (model-dependent) | < 1% with verified KB |
| Clinician review time per AI response | 45 sec (read + verify manually) | 10 sec (review evidence chunk) |
| Unsafe recommendation reach (before catch) | 100% of users | 0% (mid-stream halt) |
A single prevented wrong-dosage event avoids potential malpractice exposure ($250K–$1M+) and patient harm. At 500 medical queries/day, reducing manual review from 45s to 10s saves ~4.8 clinician-hours/day → ~$175K/year at $150/hr.
Key Considerations¶
- Never use
fallback=Nonefor medical — always provide verified sources - Log all rejections with full evidence for clinical review
- Tune threshold on your data — CoherenceScorer scores cluster 0.25–0.35; start at 0.30 and adjust based on your domain's FPR
- Regular KB updates — medical guidelines change frequently