Medical Domain Cookbook

Complete Working Example

from director_ai.core import CoherenceScorer, GroundTruthStore

store = GroundTruthStore()  # empty — populate with your KB
store.add("aspirin children", "Aspirin should not be given to children under 16 due to Reye's syndrome risk.")
store.add("blood pressure", "Normal blood pressure is below 120/80 mmHg.")
store.add("diabetes diagnosis", "Type 2 diabetes is diagnosed when fasting glucose exceeds 126 mg/dL.")

scorer = CoherenceScorer(threshold=0.30, ground_truth_store=store)

# Correct claim → approved
approved, score = scorer.review("Is aspirin safe for children?",
    "Aspirin should not be given to children under 16 due to Reye's syndrome risk.")
print(f"Correct: approved={approved}, score={score.score:.2f}")

# Incorrect claim → rejected
approved, score = scorer.review("Is aspirin safe for children?",
    "Aspirin is perfectly safe for children of all ages.")
print(f"Wrong:   approved={approved}, score={score.score:.2f}")
if score.evidence:
    for chunk in score.evidence.chunks:
        print(f"  Evidence: {chunk.text}")
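
The evidence printed for the rejected claim can also be persisted for later clinical review. The following is a minimal sketch, assuming you first copy the relevant fields into plain Python values. `rejection_record` and its field names are hypothetical and are not part of Director-AI.

```python
import json
from datetime import datetime, timezone


def rejection_record(query, response, score, evidence_texts, now=None):
    """Build one JSON line describing a rejected response (hypothetical schema)."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": timestamp,
        "query": query,
        "response": response,
        "score": round(float(score), 4),
        "evidence": list(evidence_texts),
    }
    return json.dumps(record, ensure_ascii=False)


line = rejection_record(
    "Is aspirin safe for children?",
    "Aspirin is perfectly safe for children of all ages.",
    0.12,  # placeholder value, not a measured score
    ["Aspirin should not be given to children under 16 due to Reye's syndrome risk."],
)
print(line)
```

Appending one such line per rejection to a file gives reviewers a simple audit trail. The schema here is only a starting point.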

Configuration

scorer = CoherenceScorer(
    threshold=0.30,    # Measured on PubMedQA (F1=59.9% at this threshold)
    soft_limit=0.35,   # Conservative warning zone
    use_nli=True,
    nli_model="lytang/MiniCheck-DeBERTa-L",
    ground_truth_store=store,
)
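
One way to read the two limits is sketched below. It assumes that scores below `threshold` are rejected and that scores between `threshold` and `soft_limit` are approved with a warning. That reading of `soft_limit` is an assumption drawn from the "warning zone" comment, and `classify` is a hypothetical helper, not a Director-AI function.

```python
def classify(score, threshold=0.30, soft_limit=0.35):
    """Map a coherence score to a decision under the assumed semantics."""
    if soft_limit < threshold:
        raise ValueError("soft_limit must be >= threshold")
    if score < threshold:
        return "reject"
    if score < soft_limit:
        return "warn"
    return "approve"


for s in (0.22, 0.32, 0.50):
    print(f"{s:.2f} -> {classify(s)}")
```

With the defaults shown, a narrow warning band of 0.30 to 0.35 sits just above the rejection cutoff.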

Knowledge Base Setup

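from director_ai.core import VectorGroundTruthStore  # import path assumed; check your installed version

store = VectorGroundTruthStore()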
store.ingest([
    "Aspirin should not be given to children under 16 due to Reye's syndrome risk.",
    "Normal blood pressure is below 120/80 mmHg.",
    "Type 2 diabetes is diagnosed when fasting glucose exceeds 126 mg/dL.",
])
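
Before ingesting, it can help to normalise and de-duplicate the fact list so the store does not hold repeated passages. This is a small sketch. `prepare_facts` is a hypothetical helper, and its output would be passed to `store.ingest(...)` as in the block above.

```python
def prepare_facts(raw_facts):
    """Collapse whitespace, drop empty entries, and remove
    case-insensitive duplicates while keeping the original order."""
    seen = set()
    facts = []
    for fact in raw_facts:
        cleaned = " ".join(fact.split())
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            facts.append(cleaned)
    return facts


facts = prepare_facts([
    "Normal blood pressure is below 120/80 mmHg.",
    "  Normal blood pressure is  below 120/80 mmHg. ",
    "",
])
print(facts)
```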

Safety Pattern

from director_ai import CoherenceAgent

agent = CoherenceAgent(
    use_nli=True,
    fallback="retrieval",  # Always fall back to verified sources
)

# Add disclaimer for all medical content
agent.disclaimer_prefix = "[Medical information — verify with a healthcare provider] "
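
The intent of this pattern can be written out in plain Python. The sketch below does not describe how `CoherenceAgent` works internally. `safe_reply`, its parameters, and the fallback wording are hypothetical and only illustrate the decision flow: prefix every answer, and serve verified text when a response is not approved.

```python
def safe_reply(approved, response, verified_passages,
               disclaimer="[Medical information: verify with a healthcare provider] "):
    """Return the response if approved, otherwise fall back to verified text."""
    if approved:
        body = response
    elif verified_passages:
        # Fall back to the first verified passage rather than the model output.
        body = "Verified source: " + verified_passages[0]
    else:
        body = "No verified information is available for this question."
    return disclaimer + body


print(safe_reply(
    False,
    "Aspirin is perfectly safe for children of all ages.",
    ["Aspirin should not be given to children under 16 due to Reye's syndrome risk."],
))
```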

Risk Reduction (Illustrative Estimates)

These are illustrative industry estimates, not measured results from Director-AI deployments.

| Metric | Without guardrail (industry baseline) | With Director-AI (threshold=0.30) |
| --- | --- | --- |
| Hallucinated dosage/contraindication rate | 8–15% (model-dependent, per published LLM medical benchmarks) | < 1% estimated with verified KB (not yet validated on clinical data) |
| Clinician review time per AI response | 45 sec (read + verify manually) | 10 sec (review evidence chunk), estimated |
| Unsafe recommendation reach (before catch) | 100% of users | 0% (mid-stream halt), estimated |

Cost model (illustrative, not measured): At 500 medical queries/day, reducing manual review from 45 s to 10 s per response would save about 4.9 clinician-hours/day. At $150/hr that is roughly $175K/year, a figure consistent with about 240 working days per year. These are planning estimates based on industry rates, so validate them on your own workload before budgeting.
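
The arithmetic behind the cost model can be reproduced in a few lines. The number of working days per year is not stated in the text. The value of 240 used here is an assumption, chosen because it reproduces the quoted annual figure.

```python
queries_per_day = 500
seconds_saved = 45 - 10   # review time saved per response, from the text
hourly_rate = 150         # USD per clinician-hour, from the text
working_days = 240        # assumption: not stated in the text

hours_per_day = queries_per_day * seconds_saved / 3600
annual_saving = hours_per_day * hourly_rate * working_days

print(f"{hours_per_day:.2f} clinician-hours/day")
print(f"${annual_saving:,.0f}/year")
```

Changing `queries_per_day`, `hourly_rate`, or `working_days` to match your own workload shows how sensitive the estimate is to each input.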

Key Considerations

Never use fallback=None for medical use; always provide verified sources.
Log all rejections with full evidence for clinical review.
Tune the threshold on your own data: CoherenceScorer scores cluster in the 0.25–0.35 range, so start at 0.30 and adjust based on the false-positive rate (FPR) in your domain.
Update the KB regularly, because medical guidelines change frequently.
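
The threshold-tuning advice above can be tried on a small labelled sample. In this sketch a "false positive" is assumed to mean a correct response that falls below the threshold and is therefore rejected. The scores and labels are made up for illustration, and `false_positive_rate` is a hypothetical helper.

```python
def false_positive_rate(scores, is_correct, threshold):
    """Fraction of correct responses that the threshold would reject.

    Assumes 'positive' means 'rejected', so a false positive is a
    correct response whose score is below the threshold.
    """
    correct_scores = [s for s, ok in zip(scores, is_correct) if ok]
    if not correct_scores:
        raise ValueError("need at least one correct example")
    rejected = sum(1 for s in correct_scores if s < threshold)
    return rejected / len(correct_scores)


# Made-up scores and labels, for illustration only.
scores = [0.26, 0.29, 0.31, 0.33, 0.34, 0.27, 0.36, 0.28]
is_correct = [False, True, True, True, True, False, True, False]

for threshold in (0.25, 0.30, 0.35):
    fpr = false_positive_rate(scores, is_correct, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}")
```

Because scores sit in a narrow band, small threshold changes can move the FPR sharply, which is why a sweep on your own data is more informative than a fixed default.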