
Conformal & Uncertainty-Aware Routing

UncertaintyRouter turns a calibrated hallucination interval into a downstream action. Where RiskRouter routes inputs to a scoring backend, this router acts on the output: it consumes the conformal PredictionInterval over hallucination probability and applies documented risk bounds.

| Condition | Action |
|---|---|
| interval upper ≤ allow_upper | allow (confidently low-risk) |
| interval lower ≥ reject_lower | reject (confidently high-risk) |
| uncertain and width ≥ escalate_human_width (or calibration unreliable) | escalate_human |
| uncertain and narrower | escalate_model (LLM judge / ensemble) |

The router is side-effect free and deterministic; each UncertaintyDecision records the bounds it used, so the routing rationale is auditable. Dispatching the action — to a review queue for escalate_human, to a stronger model for escalate_model — is the caller's job.
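The condition table can be read as an ordered series of checks. The sketch below is a minimal illustration, not the library's implementation: `Interval` and `route_interval` are hypothetical names, the field names are assumed, and checking reliability before the allow/reject bounds is an assumption based on the note that an unreliable calibration yields escalate_human.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for PredictionInterval; field names are assumed."""

    lower: float
    upper: float
    is_reliable: bool = True


def route_interval(
    interval: Interval,
    allow_upper: float = 0.2,
    reject_lower: float = 0.8,
    escalate_human_width: float = 0.5,
) -> str:
    """Map an interval over hallucination probability to an action string."""
    # Assumption: an unreliable calibration always defers to a person.
    if not interval.is_reliable:
        return "escalate_human"
    # Whole interval sits in the low-risk region.
    if interval.upper <= allow_upper:
        return "allow"
    # Whole interval sits in the high-risk region.
    if interval.lower >= reject_lower:
        return "reject"
    # Uncertain band: wide intervals go to a human, narrow ones to a model.
    width = interval.upper - interval.lower
    if width >= escalate_human_width:
        return "escalate_human"
    return "escalate_model"
```

Because the function reads only its arguments and returns a value, the same interval and bounds always give the same action, which is the property that makes the decision auditable.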

Online calibration

ConformalPredictor.add_observation(score, correct_label) folds one human verdict into the calibration set and refreshes the conformal quantile, so the intervals tighten as feedback accumulates. correct_label=True marks a correct (non-hallucinated) response.

from director_ai.core.calibration.conformal import ConformalPredictor
from director_ai.core.routing import UncertaintyRouter

# feedback_history: caller-supplied iterable of (score, correct) pairs from human review.
predictor = ConformalPredictor(coverage=0.9, min_samples=30)
for score, correct in feedback_history:
    predictor.add_observation(score, correct_label=correct)

router = UncertaintyRouter(allow_upper=0.2, reject_lower=0.8, escalate_human_width=0.5)
# coherence_score: guardrail score of the response being checked.
decision = router.route(predictor.predict(coherence_score))
# Dispatch is the caller's job; review_queue and llm_judge are placeholders.
if decision.action == "escalate_human":
    review_queue.submit(...)
elif decision.action == "escalate_model":
    llm_judge.adjudicate(...)
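To illustrate how folding in one verdict can refresh the quantile, here is a generic split-conformal sketch. It is not the ConformalPredictor source: `OnlineConformal` and `conformal_quantile` are hypothetical names, and the nonconformity score (absolute gap between the hallucination label and `1 - score`) is an assumption chosen only to make the example concrete.

```python
import math


def conformal_quantile(residuals: list[float], coverage: float) -> float:
    """Return the ceil((n + 1) * coverage)-th smallest residual.

    With too few samples the rank exceeds n, so the widest useful value
    for a probability (1.0) is returned instead.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * coverage)
    if n == 0 or rank > n:
        return 1.0
    return sorted(residuals)[rank - 1]


class OnlineConformal:
    """Hypothetical online calibrator; not the library class."""

    def __init__(self, coverage: float = 0.9) -> None:
        self.coverage = coverage
        self.residuals: list[float] = []
        self.quantile = 1.0

    def add_observation(self, score: float, correct_label: bool) -> None:
        # correct_label=True means "not a hallucination", so the target is 0.0.
        hallucinated = 0.0 if correct_label else 1.0
        # Assumed nonconformity score: distance from the naive estimate 1 - score.
        self.residuals.append(abs(hallucinated - (1.0 - score)))
        self.quantile = conformal_quantile(self.residuals, self.coverage)

    def predict(self, score: float) -> tuple[float, float]:
        # Interval around the naive estimate, clipped to [0, 1].
        p = 1.0 - score
        return max(0.0, p - self.quantile), min(1.0, p + self.quantile)
```

With no observations the interval spans the full [0, 1] range. It can only narrow once enough residuals have accumulated for the rank to fall inside the calibration set, which is one reason a minimum sample count is a sensible reliability gate.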

ProductionGuard integration

ProductionGuard wires both together. After enable_calibration(), call enable_uncertainty_routing(); every check() then populates GuardResult.uncertainty_action from the conformal interval. Until calibration is reliable, the action is escalate_human — uncertainty defers to a person.

# guard is an already-constructed ProductionGuard instance.
guard.enable_calibration(alpha=0.1)
guard.enable_uncertainty_routing()
result = guard.check(prompt, response)
result.uncertainty_action  # "allow" | "reject" | "escalate_human" | "escalate_model"
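Since uncertainty_action is one of four strings, a caller might dispatch on it with a lookup table. The sketch below assumes nothing about ProductionGuard beyond those four values; `dispatch` and the handler bodies are hypothetical placeholders.

```python
from typing import Callable


def dispatch(action: str, handlers: dict[str, Callable[[], str]]) -> str:
    """Run the handler registered for an uncertainty action."""
    try:
        handler = handlers[action]
    except KeyError:
        # Fail loudly on an unexpected value rather than silently allowing.
        raise ValueError(f"unknown uncertainty action: {action!r}") from None
    return handler()


# Placeholder handlers; a real system would enqueue, block, or call a judge.
handlers = {
    "allow": lambda: "delivered",
    "reject": lambda: "blocked",
    "escalate_human": lambda: "queued for review",
    "escalate_model": lambda: "sent to judge",
}
```

Raising on an unknown action keeps a newly added action value from falling through to a default that lets the response out unchecked.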

Full API

director_ai.core.routing.uncertainty_router.UncertaintyDecision dataclass

UncertaintyDecision(action: UncertaintyAction, point_estimate: float, lower: float, upper: float, width: float, is_reliable: bool, reason: str)

One uncertainty-routing outcome with the bounds that produced it.

director_ai.core.routing.uncertainty_router.UncertaintyRouter

UncertaintyRouter(*, allow_upper: float = 0.2, reject_lower: float = 0.8, escalate_human_width: float = 0.5)

Map a conformal interval to an allow/reject/escalate action.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| allow_upper | float | Interval upper bound at or below which the response is allowed. | 0.2 |
| reject_lower | float | Interval lower bound at or above which the response is rejected. Must be strictly greater than allow_upper. | 0.8 |
| escalate_human_width | float | In the uncertain band, intervals at least this wide go to human review; narrower ones go to a stronger model. | 0.5 |
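The ordering constraint on the two bounds can be checked before a router is built. This is a sketch of such a check under the documented rule only; `validate_bounds` is a hypothetical helper, and whether UncertaintyRouter itself raises on bad bounds is not stated here.

```python
def validate_bounds(allow_upper: float, reject_lower: float) -> None:
    """Raise ValueError unless reject_lower is strictly greater than allow_upper."""
    if not reject_lower > allow_upper:
        raise ValueError(
            f"reject_lower ({reject_lower}) must be strictly greater than "
            f"allow_upper ({allow_upper})"
        )
```

The strict inequality matters: with equal bounds of 0.5, a zero-width interval at 0.5 would satisfy both the allow condition and the reject condition.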

route

route(interval: PredictionInterval) -> UncertaintyDecision

Return the routing decision for one conformal interval.

director_ai.core.calibration.conformal.ConformalPredictor

ConformalPredictor(coverage: float = 0.95, min_samples: int = 30)

Split conformal prediction for hallucination probability.

Uses nonconformity scores derived from (guardrail_score, human_label) pairs to construct prediction intervals.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| coverage | float | Target coverage probability (e.g., 0.95 for 95% intervals). | 0.95 |
| min_samples | int | Minimum calibration samples for reliable intervals. Below this, intervals are returned but marked unreliable. | 30 |

calibrate

calibrate(scores: list[float], labels: list[bool]) -> None

Calibrate from (score, label) pairs.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| scores | list[float] | Guardrail coherence scores (higher = more coherent). | required |
| labels | list[bool] | True if the response was actually a hallucination (human-verified). | required |

add_observation

add_observation(score: float, correct_label: bool) -> None

Add one human-labelled observation and refresh calibration.

correct_label=True means the checked response was correct. The conformal label has the opposite polarity: it stores whether the response was actually a hallucination.
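The two label conventions are easy to mix up when moving between add_observation and calibrate. As a sketch, a hypothetical helper (`to_calibration_arrays`, not part of the library) could convert (score, correct) feedback pairs into the scores and hallucination labels that calibrate expects:

```python
from collections.abc import Iterable


def to_calibration_arrays(
    feedback: Iterable[tuple[float, bool]],
) -> tuple[list[float], list[bool]]:
    """Split (score, correct) pairs into scores and hallucination labels."""
    scores: list[float] = []
    labels: list[bool] = []
    for score, correct in feedback:
        scores.append(float(score))
        # Flip polarity: a correct response is not a hallucination.
        labels.append(not correct)
    return scores, labels
```

For example, a correct response scored 0.9 and an incorrect one scored 0.2 become scores `[0.9, 0.2]` with labels `[False, True]`.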

calibrate_from_feedback

calibrate_from_feedback(feedback_store) -> None

Calibrate from a FeedbackStore instance.

Reads all entries where human_label is not None and uses (score, human_label) as calibration data.

predict

predict(score: float) -> PredictionInterval

Predict hallucination probability interval for a new score.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| score | float | Guardrail coherence score for the new response. | required |

Returns:

| Type | Description |
|---|---|
| PredictionInterval | Calibrated interval with coverage guarantee. |

predict_interval

predict_interval(score: float) -> tuple[float, float]

Return the interval tuple expected by ProductionGuard.

route

route(score: float, policy: ConformalRoutingPolicy | None = None) -> ConformalRoutingDecision

Route a score using calibrated uncertainty bounds.