Adaptive Threshold Learning
director_ai.core.calibration.adaptive_threshold.AdaptiveThresholdLearner
AdaptiveThresholdLearner(*, candidate_thresholds: list[float] | tuple[float, ...], current_threshold: float, min_samples: int = 20, min_expected_lift: float = 0.01, max_false_positive_rate: float = 1.0, max_false_negative_rate: float = 1.0, alpha_prior: float = 1.0, beta_prior: float = 1.0, random_seed: int | None = None)
Offline Thompson-sampling threshold recommender.
The learner is intentionally side-effect free with respect to runtime scorer configuration. Production deployments should route the returned recommendation through human review/change-management and keep the rollback threshold recorded with the approved overlay.
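The sketch below illustrates the Thompson-sampling idea behind the recommender. For each candidate threshold it draws one success probability from a Beta posterior, then picks the highest draw. It assumes uniform Beta(1, 1) priors (matching the `alpha_prior` and `beta_prior` defaults) and a posterior of Beta(1 + successes, 1 + pulls - successes). `thompson_pick` and the count table are hypothetical helpers, not part of director_ai.

```python
import random


def thompson_pick(arms: dict[float, tuple[int, int]], rng: random.Random) -> float:
    """Return the threshold whose sampled success probability is highest.

    Hypothetical helper: `arms` maps threshold -> (successes, pulls), and each
    arm's posterior is assumed to be Beta(1 + successes, 1 + pulls - successes).
    """
    best_threshold, best_draw = None, -1.0
    for threshold, (successes, pulls) in arms.items():
        draw = rng.betavariate(1.0 + successes, 1.0 + pulls - successes)
        if draw > best_draw:
            best_threshold, best_draw = threshold, draw
    return best_threshold


# Illustrative replayed counts per candidate threshold: (successes, pulls).
arms = {0.3: (12, 20), 0.4: (15, 20), 0.5: (18, 20), 0.6: (14, 20)}
rng = random.Random(7)  # fixed seed, similar in spirit to random_seed
picks = [thompson_pick(arms, rng) for _ in range(1000)]
print({t: picks.count(t) for t in arms})
```

Most draws favour 0.5, which has the highest observed success rate. The other candidates are still picked occasionally, and this is how Thompson sampling balances exploitation against uncertainty. Because the learner is offline, such draws only inform a recommendation and never change a live threshold.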
arm
Return the candidate arm for a validated threshold.
observe
Replay one labelled score across all candidate thresholds.
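The sketch below shows one way to picture `observe`. Each labelled score is compared against every candidate threshold, and that threshold's confusion counts are updated. It makes three assumptions: a score passes when `score >= threshold`, a decision counts as a success when it agrees with the human label, and `ThresholdFeedback.weight` is ignored. `ArmCounts` and `replay` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ArmCounts:
    """Hypothetical per-threshold confusion counts."""

    threshold: float
    tp: int = 0  # human approved, score passes the threshold
    fp: int = 0  # human rejected, score passes the threshold
    tn: int = 0  # human rejected, score is blocked
    fn: int = 0  # human approved, score is blocked

    @property
    def pulls(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    @property
    def successes(self) -> int:
        # Assumption: success means the pass/block decision matches the label.
        return self.tp + self.tn


def replay(arms: list[ArmCounts], score: float, human_approved: bool) -> None:
    """Replay one labelled score against every candidate threshold."""
    for arm in arms:
        passes = score >= arm.threshold  # assumed comparison direction
        if human_approved and passes:
            arm.tp += 1
        elif human_approved:
            arm.fn += 1
        elif passes:
            arm.fp += 1
        else:
            arm.tn += 1


arms = [ArmCounts(t) for t in (0.3, 0.4, 0.5, 0.6)]
for score, approved in [(0.82, True), (0.28, False), (0.45, True), (0.55, False)]:
    replay(arms, score, approved)
print([(a.threshold, a.successes, a.pulls) for a in arms])
```

Every sample updates every arm, so all candidates are compared on identical evidence without any candidate ever being deployed.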
observe_batch
observe_batch(feedback: list[ThresholdFeedback] | tuple[ThresholdFeedback, ...]) -> AdaptiveThresholdReport
Replay a batch of labelled feedback across all candidate thresholds.
report
Return the current replay summary without making a recommendation.
recommend
Return a human-gated threshold recommendation from replayed evidence.
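The sketch below approximates the gating that `recommend` might apply. It is a simplification under several assumptions:

- candidates are ranked by Beta posterior mean rather than by Thompson draws;
- `min_samples` is treated as a floor on the total replayed feedback;
- `current_threshold` is expected to be one of the candidates.

`recommend_sketch` and its tuple-based input are hypothetical. Only the keyword parameter names mirror the `AdaptiveThresholdLearner` constructor.

```python
from typing import Optional

Counts = tuple[int, int, int, int]  # (tp, fp, tn, fn) for one threshold


def posterior_mean(c: Counts, alpha: float, beta: float) -> float:
    tp, fp, tn, fn = c
    return (alpha + tp + tn) / (alpha + beta + tp + fp + tn + fn)


def error_rates(c: Counts) -> tuple[float, float]:
    tp, fp, tn, fn = c
    fpr = fp / (fp + tn) if fp + tn else 0.0  # over human-rejected samples
    fnr = fn / (fn + tp) if fn + tp else 0.0  # over human-approved samples
    return fpr, fnr


def recommend_sketch(
    counts: dict[float, Counts],
    current_threshold: float,
    *,
    min_samples: int = 20,
    min_expected_lift: float = 0.01,
    max_false_positive_rate: float = 1.0,
    max_false_negative_rate: float = 1.0,
    alpha_prior: float = 1.0,
    beta_prior: float = 1.0,
) -> tuple[Optional[float], str]:
    if sum(counts[current_threshold]) < min_samples:
        return None, "insufficient feedback"
    # Keep only candidates that satisfy both safety constraints.
    safe = [
        t
        for t, c in counts.items()
        if error_rates(c)[0] <= max_false_positive_rate
        and error_rates(c)[1] <= max_false_negative_rate
    ]
    if not safe:
        return None, "no candidate satisfies the safety constraints"
    best = max(safe, key=lambda t: posterior_mean(counts[t], alpha_prior, beta_prior))
    lift = posterior_mean(counts[best], alpha_prior, beta_prior) - posterior_mean(
        counts[current_threshold], alpha_prior, beta_prior
    )
    if best == current_threshold or lift < min_expected_lift:
        return None, "no safe candidate beats the current threshold"
    return best, f"expected lift {lift:.3f}"


counts = {0.4: (10, 6, 4, 0), 0.5: (9, 2, 8, 1), 0.6: (6, 0, 10, 4)}
print(recommend_sketch(counts, 0.4))
print(recommend_sketch(counts, 0.4, max_false_negative_rate=0.05))
```

With a 5% false-negative cap, both 0.5 and 0.6 are excluded and no change is proposed. Whatever the library's exact rule, the result is only a proposal: `AdaptiveThresholdRecommendation` defaults to `requires_human_approval=True` and carries a `rollback_threshold`.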
director_ai.core.calibration.adaptive_threshold.ThresholdFeedback
dataclass
ThresholdFeedback(score: float, human_approved: bool, weight: float = 1.0, metadata: dict[str, Any] = dict())
One human-labelled score used for threshold replay.
director_ai.core.calibration.adaptive_threshold.AdaptiveThresholdArm
dataclass
AdaptiveThresholdArm(threshold: float, alpha_prior: float = 1.0, beta_prior: float = 1.0, pulls: int = 0, successes: int = 0, true_positives: int = 0, false_positives: int = 0, true_negatives: int = 0, false_negatives: int = 0)
Posterior state and replay metrics for one candidate threshold.
false_positive_rate
property
Return the false-positive rate for human-rejected samples.
false_negative_rate
property
¶
Return the false-negative rate for human-approved samples.
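Read against the arm's confusion counts, the two properties presumably follow the standard definitions below. The false-positive rate is restricted to human-rejected samples and the false-negative rate to human-approved ones. Two points are assumptions rather than documented behaviour: that a score "passes" when it is at or above the threshold, and the result when a denominator is zero.

$$
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{FNR} = \frac{FN}{FN + TP}
$$

For an arm with $TP = 9$, $FN = 1$, $FP = 2$, $TN = 8$, this gives $\mathrm{FPR} = 2/10 = 0.2$ and $\mathrm{FNR} = 1/10 = 0.1$. A learner configured with `max_false_negative_rate=0.05` would therefore treat this arm as violating its false-negative constraint.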
observe
Replay one labelled score against this threshold arm.
sample_success_probability
Sample a Thompson posterior success probability for this arm.
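The sketch below shows what a single-arm Thompson draw could look like given the arm's `alpha_prior`, `beta_prior`, `pulls`, and `successes` fields. It assumes the posterior is Beta(alpha_prior + successes, beta_prior + pulls - successes). `sample_arm` is a hypothetical stand-in, not the library method. Averaging many draws should recover the posterior mean.

```python
import random
from statistics import fmean


def sample_arm(
    successes: int,
    pulls: int,
    rng: random.Random,
    alpha_prior: float = 1.0,
    beta_prior: float = 1.0,
) -> float:
    """Draw one success probability from an assumed Beta posterior."""
    return rng.betavariate(alpha_prior + successes, beta_prior + pulls - successes)


rng = random.Random(0)
draws = [sample_arm(18, 20, rng) for _ in range(20000)]
posterior_mean = (1.0 + 18) / (1.0 + 1.0 + 20)  # 19 / 22
print(round(fmean(draws), 3), round(posterior_mean, 3))
```

Each individual draw is noisy, which is the point. Arms with few pulls yield wide posteriors and occasionally win a draw, whereas well-evidenced arms concentrate near their observed success rate.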
director_ai.core.calibration.adaptive_threshold.AdaptiveThresholdReport
dataclass
AdaptiveThresholdReport(total_feedback: int, current_threshold: float, best_observed_threshold: float | None, arms: tuple[AdaptiveThresholdArm, ...])
Snapshot after replaying feedback across threshold arms.
director_ai.core.calibration.adaptive_threshold.AdaptiveThresholdRecommendation
dataclass
AdaptiveThresholdRecommendation(current_threshold: float, recommended_threshold: float | None, expected_success_probability: float, current_success_probability: float, expected_lift: float, reason: str, requires_human_approval: bool = True, rollback_threshold: float | None = None, safety_constraints: dict[str, float] = dict())
Human-review-gated threshold recommendation.
to_profile_overlay
Return a profile overlay that can be reviewed before promotion.
Safety Boundary
AdaptiveThresholdLearner is an offline recommender. It replays human-labelled
score feedback across fixed candidate thresholds, estimates each candidate with
Beta-Bernoulli posteriors, applies false-positive and false-negative safety
constraints, and returns a recommendation object.
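As a worked form of the Beta-Bernoulli estimate, consider a candidate threshold $t$ with $n_t$ replayed samples, $s_t$ of which were successes. Under priors $\alpha_0$ and $\beta_0$, the standard conjugate update gives the posterior and mean below. The expected lift of a candidate $t^\ast$ over the current threshold $t_0$ is then presumably a difference of this kind. Whether the learner uses posterior means or averaged Thompson draws for `expected_lift` is not stated here.

$$
\theta_t \mid \text{data} \sim \mathrm{Beta}(\alpha_0 + s_t,\; \beta_0 + n_t - s_t),
\qquad
\mathbb{E}[\theta_t] = \frac{\alpha_0 + s_t}{\alpha_0 + \beta_0 + n_t},
\qquad
\Delta = \mathbb{E}[\theta_{t^\ast}] - \mathbb{E}[\theta_{t_0}]
$$

With $\alpha_0 = \beta_0 = 1$, a candidate with $s_{t^\ast} = 18$ of $n = 20$ has mean $19/22 \approx 0.864$. A current threshold with $s_{t_0} = 15$ of $20$ has mean $16/22 \approx 0.727$. The lift is $\Delta = 3/22 \approx 0.136$.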
It does not mutate CoherenceScorer, DirectorConfig, profile files, or live
runtime thresholds. Apply the returned profile overlay only after operator
approval and keep the rollback threshold in change-management records.
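```python
from director_ai.core import AdaptiveThresholdLearner, ThresholdFeedback

learner = AdaptiveThresholdLearner(
    candidate_thresholds=[0.3, 0.4, 0.5, 0.6],
    current_threshold=0.4,
    max_false_negative_rate=0.05,  # cap the FN rate on human-approved samples
)

# Two samples keep the example short; real replays should supply at least
# min_samples (default 20) labelled scores.
learner.observe_batch(
    [
        ThresholdFeedback(score=0.82, human_approved=True),
        ThresholdFeedback(score=0.28, human_approved=False),
    ]
)

recommendation = learner.recommend()
if recommendation.recommended_threshold is not None:
    # Builds a reviewable overlay only; nothing is applied to the live scorer.
    overlay = recommendation.to_profile_overlay(profile="candidate")
else:
    print(recommendation.reason)
```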
For regulated deployments, use this together with HumanReviewQueue,
OnlineCalibrator, and drift reports. Treat candidate thresholds as a controlled
change, not as autonomous production policy.