Agent Trajectory Preflight
When every generation costs tokens, compute, and user attention, catching a likely-to-fail prompt before the model runs saves all three. The trajectory simulator runs a small fan-out of cheap sampled trajectories, scores each, and returns a verdict — proceed, warn, or halt — that the gateway can act on without ever calling the full model.
The flow
prompt ──► TrajectorySimulator.preflight
             │
             ├──► Actor.sample(prompt, seed=17) ──┐
             ├──► Actor.sample(prompt, seed=18)   │ N draws
             ├──► …                               │ (default 8)
             └──► Actor.sample(prompt, seed=24) ──┘
                    │
                    └──► CoherenceScorer.review(prompt, draw)
                           │
                           └──► halt_rate + CI + action
The actor is any object with a sample(prompt, seed) -> list[str]
method. In production it wraps a small distilled LLM; in tests and
smoke runs it can be a deterministic mock. The scorer is any
object with the standard review(prompt, action) interface — the
shipped CoherenceScorer plugs in unchanged.
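
The actor contract described above can be sketched as a structural type. This is a minimal illustration, not the library's own definition: the `Actor` Protocol class and the `EchoActor` mock below are hypothetical names, and the only detail taken from the text is the `sample(prompt, seed) -> list[str]` signature.

```python
import random
from typing import Protocol, runtime_checkable


@runtime_checkable
class Actor(Protocol):
    """Hypothetical structural type: anything with sample(prompt, seed)."""

    def sample(self, prompt: str, seed: int) -> list[str]: ...


class EchoActor:
    """Hypothetical deterministic mock: shuffles the prompt's words by seed."""

    def sample(self, prompt: str, seed: int) -> list[str]:
        words = prompt.split()
        # A local RNG keeps the draw reproducible and leaves global state alone.
        random.Random(seed).shuffle(words)
        return words


actor = EchoActor()
# No inheritance needed: the check only looks for a ``sample`` method.
assert isinstance(actor, Actor)
```

Because the check is structural, a test double and a wrapper around a distilled model can both satisfy it without sharing a base class.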
Minimal reproduction
from director_ai.core import CoherenceScorer, GroundTruthStore
from director_ai.core.trajectory import TrajectorySimulator
from director_ai.core.actor import MockGenerator
class MockActor:
    """Wrap the shipped MockGenerator to match the Actor protocol."""

    def __init__(self) -> None:
        self._gen = MockGenerator()

    def sample(self, prompt: str, seed: int) -> list[str]:
        # Mock generator is deterministic under a fixed seed. Real
        # actors draw from a distilled LLM with ``seed`` controlling
        # sampling noise.
        candidates = self._gen.generate_candidates(prompt, n=1)
        return candidates[0]["text"].split() if candidates else []
store = GroundTruthStore()
store.add("capital", "Paris is the capital of France.")
scorer = CoherenceScorer(threshold=0.6, ground_truth_store=store)
simulator = TrajectorySimulator(
    actor=MockActor(),
    scorer=scorer,
    n_simulations=8,
    halt_rate_warn=0.25,
    halt_rate_halt=0.50,
)
verdict = simulator.preflight("What is the capital of France?")
print(verdict.recommended) # "proceed" / "warn" / "halt"
print(verdict.safety_event.policy_decision)
print(f"halt_rate={verdict.halt_rate:.2f} mean_coh={verdict.mean_coherence:.3f}")
for t in verdict.trajectories:
    print(f" traj {t.trajectory_id}: coh={t.final_coherence:.3f} ok={t.approved}")
Action bands
| Recommended action | Halt rate | What the gateway does |
|---|---|---|
| proceed | < 0.25 | Run the real model with the cheaper scorer backend. |
| warn | 0.25 – 0.50 | Escalate to the NLI backend; stamp X-Trajectory-Warn. |
| halt | ≥ 0.50 | Return 422 before the upstream call. The prompt is likely to produce a hallucination that the streaming kernel would halt anyway. |
The thresholds are configurable per deployment. Medical and
financial domains typically tighten both bounds (warn=0.10,
halt=0.25) so marginal prompts get NLI scoring; creative or
chat domains widen them (warn=0.40, halt=0.70).
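
The band lookup can be sketched as a small pure function. This is an illustration under stated assumptions, not the simulator's code: the `recommend` helper is hypothetical, and it treats the halt bound as inclusive (matching the "≥ 0.50" row) and the warn bound as inclusive at its lower edge.

```python
def recommend(halt_rate: float, warn: float = 0.25, halt: float = 0.50) -> str:
    """Map a halt rate onto proceed / warn / halt (hypothetical helper)."""
    if not 0.0 <= warn <= halt <= 1.0:
        raise ValueError("expected 0 <= warn <= halt <= 1")
    if halt_rate >= halt:
        return "halt"
    if halt_rate >= warn:
        return "warn"
    return "proceed"


# The same prompt can land in different bands depending on the deployment.
default_band = recommend(0.25)                         # default thresholds
regulated_band = recommend(0.25, warn=0.10, halt=0.25)  # tightened bounds
creative_band = recommend(0.25, warn=0.40, halt=0.70)   # widened bounds
```

With two failed draws out of eight (halt rate 0.25), the default thresholds give a warning, the tightened regulated thresholds halt, and the widened creative thresholds proceed.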
PreflightVerdict.safety_event records the same action band as an
allow/warn/halt decision with the trajectory.preflight hook id,
halt-rate threshold, latency, and failed trajectory references.
Observing individual draws
def log_trajectory(t):
    print(f"seed={t.seed} coh={t.final_coherence:.3f} {t.text!r}")

verdict = simulator.preflight(
    "Summarise ANULUM's 2025 performance.",
    on_trajectory=log_trajectory,
)
The on_trajectory callback fires once per draw as soon as the
scorer returns, before the aggregate verdict is ready. Exceptions
raised by the callback are caught and logged; a broken observer
cannot abort the preflight. This is the hook to wire into Langfuse
or an OTel tracer — send each trajectory as a sibling span under
the preflight span so the full fan-out is visible post hoc.
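
The "caught and logged" behaviour can be approximated with a guard around the callback. This is a sketch of the pattern only: the `notify` helper and the logger name are hypothetical, and the simulator's internal handling may differ.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("preflight.observer")  # hypothetical logger name


def notify(callback: Optional[Callable[[Any], None]], trajectory: Any) -> bool:
    """Invoke the observer; return False instead of raising if it fails."""
    if callback is None:
        return True
    try:
        callback(trajectory)
        return True
    except Exception:
        # Log with traceback, then carry on so the fan-out is not aborted.
        logger.exception("on_trajectory callback failed; continuing preflight")
        return False
```

Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` free to propagate, which is usually what an operator wants.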
Seeded determinism for forensics
Two preflight calls with the same prompt, base_seed, and n_simulations draw identical trajectories:
a = simulator.preflight("Tell me about France.")
b = simulator.preflight("Tell me about France.")
assert [t.tokens for t in a.trajectories] == [t.tokens for t in b.trajectories]
The per-trajectory seed is base_seed + i; operators reconstruct
any historical preflight decision by replaying the same
base_seed and n_simulations. This is the single most useful
artefact for incident review — a week later you can show exactly
which draws the simulator saw and which one(s) triggered the
halt.
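
The seed schedule and a replay loop can be sketched as follows. The `base_seed + i` rule comes from the text; everything else is an assumption for illustration, including the `trajectory_seeds` and `replay` helpers, the stand-in `shuffled_words` actor, and the choice of 17 as the base seed (taken from the seeds shown in the flow diagram).

```python
import random
from typing import Callable


def trajectory_seeds(base_seed: int, n_simulations: int) -> list[int]:
    """Per-trajectory seeds: base_seed + i for i in 0..n-1."""
    return [base_seed + i for i in range(n_simulations)]


def shuffled_words(prompt: str, seed: int) -> list[str]:
    """Stand-in actor (hypothetical): a seeded shuffle of the prompt's words."""
    words = prompt.split()
    random.Random(seed).shuffle(words)
    return words


def replay(
    sample: Callable[[str, int], list[str]],
    prompt: str,
    base_seed: int,
    n_simulations: int,
) -> list[tuple[int, list[str]]]:
    """Re-run every draw of a past preflight, paired with the seed that made it."""
    return [(s, sample(prompt, s)) for s in trajectory_seeds(base_seed, n_simulations)]


first = replay(shuffled_words, "Tell me about France.", base_seed=17, n_simulations=8)
second = replay(shuffled_words, "Tell me about France.", base_seed=17, n_simulations=8)
assert first == second  # same inputs, same draws
```

Replay only reproduces the original draws if the actor itself is deterministic under a fixed seed, so an incident record should also pin the actor version.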
Aggregate metrics
verdict = simulator.preflight(prompt)
verdict.halt_rate # fraction of draws that failed
verdict.mean_coherence # arithmetic mean across draws
verdict.std_coherence # stdev (zero with one draw)
verdict.ci_low # 2.5% empirical quantile
verdict.ci_high # 97.5% empirical quantile
verdict.min_coherence # lowest draw
verdict.max_coherence # highest draw
The CI is a plain empirical band, not a conformal prediction interval; the simulator ships at foundation scope only. Calibrate a proper conformal threshold against historical traces once the fan-out has been running in production long enough to collect a calibration set.
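
To make the aggregate fields concrete, here is a sketch of how they could be computed from per-draw scores. It is not the simulator's implementation: the `summarise` and `empirical_quantile` helpers are hypothetical, the quantile uses linear interpolation between order statistics, and the spread uses the population standard deviation. The shipped code may choose differently on both points.

```python
import statistics


def empirical_quantile(ordered: list[float], q: float) -> float:
    """Linear-interpolated quantile of a sorted, non-empty list."""
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def summarise(scores: list[float], approved: list[bool]) -> dict[str, float]:
    """Aggregate per-draw coherence scores and approvals (hypothetical helper)."""
    ordered = sorted(scores)
    return {
        "halt_rate": approved.count(False) / len(approved),
        "mean_coherence": statistics.fmean(scores),
        # pstdev is 0.0 for a single draw, matching "zero with one draw".
        "std_coherence": statistics.pstdev(scores),
        "ci_low": empirical_quantile(ordered, 0.025),
        "ci_high": empirical_quantile(ordered, 0.975),
        "min_coherence": ordered[0],
        "max_coherence": ordered[-1],
    }
```

With only eight draws, the 2.5% and 97.5% quantiles sit very close to the minimum and maximum, which is one reason to read the band as descriptive rather than as a calibrated guarantee.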
Predictive pre-halt steering
Use PredictivePreHaltSteering when the gateway needs an action before
starting the expensive generation path. The controller consumes the preflight
verdict and a calibrated RiskEnvelope, then returns one of three actions:
| Steering action | Guard decision | Gateway action |
|---|---|---|
| proceed | allow | Use the current backend. |
| escalate | warn | Route to a stronger verifier or review lane. |
| halt | halt | Stop before upstream generation and attach the safety event. |
from director_ai.core.guard_control import RiskEnvelope
from director_ai.core.trajectory import PredictivePreHaltSteering
steering = PredictivePreHaltSteering(min_simulations=8)
decision = steering.evaluate(
    verdict,
    risk_envelope=RiskEnvelope(
        action_category="inference_steering",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.9,
    ),
    policy_id="policy.prehalt.regulated",
)
event = decision.to_safety_event(hook_id="prehalt.steering")
The controller halts when empirical halt probability crosses the calibrated threshold. It escalates when the upper confidence bound crosses the threshold or when there are too few simulations for the configured minimum. Audit records include probability, confidence bounds, backend recommendation, and failed trajectory IDs; they do not include prompt text or sampled token text.
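
The decision rule in the paragraph above can be sketched in a few lines. Several details are assumptions rather than documented behaviour: the upper confidence bound here is a Wilson score bound at z = 1.96, "crosses" is read as "greater than or equal to", the halt check takes precedence over the too-few-simulations check, and `no_go_threshold` is left out because the text does not say how it is applied. The `steer` and `wilson_upper` names are hypothetical.

```python
import math


def wilson_upper(failures: int, n: int, z: float = 1.96) -> float:
    """Upper Wilson score bound on a binomial proportion (assumed estimator)."""
    if n == 0:
        return 1.0
    p = failures / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return min(1.0, (centre + margin) / denom)


def steer(
    failures: int,
    n: int,
    calibrated_threshold: float = 0.5,
    min_simulations: int = 8,
) -> str:
    """Return proceed / escalate / halt for a preflight outcome."""
    halt_probability = failures / n if n else 0.0
    if n and halt_probability >= calibrated_threshold:
        return "halt"
    if n < min_simulations:
        return "escalate"  # too little evidence to clear the prompt
    if wilson_upper(failures, n) >= calibrated_threshold:
        return "escalate"  # point estimate is fine, upper bound is not
    return "proceed"
```

Under these assumptions, two failures out of eight escalate even though the point estimate (0.25) is well below the 0.5 threshold, because the upper bound is roughly 0.59.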
The same decision can drive native inference-server hooks. Pass the decision to
InferenceServerHook.steer() before sampling a candidate. Low-risk proceed
decisions preserve logits, escalate decisions add a finite negative logit
bias for the candidate token, and halt decisions apply the normal block path
with an inference_server safety event.
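
The three logit outcomes can be illustrated with a toy function. This is not the signature of InferenceServerHook.steer(), which the text does not spell out: the `steer_logits` helper, the default bias of -5.0, and the use of `None` to signal the block path are all hypothetical.

```python
import math
from typing import Optional


def steer_logits(
    logits: list[float],
    candidate: int,
    action: str,
    bias: float = -5.0,
) -> Optional[list[float]]:
    """Apply a steering action to one candidate token's logit (toy model)."""
    if not (math.isfinite(bias) and bias < 0):
        raise ValueError("bias must be finite and negative")
    if action == "proceed":
        return list(logits)  # logits preserved
    if action == "escalate":
        steered = list(logits)
        steered[candidate] += bias  # discourage, but do not forbid, the token
        return steered
    if action == "halt":
        return None  # caller takes the block path and emits a safety event
    raise ValueError(f"unknown action: {action!r}")
```

A finite bias keeps the token reachable under sampling, whereas a bias of negative infinity would act as a hard mask and behave more like a halt.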
Rollback hooks
For high-risk actions, register a rollback handle before the action executes. The manager stores tenant-safe identifiers and evidence references only; the real undo implementation stays in the deployment's protected control plane.
from director_ai.core.trajectory import TrajectoryRollbackManager
rollback = TrajectoryRollbackManager()
handle = rollback.register(
    rollback_id="threshold-overlay-rollback-20260604",
    action_id="threshold-overlay-deploy",
    hook=lambda handle, reason: {"rollback_store": "audit-log"},
    evidence_refs=("change:42",),
    metadata={"owner": "safety"},
)
outcome = rollback.evaluate_preflight(handle.rollback_id, verdict)
print(outcome.status) # not_required / armed / executed
proceed leaves the hook unused, warn or steering escalation arms it for
operator review, and halt executes the hook exactly once. Later calls return
already_executed, which keeps retrying gateways from issuing duplicate undo
operations. Failure records expose the exception class, not raw backend error
text.
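
The exactly-once behaviour can be sketched with a small state holder. This illustrates the pattern rather than the TrajectoryRollbackManager internals: the `RollbackSlot` class and `evaluate` function are hypothetical, and the sketch assumes a failed hook also counts as "already executed" on later calls, which the text does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RollbackSlot:
    """Hypothetical per-handle state: the hook plus what happened to it."""

    hook: Callable[[str], object]
    attempted: bool = False
    failure_class: Optional[str] = None


def evaluate(slot: RollbackSlot, action: str) -> str:
    """Map a preflight or steering action onto a rollback status."""
    if slot.attempted:
        return "already_executed"  # retries never fire the hook twice
    if action == "proceed":
        return "not_required"
    if action in ("warn", "escalate"):
        return "armed"  # held for operator review
    slot.attempted = True  # set before the call so a crash cannot re-arm it
    try:
        slot.hook("preflight halt")
        return "executed"
    except Exception as exc:
        # Record only the class name, never the raw backend error text.
        slot.failure_class = type(exc).__name__
        return "failed"
```

Marking the slot as attempted before invoking the hook trades a possible missed rollback for a guarantee against duplicates, which is the safer default when the undo itself has side effects.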
Cost
Every preflight call adds N scoring reviews. With the default
n_simulations=8, that is eight extra reviews per prompt. The
overhead is worthwhile for deployments where a halted stream is
more expensive than the preflight fan-out (enterprise customer
support, medical), not for deployments where the latency budget
is tight and streaming halts are cheap (demos, internal tools).
Measure first, tune second.
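
One rough way to frame "measure first" is a break-even check. This is a deliberately simplified model, not a formula from the library: the `preflight_pays_off` helper is hypothetical, costs are in arbitrary units you would measure yourself, and it assumes preflight catches every prompt that would otherwise have been halted mid-stream.

```python
def preflight_pays_off(
    n_simulations: int,
    review_cost: float,
    halt_probability: float,
    halted_stream_cost: float,
) -> bool:
    """True when the expected saving exceeds the fan-out cost (simplified)."""
    preflight_cost = n_simulations * review_cost
    expected_saving = halt_probability * halted_stream_cost
    return expected_saving > preflight_cost
```

The inputs are the measurements to collect: the observed rate of streaming halts, the cost of one scorer review, and the full cost of a halted stream including wasted tokens and user impact.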
Not in this module
- Distilled-actor integration. The protocol accepts any Actor; a production deployment wires a small seq-to-seq model here. The simulator is model-agnostic.
- Conformal calibration. The shipped CI is empirical quantiles; conformal bands need historical data.
- Live undo backends. TrajectoryRollbackManager executes registered hooks, but deployments still own the actual database, control-plane, or model rollback implementation.
- Agent handoff. HandoffScorer already covers the inter-agent edge; the trajectory simulator is for pre-execution single-turn prompts. The two compose: the gateway preflights, then the swarm guard catches anything that survives.
- Rust fast-path. The simulator loop is a thin orchestration layer; the hot path is the scorer itself, which already has a Rust backend. Parallelising the fan-out across threads is a v2 concern.
See also
- docs-site/cookbook/streaming-halt-guide.md: what happens when preflight says proceed and the stream drifts anyway.
- docs-site/cookbook/multi-agent-handoff-failures.md: catching the same failure mode at the handoff edge instead of at preflight.
- ROADMAP.md: the trajectory simulator is Tier 1 #1 of the 2026-2030 roadmap; this cookbook covers the foundation shipped on 2026-04-17.