Agent Trajectory Preflight

When every generation costs tokens, compute, and user attention, catching a likely-to-fail prompt before the model runs saves all three. The trajectory simulator runs a small fan-out of cheap sampled trajectories, scores each, and returns a verdict — proceed, warn, or halt — that the gateway can act on without ever calling the full model.

The flow

prompt ──► TrajectorySimulator.preflight
                ├──► Actor.sample(prompt, seed=17)  ──┐
                ├──► Actor.sample(prompt, seed=18)    │  N draws
                ├──► …                                │  (default 8)
                └──► Actor.sample(prompt, seed=24)  ──┘
                         └──► CoherenceScorer.review(prompt, draw)
                                    └──► halt_rate + CI + action

The actor is any object with a sample(prompt, seed) -> list[str] method. In production it wraps a small distilled LLM; in tests and smoke runs it can be a deterministic mock. The scorer is any object with the standard review(prompt, action) interface — the shipped CoherenceScorer plugs in unchanged.
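As a rough illustration of the duck-typed contract described above, the sketch below spells the actor interface out as a typing.Protocol. The names Actor and ShuffleActor are hypothetical stand-ins written for this example, not classes shipped by the library; only the sample(prompt, seed) -> list[str] shape comes from the text.

```python
import random
from typing import Protocol, runtime_checkable


@runtime_checkable
class Actor(Protocol):
    """Structural type: any object with sample(prompt, seed) -> list[str]."""

    def sample(self, prompt: str, seed: int) -> list[str]: ...


class ShuffleActor:
    """Hypothetical deterministic actor for smoke tests (not shipped)."""

    def sample(self, prompt: str, seed: int) -> list[str]:
        # A private RNG keyed on the seed makes each draw reproducible.
        rng = random.Random(seed)
        tokens = prompt.split()
        rng.shuffle(tokens)
        return tokens


actor = ShuffleActor()
first = actor.sample("What is the capital of France?", seed=17)
again = actor.sample("What is the capital of France?", seed=17)
```

Because the protocol is structural, ShuffleActor never has to inherit from Actor; any object exposing a compatible sample method would satisfy it.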

Minimal reproduction

from director_ai.core import CoherenceScorer, GroundTruthStore
from director_ai.core.trajectory import TrajectorySimulator
from director_ai.core.actor import MockGenerator


class MockActor:
    """Wrap the shipped MockGenerator to match the Actor protocol."""

    def __init__(self) -> None:
        self._gen = MockGenerator()

    def sample(self, prompt: str, seed: int) -> list[str]:
        # ``seed`` is accepted to satisfy the Actor protocol but is not
        # forwarded here; the example relies on the mock generator
        # being deterministic. Real actors draw from a distilled LLM
        # with ``seed`` controlling sampling noise.
        candidates = self._gen.generate_candidates(prompt, n=1)
        return candidates[0]["text"].split() if candidates else []


store = GroundTruthStore()
store.add("capital", "Paris is the capital of France.")
scorer = CoherenceScorer(threshold=0.6, ground_truth_store=store)

simulator = TrajectorySimulator(
    actor=MockActor(),
    scorer=scorer,
    n_simulations=8,
    halt_rate_warn=0.25,
    halt_rate_halt=0.50,
)

verdict = simulator.preflight("What is the capital of France?")
print(verdict.recommended)  # "proceed" / "warn" / "halt"
print(verdict.safety_event.policy_decision)
print(f"halt_rate={verdict.halt_rate:.2f} mean_coh={verdict.mean_coherence:.3f}")
for t in verdict.trajectories:
    print(f"  traj {t.trajectory_id}: coh={t.final_coherence:.3f} ok={t.approved}")

Action bands

| Recommended action | Halt rate | What the gateway does |
| proceed | < 0.25 | Run the real model with the cheaper scorer backend. |
| warn | ≥ 0.25 and < 0.50 | Escalate to the NLI backend; stamp X-Trajectory-Warn. |
| halt | ≥ 0.50 | Return 422 before the upstream call. The prompt is likely to produce a hallucination that the streaming kernel would halt anyway. |

The thresholds are configurable per deployment. Medical and financial domains typically tighten both bounds (warn=0.10, halt=0.25) so marginal prompts get NLI scoring; creative or chat domains widen them (warn=0.40, halt=0.70).
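The band logic can be sketched as a small pure function. This is an illustration of the table, assuming inclusive lower bounds on warn and halt; the function name recommend and its keyword names are hypothetical, and the simulator's own comparison may differ in edge cases.

```python
def recommend(halt_rate: float, warn: float = 0.25, halt: float = 0.50) -> str:
    """Map a halt rate to an action band (assumed boundaries, see lead-in)."""
    if not 0.0 <= warn <= halt <= 1.0:
        raise ValueError("expected 0 <= warn <= halt <= 1")
    if halt_rate >= halt:
        return "halt"
    if halt_rate >= warn:
        return "warn"
    return "proceed"


# One failed draw out of eight under the default thresholds.
default_action = recommend(1 / 8)

# The same halt rate of 0.25 under the tighter and wider presets from the text.
regulated_action = recommend(0.25, warn=0.10, halt=0.25)
creative_action = recommend(0.25, warn=0.40, halt=0.70)
```

With eight draws the halt rate moves in steps of 0.125, so thresholds that fall between two steps behave like the next step up.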

PreflightVerdict.safety_event records the same action band as an allow/warn/halt decision with the trajectory.preflight hook id, halt-rate threshold, latency, and failed trajectory references.

Observing individual draws

def log_trajectory(t):
    print(f"seed={t.seed} coh={t.final_coherence:.3f} {t.text!r}")

verdict = simulator.preflight(
    "Summarise ANULUM's 2025 performance.",
    on_trajectory=log_trajectory,
)

The on_trajectory callback fires once per draw as soon as the scorer returns, before the aggregate verdict is ready. Exceptions raised by the callback are caught and logged; a broken observer cannot abort the preflight. This is the hook to wire into Langfuse or an OTel tracer — send each trajectory as a sibling span under the preflight span so the full fan-out is visible post hoc.
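The isolation guarantee described above can be pictured with a few lines of plain Python. This is a sketch of the pattern only, not the simulator's source: notify_each and flaky_observer are hypothetical names, and the draws are plain strings standing in for trajectory objects.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger("preflight.sketch")


def notify_each(
    draws: Iterable[str],
    on_trajectory: Optional[Callable[[str], None]] = None,
) -> int:
    """Call the observer once per draw; return how many observer calls failed."""
    failures = 0
    for draw in draws:
        if on_trajectory is None:
            continue
        try:
            on_trajectory(draw)
        except Exception:
            # Log and carry on: a broken observer must not abort the fan-out.
            logger.exception("on_trajectory observer failed; continuing")
            failures += 1
    return failures


seen: list[str] = []


def flaky_observer(draw: str) -> None:
    if draw == "draw-1":
        raise RuntimeError("tracer unavailable")
    seen.append(draw)


failed = notify_each(["draw-0", "draw-1", "draw-2"], flaky_observer)
```

The observer fails on the second draw, yet the third draw is still delivered, which is the behaviour a tracer integration depends on.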

Seeded determinism for forensics

Two preflight calls on the same simulator with the same prompt draw token-for-token identical trajectories:

a = simulator.preflight("Tell me about France.")
b = simulator.preflight("Tell me about France.")
assert [t.tokens for t in a.trajectories] == [t.tokens for t in b.trajectories]

The per-trajectory seed is base_seed + i; operators reconstruct any historical preflight decision by replaying the same base_seed and n_simulations. This is the single most useful artefact for incident review — a week later you can show exactly which draws the simulator saw and which one(s) triggered the halt.
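The seed schedule is simple enough to reproduce by hand during an incident review. The helper below is a hypothetical sketch of the base_seed + i rule; the value 17 is taken from the flow diagram near the top of this page, where the eight default draws run from seed 17 to seed 24.

```python
def trajectory_seeds(base_seed: int, n_simulations: int) -> list[int]:
    """Seeds used by draw 0 .. n_simulations - 1 under the base_seed + i rule."""
    if n_simulations < 0:
        raise ValueError("n_simulations must be non-negative")
    return [base_seed + i for i in range(n_simulations)]


seeds = trajectory_seeds(base_seed=17, n_simulations=8)
```

Storing base_seed and n_simulations alongside the prompt is therefore enough to recover every per-draw seed later.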

Aggregate metrics

verdict = simulator.preflight(prompt)

verdict.halt_rate        # fraction of draws that failed
verdict.mean_coherence   # arithmetic mean across draws
verdict.std_coherence    # stdev (zero with one draw)
verdict.ci_low           # 2.5% empirical quantile
verdict.ci_high          # 97.5% empirical quantile
verdict.min_coherence    # lowest draw
verdict.max_coherence    # highest draw

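The CI is a plain empirical quantile band, not a conformal prediction interval; conformal calibration is outside the simulator's foundation scope. With the default eight draws, the 2.5% and 97.5% quantiles sit at or near the lowest and highest draw, so read the band as a spread indicator. Calibrate a proper conformal threshold against historical traces once the fan-out has been running in production long enough to collect a calibration set.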
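For intuition, the aggregate fields can be recomputed from raw per-draw scores with the standard library. This is a sketch under assumptions: the quantile uses linear interpolation between order statistics and the spread uses the sample standard deviation, either of which may differ from what the simulator does internally. The helper names are hypothetical.

```python
import statistics


def empirical_quantile(sorted_values: list[float], q: float) -> float:
    """Quantile by linear interpolation between neighbouring order statistics."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def summarise(scores: list[float], approved: list[bool]) -> dict[str, float]:
    """Rebuild the aggregate metrics from per-draw coherence and approval."""
    ordered = sorted(scores)
    return {
        "halt_rate": approved.count(False) / len(approved),
        "mean_coherence": statistics.fmean(scores),
        # statistics.stdev needs two points, so a single draw reports zero.
        "std_coherence": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "ci_low": empirical_quantile(ordered, 0.025),
        "ci_high": empirical_quantile(ordered, 0.975),
        "min_coherence": ordered[0],
        "max_coherence": ordered[-1],
    }


metrics = summarise([0.9, 0.8, 0.7, 0.4], [True, True, True, False])
```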

Predictive pre-halt steering

Use PredictivePreHaltSteering when the gateway needs an action before starting the expensive generation path. The controller consumes the preflight verdict and a calibrated RiskEnvelope, then returns one of three actions:

| Steering action | Guard decision | Gateway action |
| proceed | allow | Use the current backend. |
| escalate | warn | Route to a stronger verifier or review lane. |
| halt | halt | Stop before upstream generation and attach the safety event. |

from director_ai.core.guard_control import RiskEnvelope
from director_ai.core.trajectory import PredictivePreHaltSteering

steering = PredictivePreHaltSteering(min_simulations=8)
decision = steering.evaluate(
    verdict,
    risk_envelope=RiskEnvelope(
        action_category="inference_steering",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.9,
    ),
    policy_id="policy.prehalt.regulated",
)

event = decision.to_safety_event(hook_id="prehalt.steering")

The controller halts when empirical halt probability crosses the calibrated threshold. It escalates when the upper confidence bound crosses the threshold or when there are too few simulations for the configured minimum. Audit records include probability, confidence bounds, backend recommendation, and failed trajectory IDs; they do not include prompt text or sampled token text.
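The decision rule in the paragraph above can be sketched as follows. Several pieces here are assumptions made for illustration: the text does not say which confidence bound the controller uses, so the sketch swaps in a Wilson score upper bound at roughly 95%; "crosses" is read as greater-than-or-equal; and halting takes priority over the too-few-simulations check. The function names are hypothetical.

```python
import math


def wilson_upper(failures: int, n: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return min(1.0, (centre + margin) / denom)


def steer(
    failures: int,
    n: int,
    calibrated_threshold: float,
    min_simulations: int = 8,
) -> str:
    """Return proceed / escalate / halt under the assumptions in the lead-in."""
    halt_probability = failures / n
    if halt_probability >= calibrated_threshold:
        return "halt"
    too_few = n < min_simulations
    uncertain = wilson_upper(failures, n) >= calibrated_threshold
    if too_few or uncertain:
        return "escalate"
    return "proceed"


clean = steer(failures=0, n=8, calibrated_threshold=0.5)
borderline = steer(failures=2, n=8, calibrated_threshold=0.5)
failing = steer(failures=4, n=8, calibrated_threshold=0.5)
```

With only eight draws the upper bound is wide, so two failures already push the bound past a 0.5 threshold even though the point estimate is 0.25.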

The same decision can drive native inference-server hooks. Pass the decision to InferenceServerHook.steer() before sampling a candidate. Low-risk proceed decisions preserve logits, escalate decisions add a finite negative logit bias for the candidate token, and halt decisions apply the normal block path with an inference_server safety event.
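To make the three logit behaviours concrete, here is a toy sketch over a dictionary of token logits. It does not call InferenceServerHook.steer(), whose signature is not shown on this page; apply_steering, its return shape, and the default bias of -4.0 are all hypothetical choices for illustration.

```python
import math


def apply_steering(
    logits: dict[str, float],
    candidate: str,
    action: str,
    escalate_bias: float = -4.0,
) -> tuple[dict[str, float], bool]:
    """Return (steered logits, blocked flag) without mutating the input."""
    if not math.isfinite(escalate_bias) or escalate_bias >= 0:
        raise ValueError("escalate_bias must be finite and negative")
    steered = dict(logits)
    if action == "proceed":
        return steered, False  # logits preserved
    if action == "escalate":
        steered[candidate] += escalate_bias  # discourage, but do not forbid
        return steered, False
    if action == "halt":
        return steered, True  # caller takes its normal block path
    raise ValueError(f"unknown steering action: {action!r}")


logits = {"Paris": 2.0, "Lyon": 1.0}
kept, _ = apply_steering(logits, "Lyon", "proceed")
biased, _ = apply_steering(logits, "Lyon", "escalate")
_, blocked = apply_steering(logits, "Lyon", "halt")
```

Keeping the escalate bias finite means the candidate token stays reachable if every alternative scores worse, which is the difference between steering and blocking.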

Rollback hooks

For high-risk actions, register a rollback handle before the action executes. The manager stores tenant-safe identifiers and evidence references only; the real undo implementation stays in the deployment's protected control plane.

from director_ai.core.trajectory import TrajectoryRollbackManager

rollback = TrajectoryRollbackManager()
handle = rollback.register(
    rollback_id="threshold-overlay-rollback-20260604",
    action_id="threshold-overlay-deploy",
    hook=lambda handle, reason: {"rollback_store": "audit-log"},
    evidence_refs=("change:42",),
    metadata={"owner": "safety"},
)

outcome = rollback.evaluate_preflight(handle.rollback_id, verdict)
print(outcome.status)  # not_required / armed / executed / already_executed

proceed leaves the hook unused, warn or steering escalation arms it for operator review, and halt executes the hook exactly once. Later calls return already_executed, which keeps retrying gateways from issuing duplicate undo operations. Failure records expose the exception class, not raw backend error text.
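The exactly-once behaviour can be pictured with a small state machine. This is a sketch of the idea, not TrajectoryRollbackManager itself: OnceOnlyRollback, Outcome, and the "failed" status are hypothetical, and the sketch assumes that a failed hook still counts as executed and that every call after execution reports already_executed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Outcome:
    status: str
    error_class: Optional[str] = None  # exception class name only, never its message


class OnceOnlyRollback:
    """Run an undo hook at most once, however often the gateway retries."""

    def __init__(self, hook: Callable[[], None]) -> None:
        self._hook = hook
        self._executed = False

    def evaluate(self, recommended: str) -> Outcome:
        if self._executed:
            return Outcome("already_executed")
        if recommended == "proceed":
            return Outcome("not_required")
        if recommended == "warn":
            return Outcome("armed")
        self._executed = True  # set before the call so retries cannot re-enter
        try:
            self._hook()
        except Exception as exc:
            return Outcome("failed", error_class=type(exc).__name__)
        return Outcome("executed")


calls: list[str] = []
rollback_sketch = OnceOnlyRollback(lambda: calls.append("undo"))
statuses = [
    rollback_sketch.evaluate(action).status
    for action in ("proceed", "warn", "halt", "halt")
]
```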

Cost

Every preflight call adds N actor draws and N scoring reviews before the real model runs. With the default n_simulations=8 that is eight extra reviews per prompt. The overhead is worthwhile for deployments where a halted stream is more expensive than the preflight fan-out (enterprise customer support, medical), and not for deployments where the latency budget is tight and streaming halts are cheap (demos, internal tools). Measure first, tune second.
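A back-of-the-envelope check can frame the measurement. The sketch below assumes a deliberately simple cost model that is not part of the library: the preflight catches every prompt that would otherwise halt mid-stream, each draw costs the same, and all costs share one unit. The function and parameter names are hypothetical.

```python
def preflight_pays_off(
    p_halt: float,
    cost_halted_stream: float,
    cost_per_draw: float,
    n_simulations: int = 8,
) -> bool:
    """True when the expected saving exceeds the fixed fan-out cost."""
    if not 0.0 <= p_halt <= 1.0:
        raise ValueError("p_halt must be a probability")
    expected_saving = p_halt * cost_halted_stream
    preflight_cost = n_simulations * cost_per_draw
    return expected_saving > preflight_cost


# Costly halts that happen often: the fan-out is cheaper than the failures.
support_desk = preflight_pays_off(p_halt=0.2, cost_halted_stream=100.0, cost_per_draw=1.0)

# Cheap, rare halts: the fan-out costs more than it saves.
internal_demo = preflight_pays_off(p_halt=0.05, cost_halted_stream=10.0, cost_per_draw=1.0)
```

Feeding measured halt frequencies and per-draw costs into a comparison like this gives a first estimate before tuning n_simulations.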

Not in this module

  • Distilled-actor integration. The protocol accepts any Actor; a production deployment wires a small seq-to-seq model here. The simulator is model-agnostic.
  • Conformal calibration. The shipped CI is empirical quantiles; conformal bands need historical data.
  • Live undo backends. TrajectoryRollbackManager executes registered hooks, but deployments still own the actual database, control-plane, or model rollback implementation.
  • Agent handoff. HandoffScorer already covers the inter-agent edge; the trajectory simulator is for pre-execution single-turn prompts. The two compose: the gateway preflights, then the swarm guard catches anything that survives.
  • Rust fast-path. The simulator loop is a thin orchestration layer; the hot path is the scorer itself, which already has a Rust backend. Parallelising the fan-out across threads is a v2 concern.

See also