Agent / MCP Preflight Guard
An agent that can call tools, hand off to other agents, and take real-world actions needs a guard at the seams, not only on the final answer. The preflight guard provides one evidence- and policy-tied decision per seam, so a host loop (or an MCP server) can allow, warn, block, or escalate at each step.
Every hook returns a PreflightDecision whose decision is from a closed
vocabulary (allow / warn / block / escalate) and whose reason is a
tenant-safe code, so the host can act on it and log it without leaking tool
arguments or answer text.
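As an illustration of how a host might consume such a decision, here is a minimal sketch. `Decision`, `StubDecision`, and `handle` are hypothetical stand-ins, not the library's own types; only the four decision values and the idea of a tenant-safe reason code come from the text above. The reason code `policy_denied` is invented for the example, and letting `warn` proceed is an assumption about host policy.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(str, Enum):
    # The closed vocabulary described above.
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class StubDecision:
    # Hypothetical stand-in for PreflightDecision: an outcome plus a reason code.
    decision: Decision
    reason: str  # a short code, never tool arguments or answer text


def handle(d: StubDecision, audit_log: list) -> bool:
    """Return True if the host loop may proceed with the step."""
    # Log only the outcome and the reason code.
    audit_log.append(f"{d.decision.value}:{d.reason}")
    if d.decision in (Decision.ALLOW, Decision.WARN):
        return True   # assumption: warn proceeds but leaves an audit trail
    return False      # block and escalate both stop the step here


audit_log = []
proceed = handle(StubDecision(Decision.BLOCK, "policy_denied"), audit_log)
```

Because the vocabulary is closed, the host can branch exhaustively on four values, and the log line stays safe to share across tenants as long as reason codes never embed request data.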
The five gates
| Hook | Blocks when |
|---|---|
| before_tool_call | policy denies it, evidence is missing, the tool is unknown to the manifest, or the arguments are invalid |
| after_tool_result | the claimed result is fabricated (vs the execution log) or implausible (vs the tool's declared return) |
| before_final_answer | the answer has no supporting evidence, or a canary tripped for it |
| before_handoff | the target agent is not permitted, or the payload is unsafe |
| before_irreversible_action | the action is irreversible and has no safeguard (no registered rollback, no human acknowledgement) → escalated |
from director_ai.core.agent_preflight import AgentPreflightGuard, PreflightPolicy

# manifest, claimed_result, log, evidence, tripped, have_rollback, and refuse
# are supplied by the host application.
guard = AgentPreflightGuard(
    PreflightPolicy(allowed_handoff_targets=frozenset({"billing", "support"}))
)

# Before a tool call
d = guard.before_tool_call("pay_invoice", {"invoice_id": "INV-1"}, manifest=manifest)
if d.blocked:
    refuse(d.reason)

# After the tool returns
d = guard.after_tool_result(
    "pay_invoice", {"invoice_id": "INV-1"}, claimed_result, execution_log=log
)

# Before the final answer
d = guard.before_final_answer(evidence_ok=bool(evidence), canary_tripped=tripped)

# Before handing off to another agent
d = guard.before_handoff("billing")

# Before an irreversible action
d = guard.before_irreversible_action(
    "permanently delete the account", rollback_registered=have_rollback
)
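The tool-call and tool-result conditions in the table can be pictured with a small sketch. This is not the library's implementation: `check_tool_call`, `check_tool_result`, the manifest shape (tool name mapped to required argument names), the execution-log shape (a list of dicts), and all reason codes are assumptions made for illustration.

```python
def check_tool_call(tool, args, manifest, denied_tools=frozenset()):
    """Return (decision, reason_code) for a proposed tool call.

    Hypothetical: `manifest` maps tool name -> set of required argument names.
    """
    if tool not in manifest:
        return ("block", "unknown_tool")
    if tool in denied_tools:
        return ("block", "policy_denied")
    if manifest[tool] - set(args):
        # At least one required argument is missing.
        return ("block", "invalid_arguments")
    return ("allow", "ok")


def check_tool_result(tool, args, claimed, execution_log):
    """Block a claimed result that no logged execution actually produced.

    Hypothetical: each log entry is {"tool": ..., "args": ..., "result": ...}.
    """
    for entry in execution_log:
        if entry["tool"] == tool and entry["args"] == args:
            if entry["result"] == claimed:
                return ("allow", "ok")
    return ("block", "fabricated_result")


manifest = {"pay_invoice": {"invoice_id"}}
log = [{"tool": "pay_invoice", "args": {"invoice_id": "INV-1"}, "result": "paid"}]

ok = check_tool_call("pay_invoice", {"invoice_id": "INV-1"}, manifest)
unknown = check_tool_call("wire_funds", {}, manifest)
bad_args = check_tool_call("pay_invoice", {}, manifest)
real = check_tool_result("pay_invoice", {"invoice_id": "INV-1"}, "paid", log)
fake = check_tool_result("pay_invoice", {"invoice_id": "INV-1"}, "refunded", log)
```

Note that the returned reason codes name the failed check only; they do not echo the offending arguments or the claimed result.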
Irreversible actions
before_irreversible_action scores the action's reversibility (via the
RuleReversibility estimator by default; any ReversibilityEstimator can
be injected). A reversible action is allowed. An irreversible one may proceed only
with a safeguard:
- a registered rollback: pair with the trajectory rollback manager, register
  a compensating action, then pass rollback_registered=True so the gate warns (armed) rather than escalating; or
- a human acknowledgement: pass human_acknowledged=True.
With neither, the action is escalated for a human to decide. The decision's metadata carries the reversibility score that drove it.
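The branching described above can be sketched as a small function. This is a hedged illustration, not the library's logic: `gate_irreversible` is a hypothetical name, the score scale (0 to 1, with 1.0 meaning fully reversible), the 0.5 threshold, the precedence of rollback over acknowledgement, and the `allow` outcome for an acknowledged action are all assumptions that the text does not specify.

```python
def gate_irreversible(score, rollback_registered=False,
                      human_acknowledged=False, threshold=0.5):
    """Return (decision, metadata) for an action with the given reversibility score.

    Hypothetical: score is in [0, 1], where 1.0 means fully reversible.
    """
    metadata = {"reversibility": score}  # the score that drove the decision
    if score >= threshold:
        return ("allow", metadata)       # reversible: no safeguard needed
    if rollback_registered:
        return ("warn", metadata)        # armed: a compensating action exists
    if human_acknowledged:
        return ("allow", metadata)       # assumption: outcome not stated in the text
    return ("escalate", metadata)        # no safeguard: a human must decide


reversible = gate_irreversible(0.9)
armed = gate_irreversible(0.1, rollback_registered=True)
acknowledged = gate_irreversible(0.1, human_acknowledged=True)
unguarded = gate_irreversible(0.1)
```

Carrying the score in the metadata lets the host or a reviewer see how close the action was to the threshold without re-running the estimator.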
Through the guard
ProductionGuard.preflight returns a preflight guard whose result-plausibility
check uses the guard's coherence scorer:
from director_ai.guard import ProductionGuard

guard = ProductionGuard.from_profile("finance")
decision = guard.preflight.before_tool_call(
    "pay_invoice", {"invoice_id": "INV-1"}, manifest=manifest
)
Metrics
agent_preflight_decisions_total{hook, decision}: every decision, labelled by hook point and outcome.
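To make the labelling concrete, the sketch below models the metric with a plain `collections.Counter` keyed by `(hook, decision)`. It is a stand-in, not the library's exporter: `record` and `decisions_total` are hypothetical names, and the assumption is that the `hook` label takes the five hook names and the `decision` label takes the four decision values.

```python
from collections import Counter

HOOKS = (
    "before_tool_call",
    "after_tool_result",
    "before_final_answer",
    "before_handoff",
    "before_irreversible_action",
)
DECISIONS = ("allow", "warn", "block", "escalate")

# Stand-in for the metric: one count per (hook, decision) label pair.
decisions_total = Counter()


def record(hook, decision):
    """Count one decision, rejecting label values outside the closed sets."""
    if hook not in HOOKS or decision not in DECISIONS:
        raise ValueError("unknown label value")
    decisions_total[(hook, decision)] += 1


record("before_tool_call", "block")
record("before_tool_call", "block")
record("before_handoff", "allow")

# Both labels come from closed sets, so the number of series is bounded.
max_series = len(HOOKS) * len(DECISIONS)
```

Under these assumptions the metric never exceeds 20 label combinations, which keeps it cheap to store and safe to alert on, for example on the rate of `block` or `escalate` outcomes per hook.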