Structured Output Verification

Verify JSON, tool calls, and code generated by LLMs, catching hallucinations that NLI cannot detect.

Why Structured Verification?

NLI scores natural language against natural language, but agentic LLMs also produce JSON, function calls, SQL, and code. NLI cannot catch a JSON field with the wrong type, a function call to a nonexistent API, or a Python import that doesn't exist. Structured verification uses deterministic checks (schema validation, AST parsing, manifest lookup) that are 100% precise for format violations, run in under a millisecond, and require no model.

JSON Verification

Validate JSON output against a schema and optionally ground string values against a knowledge base.

from director_ai import verify_json

# Parse check only
result = verify_json('{"status": "shipped", "tracking": "UPS1234"}')
assert result.valid_json is True

# Schema validation
schema = {
    "type": "object",
    "required": ["status", "tracking"],
    "properties": {
        "status": {"type": "string"},
        "tracking": {"type": "string"},
        "count": {"type": "integer"},
    },
}
result = verify_json('{"status": "shipped"}', schema=schema)
assert result.schema_valid is False  # missing required "tracking"

# Recursive arrays and nested constraints
schema = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["sku", "quantity"],
                "properties": {
                    "sku": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "additionalProperties": False,
            },
        },
        "status": {"type": "string", "enum": ["approved", "rejected"]},
    },
}
result = verify_json(
    '{"items": [{"sku": "A-1", "quantity": "two"}], "status": "pending"}',
    schema=schema,
)
assert result.schema_valid is False

# Optional Pydantic model validation
from pydantic import BaseModel

class Payload(BaseModel):
    status: str
    quantity: int

result = verify_json(
    '{"status": "approved", "quantity": "two"}',
    pydantic_model=Payload,
)
assert result.schema_valid is False

# Value grounding against a knowledge base
from director_ai import CoherenceScorer
scorer = CoherenceScorer(use_nli=True)

result = verify_json(
    '{"status": "shipped", "carrier": "FedEx"}',
    score_fn=lambda claim: scorer.calculate_logical_divergence(claim, claim),
)
for v in result.field_verdicts:
    print(f"{v.path}: {v.verdict} ({v.reason})")

What It Catches

Check	Example	Detection
Malformed JSON	`{"key": value}`	`valid_json=False`
Missing required field	Schema requires `name`, JSON has only `age`	`verdict="missing"`
Wrong type	Schema says `integer`, value is `"thirty"`	`verdict="invalid_type"`
Extra field	`additionalProperties: false` and unknown key present	`verdict="extra"`
Bool as integer	`{"count": true}` where schema says `integer`	`verdict="invalid_type"`
Nested array item violation	`items[1].quantity` is a string where integer is required	`verdict="invalid_type"`
Enum or const violation	`status` not in allowed values	`verdict="invalid_value"`
Pydantic validation failure	Model rejects a field	`verdict="invalid_type"` or `invalid_value`
Ungrounded value	String value contradicts knowledge base	`verdict="invalid_value"`
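
The bool-as-integer row exists because Python's `bool` is a subclass of `int`, so a naive `isinstance(value, int)` check accepts `true`. The sketch below illustrates that kind of deterministic check with the standard library only. `check_type` and `check_object` are hypothetical helpers written for this page, not director_ai functions, and they handle only `required` and `type` on a flat object.

```python
import json

# Hypothetical mapping from JSON Schema type names to Python types.
_PY_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_type(value, type_name):
    """Return True if value matches the given JSON Schema type name."""
    # bool is a subclass of int in Python, so reject it explicitly
    # wherever a numeric type is expected.
    if type_name in ("integer", "number") and isinstance(value, bool):
        return False
    return isinstance(value, _PY_TYPES[type_name])


def check_object(raw, schema):
    """Return (path, verdict) pairs for a flat object schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return [("$", "malformed")]
    if not isinstance(data, dict):
        return [("$", "invalid_type")]
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append((name, "missing"))
    for name, spec in schema.get("properties", {}).items():
        if name in data and not check_type(data[name], spec["type"]):
            problems.append((name, "invalid_type"))
    return problems


schema = {
    "type": "object",
    "required": ["status", "count"],
    "properties": {"status": {"type": "string"}, "count": {"type": "integer"}},
}
print(check_object('{"count": true}', schema))
# [('status', 'missing'), ('count', 'invalid_type')]
```

Because every branch is a dictionary lookup or an `isinstance` call, the same input always produces the same verdicts, which is what separates this class of check from a model-based score.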

Result Type

@dataclass
class StructuredVerificationResult:
    valid_json: bool
    schema_valid: bool | None  # None if no schema provided
    field_verdicts: list[FieldVerdict]
    error_count: int
    parse_error: str = ""
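
One way to consume `field_verdicts` is to group field paths by verdict label before reporting them. The sketch below is an assumption-laden illustration: `FieldVerdict` here is a hypothetical stand-in carrying only the three attributes printed in the grounding example (`path`, `verdict`, `reason`), and `group_by_verdict` is not a library function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FieldVerdict:
    """Hypothetical stand-in, not the library's class definition."""
    path: str
    verdict: str
    reason: str = ""


def group_by_verdict(field_verdicts):
    """Map each verdict label to the list of field paths that received it."""
    grouped = defaultdict(list)
    for v in field_verdicts:
        grouped[v.verdict].append(v.path)
    return dict(grouped)


verdicts = [
    FieldVerdict("tracking", "missing", "required field absent"),
    FieldVerdict("items[0].quantity", "invalid_type", "expected integer"),
    FieldVerdict("status", "invalid_value", "not in enum"),
    FieldVerdict("items[1].quantity", "invalid_type", "expected integer"),
]
for label, paths in group_by_verdict(verdicts).items():
    print(f"{label}: {', '.join(paths)}")
```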

Tool Call Verification

Verify that an agent's function call is legitimate: the function exists, the arguments are correct, and the result wasn't fabricated.

from director_ai import verify_tool_call

manifest = {
    "get_weather": {
        "description": "Get current weather for a city",
        "parameters": {"city": {"type": "string"}},
        "returns": "Weather data with temperature and conditions",
    },
    "search_database": {
        "description": "Search customer database",
        "parameters": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "required": False},
        },
    },
}

# Valid call
result = verify_tool_call(
    function_name="get_weather",
    arguments={"city": "Prague"},
    manifest=manifest,
)
assert result.function_exists is True
assert result.arguments_valid is True

# Nonexistent function
result = verify_tool_call(
    function_name="get_stock_price",
    arguments={"symbol": "AAPL"},
    manifest=manifest,
)
assert result.function_exists is False

# Fabrication detection via execution log
execution_log = [
    {"function": "get_weather", "arguments": {"city": "Berlin"}, "result": "cloudy 10C"}
]
result = verify_tool_call(
    function_name="get_weather",
    arguments={"city": "Prague"},
    claimed_result="sunny 22C",
    manifest=manifest,
    execution_log=execution_log,
)
# Agent claims Prague weather but only Berlin was actually called
assert "different arguments" in result.reason

What It Catches

Check	Example	Detection
Nonexistent function	Agent calls `get_stock_price` not in manifest	`function_exists=False`
Wrong argument type	`city=123` where string expected	`arguments_valid=False`
Missing required arg	No `query` for `search_database`	`verdict="missing"`
Fabricated result	No execution log entry for the call	`fabrication_suspected=True`
Mismatched result	Log says "rainy", agent claims "sunny"	`fabrication_suspected=True`
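
The checks in this table reduce to dictionary lookups against the manifest and the execution log. The following is a minimal sketch of that logic, assuming the manifest layout shown above. `check_call` is a hypothetical helper whose report keys only mirror the field names used in this section; the exact-match comparison of results is a simplifying assumption, not a description of the library's implementation.

```python
PY_TYPES = {"string": str, "integer": int}  # only the types this sketch needs


def _arg_ok(value, spec):
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(value, PY_TYPES[spec["type"]]) and not isinstance(value, bool)


def check_call(name, arguments, manifest, claimed_result=None, execution_log=None):
    """Check one tool call against a manifest and an optional execution log."""
    report = {
        "function_exists": name in manifest,
        "arguments_valid": False,
        "fabrication_suspected": False,
        "reasons": [],
    }
    if not report["function_exists"]:
        report["reasons"].append(f"function {name!r} is not in the manifest")
        return report

    params = manifest[name].get("parameters", {})
    # A parameter counts as required unless it is marked "required": False.
    missing = [p for p, spec in params.items()
               if spec.get("required", True) and p not in arguments]
    invalid = [p for p, value in arguments.items()
               if p not in params or not _arg_ok(value, params[p])]
    report["arguments_valid"] = not missing and not invalid
    report["reasons"] += [f"missing argument {p!r}" for p in missing]
    report["reasons"] += [f"invalid argument {p!r}" for p in invalid]

    if claimed_result is not None and execution_log is not None:
        same_function = [e for e in execution_log if e["function"] == name]
        same_call = [e for e in same_function if e["arguments"] == arguments]
        if not same_function:
            problem = "no execution log entry for this function"
        elif not same_call:
            problem = "function was executed with different arguments"
        elif all(e["result"] != claimed_result for e in same_call):
            problem = "claimed result differs from the logged result"
        else:
            problem = ""
        if problem:
            report["fabrication_suspected"] = True
            report["reasons"].append(problem)
    return report


manifest = {"get_weather": {"parameters": {"city": {"type": "string"}}}}
log = [{"function": "get_weather", "arguments": {"city": "Berlin"}, "result": "cloudy 10C"}]
report = check_call("get_weather", {"city": "Prague"}, manifest,
                    claimed_result="sunny 22C", execution_log=log)
print(report["fabrication_suspected"], report["reasons"])
# True ['function was executed with different arguments']
```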

Code Verification

Verify generated Python or JSON code for syntax errors, unknown imports, and hallucinated APIs.

from director_ai import verify_code

# Valid Python
result = verify_code("import os\nfiles = os.listdir('.')")
assert result.syntax_valid is True
assert result.unknown_imports == []

# Syntax error
result = verify_code("def broken(:\n  pass")
assert result.syntax_valid is False
assert "SyntaxError" in result.parse_error

# Unknown import
result = verify_code("import nonexistent_quantum_ml")
assert "nonexistent_quantum_ml" in result.unknown_imports

# Hallucinated API detection
manifest = {"pd": {"read_csv", "DataFrame", "merge", "groupby"}}
result = verify_code(
    "import pandas as pd\ndf = pd.read_quantum_csv('data.csv')",
    api_manifest=manifest,
)
assert "pd.read_quantum_csv" in result.hallucinated_apis

What It Catches

Check	Example	Detection
Syntax error	`def foo(:`	`syntax_valid=False`
Unknown import	`import fake_lib`	In `unknown_imports`
Hallucinated API	`pd.read_quantum_csv()`	In `hallucinated_apis`
JSON syntax	`{key: value}`	`syntax_valid=False` (language="json")
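
All three Python checks can be done by parsing the source into an AST and never executing it. The sketch below shows one way to do that with `ast` and `importlib.util.find_spec`. `scan_code` is a hypothetical helper whose result keys mirror the attribute names used above; resolving imports through `find_spec` is an assumption of this sketch, and the library may decide "unknown" differently (for example, from a fixed allowlist).

```python
import ast
import importlib.util


def _installed(module_root):
    """Return True if the top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_root) is not None
    except (ImportError, ValueError):
        return False


def scan_code(source, api_manifest=None):
    """Statically inspect Python source; the code is parsed but never run."""
    api_manifest = api_manifest or {}
    result = {"syntax_valid": True, "parse_error": "",
              "unknown_imports": [], "hallucinated_apis": []}
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        result["syntax_valid"] = False
        result["parse_error"] = f"SyntaxError: {exc.msg}"
        return result

    for node in ast.walk(tree):
        roots = []
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots = [node.module.split(".")[0]]
        for root in roots:
            if not _installed(root) and root not in result["unknown_imports"]:
                result["unknown_imports"].append(root)
        # Attribute access such as pd.read_csv: compare against the manifest.
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in api_manifest
                and node.attr not in api_manifest[node.value.id]):
            result["hallucinated_apis"].append(f"{node.value.id}.{node.attr}")
    return result


report = scan_code(
    "import os\nimport nonexistent_quantum_ml\nos.walk_quantum('.')",
    api_manifest={"os": {"listdir", "walk"}},
)
print(report["unknown_imports"], report["hallucinated_apis"])
```

Because the source is only parsed, a snippet with destructive side effects is as safe to inspect as a harmless one.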

Multimodal Checks

Use MultimodalVerifierAdapter when image, audio, or video evidence needs to enter the same guard-control path as text, JSON, tool, and code verification. The adapter is opt-in per modality and returns a tenant-safe GuardDecision.

from director_ai.core.guard_control import RiskEnvelope
from director_ai.core.multimodal_guard import (
    MultimodalCheckRequest,
    MultimodalVerifierAdapter,
)

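# image_guard, caption_grounder, metadata_grounder, and image_bytes (used below)
# are supplied by the caller.
adapter = MultimodalVerifierAdapter(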
    image_guard=image_guard,
    caption_score_fn=caption_grounder,
    metadata_score_fn=metadata_grounder,
    enabled_modalities=("image",),
    benchmarked_modalities=("image",),
)

result = adapter.check(
    MultimodalCheckRequest(
        modality="image",
        claim_text="The image shows a labelled package.",
        media_ref="media://image-42",
        image_bytes=image_bytes,
        caption_text="Package label is absent.",
        metadata={"captured_at": "2026-05-13", "source": "inspection-rig"},
    ),
    risk_envelope=RiskEnvelope(
        action_category="multimodal",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.85,
    ),
    policy_id="policy.multimodal.regulated",
)

Uncertain evidence is a warning, not an allow decision. Disabled modalities raise errors instead of silently passing. Safety events include media references and verifier scores; they do not include raw media, transcript text, frame data, caption text, metadata values, or claim text. Caption and metadata grounding callbacks are optional score functions that can lower an otherwise consistent image, audio, or video decision when auxiliary evidence contradicts the claim. The resulting audit references identify only the grounding channel and metadata keys.
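
The redaction rule described above (references, scores, and metadata keys are kept; raw content is not) can be illustrated with a small sketch. `build_safety_event` and its field names are hypothetical and chosen for this example only; they are not the adapter's actual event schema.

```python
def build_safety_event(media_ref, scores, grounding_channels, metadata):
    """Build an audit event that carries references and scores only."""
    return {
        "media_ref": media_ref,
        "scores": dict(scores),
        "grounding_channels": sorted(grounding_channels),
        # Keys only: metadata values may contain tenant data.
        "metadata_keys": sorted(metadata),
    }


event = build_safety_event(
    media_ref="media://image-42",
    scores={"image": 0.91, "caption": 0.12},
    grounding_channels=["caption", "metadata"],
    metadata={"captured_at": "2026-05-13", "source": "inspection-rig"},
)
print(event["metadata_keys"])
# ['captured_at', 'source']
```

Claim text, caption text, and the image bytes are never passed to the function, so they cannot leak into the event by accident.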

Formal and Code Verifier Routing

Use FormalCodeVerifierAdapter when symbolic claims or generated code need to enter the shared guard-control path. The adapter delegates formal formulae to the configured reasoning backend and generated code to verify_code() by default.

from director_ai.core.formal_verification import (
    And,
    FormalCodeVerifierAdapter,
    Not,
    Variable,
)
from director_ai.core.guard_control import RiskEnvelope

adapter = FormalCodeVerifierAdapter(timeout_ms=1000.0)
result = adapter.verify_formula(
    formula=And(Variable("approved"), Not(Variable("approved"))),
    risk_envelope=RiskEnvelope(
        action_category="code",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.85,
    ),
    policy_id="policy.formal.regulated",
    evidence_ref="formal://claim-1",
)
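
The formula in this example, `approved AND NOT approved`, is a contradiction: no assignment makes it true. A brute-force truth table makes that concrete. The sketch below represents formulae as plain tuples rather than the library's `And`, `Not`, and `Variable` classes, and it is not how a DPLL-style backend works internally; it only illustrates what "unsatisfiable" means for a formula this small.

```python
from itertools import product


def variables(formula):
    """Collect the variable names that appear in a tuple-encoded formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(variables(sub) for sub in formula[1:]))


def evaluate(formula, env):
    """Evaluate a formula under a mapping from variable name to bool."""
    op = formula[0]
    if op == "var":
        return env[formula[1]]
    if op == "not":
        return not evaluate(formula[1], env)
    if op == "and":
        return all(evaluate(sub, env) for sub in formula[1:])
    if op == "or":
        return any(evaluate(sub, env) for sub in formula[1:])
    raise ValueError(f"unknown operator {op!r}")


def satisfiable(formula):
    """Try every assignment; exponential, so only suitable for tiny formulae."""
    names = sorted(variables(formula))
    return any(
        evaluate(formula, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )


contradiction = ("and", ("var", "approved"), ("not", ("var", "approved")))
print(satisfiable(contradiction))
# False
```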

Named theorem backends can be selected explicitly:

adapter = FormalCodeVerifierAdapter.with_theorem_backend("dpll")

For generated code with a formal contract, use verify_code_contract(). The adapter checks code structure first and only then routes the contract to the configured theorem backend.

Verifier failure is not proof of correctness. Missing backends, exceptions, and timeouts become warning decisions. Generated code is checked structurally and against manifests without executing it, and audit payloads carry evidence references rather than raw formulae or source code.
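
The fail-safe rule above can be sketched as a wrapper that maps every failure mode to a warning rather than a pass. This is a hypothetical illustration: `guarded_verify`, the decision labels, and the assumption that a backend signals a timeout by raising `TimeoutError` are all inventions of this sketch, not the adapter's API.

```python
def guarded_verify(backend, formula):
    """Run backend(formula) and never let a failure count as success."""
    if backend is None:
        return ("warning", "no backend configured")
    try:
        proved = backend(formula)
    except TimeoutError:
        return ("warning", "backend timed out")
    except Exception as exc:
        return ("warning", f"backend raised {type(exc).__name__}")
    # Only an explicit verdict from the backend produces a non-warning decision.
    return ("allow" if proved else "reject", "backend verdict")


def crashing_backend(formula):
    raise RuntimeError("solver crashed")


print(guarded_verify(None, "f"))
print(guarded_verify(crashing_backend, "f"))
print(guarded_verify(lambda formula: True, "f"))
```

The key design choice is that the `except` branches never return the success label, so an unavailable or crashing verifier cannot be mistaken for a proof.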

Custom Module Registry

result = verify_code(
    "import my_company_sdk\nmy_company_sdk.query('x')",
    known_modules={"my_company_sdk"},
    api_manifest={"my_company_sdk": {"query", "ingest", "delete"}},
)
assert result.unknown_imports == []
assert result.hallucinated_apis == []

Zero Dependencies

All three verifiers (verify_json, verify_tool_call, verify_code) use only the Python standard library (json, ast, re): no torch, no transformers, no model downloads. They work on the lite install path, and latency is sub-millisecond.
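
To get a feel for the latency of the underlying stdlib primitives on your own machine, a rough measurement like the one below can help. It times `json.loads` and `ast.parse` directly, not the director_ai verifiers, and `mean_latency_ms` is a hypothetical helper; absolute numbers depend on hardware and Python version.

```python
import ast
import json
import timeit


def mean_latency_ms(fn, repeat=1000):
    """Return the mean wall-clock time of one call to fn, in milliseconds."""
    return timeit.timeit(fn, number=repeat) / repeat * 1000.0


payload = '{"status": "shipped", "tracking": "UPS1234"}'
source = "import os\nfiles = os.listdir('.')"

json_ms = mean_latency_ms(lambda: json.loads(payload))
ast_ms = mean_latency_ms(lambda: ast.parse(source))
print(f"json.loads: {json_ms:.4f} ms, ast.parse: {ast_ms:.4f} ms")
```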