Structured Output Verification

Verify JSON, tool calls, and code generated by LLMs — catching hallucinations that NLI cannot detect.

Why Structured Verification?

NLI scores natural language against natural language, but agentic LLMs also produce JSON, function calls, SQL, and code. NLI cannot catch a JSON field with the wrong type, a call to a function that is not in the API, or a Python import of a module that does not exist. Structured verification uses deterministic checks (schema validation, AST parsing, manifest lookup) that are 100% precise for format violations, add sub-millisecond latency, and require no model.

JSON Verification

Validate JSON output against a schema and optionally ground string values against a knowledge base.

from director_ai import verify_json

# Parse check only
result = verify_json('{"status": "shipped", "tracking": "UPS1234"}')
assert result.valid_json is True

# Schema validation
schema = {
    "type": "object",
    "required": ["status", "tracking"],
    "properties": {
        "status": {"type": "string"},
        "tracking": {"type": "string"},
        "count": {"type": "integer"},
    },
}
result = verify_json('{"status": "shipped"}', schema=schema)
assert result.schema_valid is False  # missing required "tracking"

# Recursive arrays and nested constraints
schema = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["sku", "quantity"],
                "properties": {
                    "sku": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "additionalProperties": False,
            },
        },
        "status": {"type": "string", "enum": ["approved", "rejected"]},
    },
}
result = verify_json(
    '{"items": [{"sku": "A-1", "quantity": "two"}], "status": "pending"}',
    schema=schema,
)
assert result.schema_valid is False

# Optional Pydantic model validation
from pydantic import BaseModel

class Payload(BaseModel):
    status: str
    quantity: int

result = verify_json(
    '{"status": "approved", "quantity": "two"}',
    pydantic_model=Payload,
)
assert result.schema_valid is False

# Value grounding against a knowledge base
from director_ai import CoherenceScorer
scorer = CoherenceScorer(use_nli=True)

result = verify_json(
    '{"status": "shipped", "carrier": "FedEx"}',
    score_fn=lambda claim: scorer.calculate_logical_divergence(claim, claim),
)
for v in result.field_verdicts:
    print(f"{v.path}: {v.verdict} ({v.reason})")

What It Catches

| Check | Example | Detection |
|---|---|---|
| Malformed JSON | {"key": value} | valid_json=False |
| Missing required field | Schema requires name, JSON has only age | verdict="missing" |
| Wrong type | Schema says integer, value is "thirty" | verdict="invalid_type" |
| Extra field | additionalProperties: false and unknown key present | verdict="extra" |
| Bool as integer | {"count": true} where schema says integer | verdict="invalid_type" |
| Nested array item violation | items[1].quantity is a string where integer is required | verdict="invalid_type" |
| Enum or const violation | status not in allowed values | verdict="invalid_value" |
| Pydantic validation failure | Model rejects a field | verdict="invalid_type" or verdict="invalid_value" |
| Ungrounded value | String value contradicts knowledge base | verdict="invalid_value" |
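
The schema checks listed above can be approximated with a short recursive walk. The following is a minimal sketch using only the standard library, not director_ai's implementation: the function name check, the _TYPES table, and the "$"-rooted path format are hypothetical and chosen for illustration. It shows in particular why the bool-as-integer case needs an explicit rule in Python.

```python
import json

# Hypothetical mapping from JSON Schema type names to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def check(value, schema, path="$"):
    """Return a list of (path, verdict) pairs, one per violation found."""
    problems = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it explicitly
        # when the schema asks for an integer or a number.
        if expected in ("integer", "number") and isinstance(value, bool):
            ok = False
        if not ok:
            return [(path, "invalid_type")]
    if "enum" in schema and value not in schema["enum"]:
        problems.append((path, "invalid_value"))
    if isinstance(value, dict):
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in value:
                problems.append((f"{path}.{name}", "missing"))
        for name, item in value.items():
            if name in props:
                problems.extend(check(item, props[name], f"{path}.{name}"))
            elif schema.get("additionalProperties") is False:
                problems.append((f"{path}.{name}", "extra"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            problems.extend(check(item, schema["items"], f"{path}[{i}]"))
    return problems


schema = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["sku", "quantity"],
                "properties": {
                    "sku": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "additionalProperties": False,
            },
        },
        "status": {"type": "string", "enum": ["approved", "rejected"]},
    },
}
doc = json.loads(
    '{"items": [{"sku": "A-1", "quantity": "two"},'
    ' {"sku": "B-2", "quantity": true}], "status": "pending"}'
)
problems = check(doc, schema)
for where, verdict in problems:
    print(f"{where}: {verdict}")
```

Both quantity values are reported as invalid_type (one is a string, the other a boolean), and status is reported as invalid_value because "pending" is not in the enum.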

Result Type

@dataclass
class StructuredVerificationResult:
    valid_json: bool
    schema_valid: bool | None  # None if no schema provided
    field_verdicts: list[FieldVerdict]
    error_count: int
    parse_error: str = ""

Tool Call Verification

Verify that an agent's function call is legitimate — the function exists, arguments are correct, and the result wasn't fabricated.

from director_ai import verify_tool_call

manifest = {
    "get_weather": {
        "description": "Get current weather for a city",
        "parameters": {"city": {"type": "string"}},
        "returns": "Weather data with temperature and conditions",
    },
    "search_database": {
        "description": "Search customer database",
        "parameters": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "required": False},
        },
    },
}

# Valid call
result = verify_tool_call(
    function_name="get_weather",
    arguments={"city": "Prague"},
    manifest=manifest,
)
assert result.function_exists is True
assert result.arguments_valid is True

# Nonexistent function
result = verify_tool_call(
    function_name="get_stock_price",
    arguments={"symbol": "AAPL"},
    manifest=manifest,
)
assert result.function_exists is False

# Fabrication detection via execution log
execution_log = [
    {"function": "get_weather", "arguments": {"city": "Berlin"}, "result": "cloudy 10C"}
]
result = verify_tool_call(
    function_name="get_weather",
    arguments={"city": "Prague"},
    claimed_result="sunny 22C",
    manifest=manifest,
    execution_log=execution_log,
)
# Agent claims Prague weather but only Berlin was actually called
assert "different arguments" in result.reason

What It Catches

| Check | Example | Detection |
|---|---|---|
| Nonexistent function | Agent calls get_stock_price not in manifest | function_exists=False |
| Wrong argument type | city=123 where string expected | arguments_valid=False |
| Missing required arg | No query for search_database | verdict="missing" |
| Fabricated result | No execution log entry for the call | fabrication_suspected=True |
| Mismatched result | Log says "rainy", agent claims "sunny" | fabrication_suspected=True |
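
The manifest lookup and execution-log comparison behind these checks can be sketched in plain Python. This is an illustrative approximation rather than the library's code: check_tool_call, the _PY_TYPES table, and the report keys are hypothetical, and the sketch assumes a parameter is required unless the manifest marks it "required": False.

```python
# Assumed mapping from manifest type names to Python types.
_PY_TYPES = {"string": str, "integer": int}


def check_tool_call(name, arguments, manifest,
                    execution_log=None, claimed_result=None):
    """Check one claimed tool call against a manifest and an optional log."""
    report = {
        "function_exists": name in manifest,
        "arguments_valid": False,
        "fabrication_suspected": False,
        "reason": "",
    }
    if not report["function_exists"]:
        report["reason"] = f"function {name!r} is not in the manifest"
        return report

    params = manifest[name].get("parameters", {})
    errors = []
    for pname, spec in params.items():
        if pname not in arguments:
            # Assumption: required unless explicitly marked optional.
            if spec.get("required", True):
                errors.append(f"missing required argument {pname!r}")
            continue
        expected = _PY_TYPES.get(spec.get("type"))
        value = arguments[pname]
        # A bool is never accepted as a string or an integer here.
        if expected is not None and (
            not isinstance(value, expected) or isinstance(value, bool)
        ):
            errors.append(f"argument {pname!r} should be {spec['type']}")
    errors.extend(
        f"unexpected argument {p!r}" for p in arguments if p not in params
    )
    report["arguments_valid"] = not errors
    report["reason"] = "; ".join(errors)

    if execution_log is not None:
        same_function = [e for e in execution_log if e["function"] == name]
        same_call = [e for e in same_function if e["arguments"] == arguments]
        if not same_function:
            reason = "no logged execution of this function"
        elif not same_call:
            reason = "function was executed with different arguments"
        elif claimed_result is not None and all(
            e["result"] != claimed_result for e in same_call
        ):
            reason = "claimed result does not match the logged result"
        else:
            reason = ""
        if reason:
            report["fabrication_suspected"] = True
            report["reason"] = reason
    return report


manifest = {"get_weather": {"parameters": {"city": {"type": "string"}}}}
log = [{"function": "get_weather",
        "arguments": {"city": "Berlin"},
        "result": "cloudy 10C"}]

report = check_tool_call("get_weather", {"city": "Prague"}, manifest,
                         execution_log=log, claimed_result="sunny 22C")
bad_type = check_tool_call("get_weather", {"city": 123}, manifest)
print(report["fabrication_suspected"], report["reason"])
print(bad_type["arguments_valid"], bad_type["reason"])
```

Every check is a dictionary lookup or an equality comparison, so the same inputs always produce the same result.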

Code Verification

Verify generated Python or JSON code for syntax errors, unknown imports, and hallucinated APIs.

from director_ai import verify_code

# Valid Python
result = verify_code("import os\nfiles = os.listdir('.')")
assert result.syntax_valid is True
assert result.unknown_imports == []

# Syntax error
result = verify_code("def broken(:\n  pass")
assert result.syntax_valid is False
assert "SyntaxError" in result.parse_error

# Unknown import
result = verify_code("import nonexistent_quantum_ml")
assert "nonexistent_quantum_ml" in result.unknown_imports

# Hallucinated API detection
manifest = {"pd": {"read_csv", "DataFrame", "merge", "groupby"}}
result = verify_code(
    "import pandas as pd\ndf = pd.read_quantum_csv('data.csv')",
    api_manifest=manifest,
)
assert "pd.read_quantum_csv" in result.hallucinated_apis

What It Catches

| Check | Example | Detection |
|---|---|---|
| Syntax error | def foo(: | syntax_valid=False |
| Unknown import | import fake_lib | In unknown_imports |
| Hallucinated API | pd.read_quantum_csv() | In hallucinated_apis |
| JSON syntax | {key: value} | syntax_valid=False (language="json") |
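
To make the AST-based approach concrete, here is a minimal sketch of how imports and attribute accesses can be inspected without executing the code. It is an approximation under stated assumptions, not the verify_code() implementation: inspect_code and its return keys are hypothetical, and treating "unknown" as "not resolvable by importlib.util.find_spec in the current environment" is an assumption about what counts as a known module.

```python
import ast
import importlib.util


def _resolvable(module_name):
    """True if the top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def inspect_code(source, api_manifest=None, known_modules=()):
    """Parse Python source and report syntax, import, and API findings."""
    report = {"syntax_valid": True, "parse_error": "",
              "unknown_imports": [], "hallucinated_apis": []}
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        report["syntax_valid"] = False
        report["parse_error"] = f"SyntaxError: {exc.msg}"
        return report

    api_manifest = api_manifest or {}
    for node in ast.walk(tree):
        # Collect the top-level package name of every absolute import.
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif (isinstance(node, ast.ImportFrom)
              and node.module and node.level == 0):
            roots = [node.module.split(".")[0]]
        else:
            roots = []
        for root in roots:
            if root not in known_modules and not _resolvable(root):
                report["unknown_imports"].append(root)

        # Flag name.attr accesses where name is in the manifest
        # but attr is not one of its listed members.
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in api_manifest
                and node.attr not in api_manifest[node.value.id]):
            report["hallucinated_apis"].append(
                f"{node.value.id}.{node.attr}"
            )
    return report


broken = inspect_code("def broken(:\n  pass")
missing = inspect_code("import nonexistent_quantum_ml")
fake_api = inspect_code(
    "import pandas as pd\ndf = pd.read_quantum_csv('data.csv')",
    api_manifest={"pd": {"read_csv", "DataFrame", "merge", "groupby"}},
    known_modules={"pandas"},  # so the sketch does not depend on pandas being installed
)
print(broken["syntax_valid"], missing["unknown_imports"],
      fake_api["hallucinated_apis"])
```

The source is only parsed, never run, so a malicious or broken snippet cannot cause side effects during verification.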

Multimodal Checks

Use MultimodalVerifierAdapter when image, audio, or video evidence needs to enter the same guard-control path as text, JSON, tool, and code verification. The adapter is opt-in per modality and returns a tenant-safe GuardDecision.

from director_ai.core.guard_control import RiskEnvelope
from director_ai.core.multimodal_guard import (
    MultimodalCheckRequest,
    MultimodalVerifierAdapter,
)

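# image_guard, caption_grounder, metadata_grounder, and image_bytes
# are supplied by the calling application; they are not defined here.
adapter = MultimodalVerifierAdapter(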
    image_guard=image_guard,
    caption_score_fn=caption_grounder,
    metadata_score_fn=metadata_grounder,
    enabled_modalities=("image",),
    benchmarked_modalities=("image",),
)

result = adapter.check(
    MultimodalCheckRequest(
        modality="image",
        claim_text="The image shows a labelled package.",
        media_ref="media://image-42",
        image_bytes=image_bytes,
        caption_text="Package label is absent.",
        metadata={"captured_at": "2026-05-13", "source": "inspection-rig"},
    ),
    risk_envelope=RiskEnvelope(
        action_category="multimodal",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.85,
    ),
    policy_id="policy.multimodal.regulated",
)

Uncertain evidence is a warning, not an allow decision. Disabled modalities raise errors instead of silently passing. Safety events include media references and verifier scores; they do not include raw media, transcript text, frame data, caption text, metadata values, or claim text. Caption and metadata grounding callbacks are optional score functions that can lower an otherwise consistent image, audio, or video decision when auxiliary evidence contradicts the claim. The resulting audit references identify only the grounding channel and metadata keys.

Formal and Code Verifier Routing

Use FormalCodeVerifierAdapter when symbolic claims or generated code need to enter the shared guard-control path. The adapter delegates formal formulae to the configured reasoning backend and generated code to verify_code() by default.

from director_ai.core.formal_verification import (
    And,
    FormalCodeVerifierAdapter,
    Not,
    Variable,
)
from director_ai.core.guard_control import RiskEnvelope

adapter = FormalCodeVerifierAdapter(timeout_ms=1000.0)
result = adapter.verify_formula(
    formula=And(Variable("approved"), Not(Variable("approved"))),
    risk_envelope=RiskEnvelope(
        action_category="code",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.85,
    ),
    policy_id="policy.formal.regulated",
    evidence_ref="formal://claim-1",
)

Named theorem backends can be selected explicitly:

adapter = FormalCodeVerifierAdapter.with_theorem_backend("dpll")

For generated code with a formal contract, use verify_code_contract(). The adapter checks code structure first and only then routes the contract to the configured theorem backend.

Verifier failure is not proof of correctness. Missing backends, exceptions, and timeouts become warning decisions. Generated code is checked structurally and against manifests without executing it, and audit payloads carry evidence references rather than raw formulae or source code.
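
The formula in the verify_formula example, approved AND NOT approved, is a contradiction: no truth assignment satisfies it. The sketch below shows that fact with a brute-force truth-table check. It is a stand-in for illustration only; Var, Neg, and Conj are hypothetical classes, not the library's Variable, Not, and And, and exhaustive enumeration is not the DPLL procedure a theorem backend would use.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Neg:
    operand: object


@dataclass(frozen=True)
class Conj:
    left: object
    right: object


def variables(formula):
    """Collect the variable names that occur in a formula."""
    if isinstance(formula, Var):
        return {formula.name}
    if isinstance(formula, Neg):
        return variables(formula.operand)
    return variables(formula.left) | variables(formula.right)


def evaluate(formula, assignment):
    """Evaluate a formula under a {name: bool} assignment."""
    if isinstance(formula, Var):
        return assignment[formula.name]
    if isinstance(formula, Neg):
        return not evaluate(formula.operand, assignment)
    return evaluate(formula.left, assignment) and evaluate(
        formula.right, assignment
    )


def satisfiable(formula):
    """Try every assignment; exponential, so only for tiny formulae."""
    names = sorted(variables(formula))
    return any(
        evaluate(formula, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )


contradiction = Conj(Var("approved"), Neg(Var("approved")))
consistent = Conj(Var("approved"), Neg(Var("flagged")))
print(satisfiable(contradiction))  # False
print(satisfiable(consistent))     # True
```

A definite "unsatisfiable" answer is evidence against the claim. A timeout or a missing backend gives no answer at all, which is why those outcomes are reported as warnings rather than as passes.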

Custom Module Registry

result = verify_code(
    "import my_company_sdk\nmy_company_sdk.query('x')",
    known_modules={"my_company_sdk"},
    api_manifest={"my_company_sdk": {"query", "ingest", "delete"}},
)
assert result.unknown_imports == []
assert result.hallucinated_apis == []

Zero Dependencies

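All three verifiers (verify_json, verify_tool_call, verify_code) use only the Python standard library (json, ast, re): no torch, no transformers, no model downloads. They work on the lite install path, and latency is sub-millisecond. Pydantic model validation and NLI-backed value grounding are opt-in and bring their own dependencies.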
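
One way to sanity-check the latency claim on your own hardware is to time the standard-library primitives the verifiers build on. The sketch below measures json.loads and ast.parse only, not director_ai itself; median_latency_ms is a hypothetical helper, and the numbers it reports depend on the machine and the size of the input.

```python
import ast
import json
import statistics
import time


def median_latency_ms(fn, runs=1000):
    """Call fn repeatedly and return the median wall-clock time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


payload = '{"status": "shipped", "tracking": "UPS1234"}'
snippet = "import os\nfiles = os.listdir('.')"

json_ms = median_latency_ms(lambda: json.loads(payload))
ast_ms = median_latency_ms(lambda: ast.parse(snippet))
print(f"json.loads: {json_ms:.4f} ms, ast.parse: {ast_ms:.4f} ms")
```

The median is used rather than the mean so that occasional scheduler pauses do not distort the result.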