Structured Output Verification
Verify JSON, tool calls, and code generated by LLMs, catching hallucinations that NLI cannot detect.
Why Structured Verification?
NLI scores natural language against natural language, but agentic LLMs also produce JSON, function calls, SQL, and code. NLI cannot catch a JSON field with the wrong type, a call to a nonexistent API, or an import of a Python module that doesn't exist. Structured verification uses deterministic checks instead (schema validation, AST parsing, manifest lookup): they flag format violations exactly, with no false positives, add negligible latency, and need no model.
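As a rough illustration of why these checks need no model, the sketch below implements one of each kind (a JSON parse check, a Python AST check, and a manifest lookup) with the standard library alone. It is not director_ai's implementation; the helper names json_parses, python_parses, and tool_exists are hypothetical.

```python
import ast
import json


def json_parses(text: str) -> bool:
    """Parse check: True only if the text is well-formed JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def python_parses(source: str) -> bool:
    """AST check: True only if the source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def tool_exists(name: str, manifest: dict) -> bool:
    """Manifest lookup: True only if the called function is declared."""
    return name in manifest


# Each check is a pure function of its input: no model, no sampling, same answer every time.
print(json_parses('{"key": value}'))                        # False: unquoted value
print(python_parses("def broken(:\n    pass"))              # False: bad parameter list
print(tool_exists("get_stock_price", {"get_weather": {}}))  # False: not in the manifest
```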
JSON Verification
Validate JSON output against a schema and optionally ground string values against a knowledge base.

```python
from director_ai import verify_json

# Parse check only
result = verify_json('{"status": "shipped", "tracking": "UPS1234"}')
assert result.valid_json is True

# Schema validation
schema = {
    "type": "object",
    "required": ["status", "tracking"],
    "properties": {
        "status": {"type": "string"},
        "tracking": {"type": "string"},
        "count": {"type": "integer"},
    },
}
result = verify_json('{"status": "shipped"}', schema=schema)
assert result.schema_valid is False  # missing required "tracking"

# Recursive arrays and nested constraints
schema = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["sku", "quantity"],
                "properties": {
                    "sku": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "additionalProperties": False,
            },
        },
        "status": {"type": "string", "enum": ["approved", "rejected"]},
    },
}
# "quantity" is a string, and "pending" is not an allowed status
result = verify_json(
    '{"items": [{"sku": "A-1", "quantity": "two"}], "status": "pending"}',
    schema=schema,
)
assert result.schema_valid is False

# Optional Pydantic model validation
from pydantic import BaseModel


class Payload(BaseModel):
    status: str
    quantity: int


result = verify_json(
    '{"status": "approved", "quantity": "two"}',
    pydantic_model=Payload,
)
assert result.schema_valid is False  # "two" cannot be coerced to int

# Value grounding against a knowledge base
from director_ai import CoherenceScorer

scorer = CoherenceScorer(use_nli=True)
knowledge = "Order 1234 shipped via UPS."  # illustrative knowledge-base text
result = verify_json(
    '{"status": "shipped", "carrier": "FedEx"}',
    # Score each value against the knowledge base, not against itself.
    # The (context, claim) argument order is assumed here.
    score_fn=lambda claim: scorer.calculate_logical_divergence(knowledge, claim),
)
for v in result.field_verdicts:
    print(f"{v.path}: {v.verdict} ({v.reason})")
```

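How a score_fn is applied to the document is not spelled out above. One plausible approach, sketched here with the standard library only, is to walk the parsed JSON, turn each string leaf into a short "field: value" claim, and flag the leaf when its score crosses a threshold. The helper names, the $-rooted path format, the "ok" verdict label, the 0.5 threshold, and the assumption that higher scores mean stronger contradiction are all illustrative, not documented director_ai semantics.

```python
import json
from typing import Any, Callable, Iterator


def string_leaves(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (path, value) for every string leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from string_leaves(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from string_leaves(item, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def ground_values(
    text: str, score_fn: Callable[[str], float], threshold: float = 0.5
) -> list[tuple[str, str]]:
    """Score each string leaf as a "field: value" claim and flag high scores.

    Assumes score_fn returns a divergence in [0, 1] (higher = more contradicted).
    """
    verdicts = []
    for path, value in string_leaves(json.loads(text)):
        field_name = path.rsplit(".", 1)[-1]  # "$.carrier" -> "carrier"
        claim = f"{field_name}: {value}"      # "carrier: FedEx"
        verdict = "invalid_value" if score_fn(claim) > threshold else "ok"
        verdicts.append((path, verdict))
    return verdicts


def toy_divergence(claim: str) -> float:
    """Stand-in for an NLI scorer that pretends the knowledge base says UPS."""
    return 0.9 if "FedEx" in claim else 0.1


print(ground_values('{"status": "shipped", "carrier": "FedEx"}', toy_divergence))
# [('$.status', 'ok'), ('$.carrier', 'invalid_value')]
```

Scoring leaf by leaf keeps each NLI call short and lets a contradiction be traced back to a single JSON path.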
What It Catches

| Check | Example | Detection |
|---|---|---|
| Malformed JSON | {"key": value} | valid_json=False |
| Missing required field | Schema requires name, JSON has only age | verdict="missing" |
| Wrong type | Schema says integer, value is "thirty" | verdict="invalid_type" |
| Extra field | additionalProperties: false and an unknown key is present | verdict="extra" |
| Bool as integer | {"count": true} where schema says integer | verdict="invalid_type" |
| Nested array item violation | items[1].quantity is a string where an integer is required | verdict="invalid_type" |
| Enum or const violation | status not in allowed values | verdict="invalid_value" |
| Pydantic validation failure | Model rejects a field | verdict="invalid_type" or verdict="invalid_value" |
| Ungrounded value | String value contradicts knowledge base | verdict="invalid_value" |
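Most rows of this table fall out of a short recursive walk over the schema. The sketch below handles a small subset of JSON Schema (type, enum, const, required, properties, additionalProperties: false, and array items) and reuses the verdict names from the table. It is a hypothetical illustration rather than director_ai's validator, and no substitute for a full validator such as the jsonschema package. Note the explicit bool check: Python's bool is a subclass of int, which is why "bool as integer" needs its own row.

```python
import json
from typing import Any

_PY_TYPES = {"string": str, "boolean": bool, "object": dict, "array": list, "null": type(None)}


def type_ok(value: Any, expected: str) -> bool:
    """Check a JSON Schema primitive type, treating bool as distinct from numbers."""
    if expected in ("integer", "number"):
        # bool is a subclass of int in Python, so isinstance(True, int) is True.
        if isinstance(value, bool):
            return False
        return isinstance(value, int) if expected == "integer" else isinstance(value, (int, float))
    return isinstance(value, _PY_TYPES[expected])


def validate(value: Any, schema: dict, path: str = "$") -> list[tuple[str, str]]:
    """Return (path, verdict) pairs for a small subset of JSON Schema."""
    expected = schema.get("type")
    if expected is not None and not type_ok(value, expected):
        return [(path, "invalid_type")]  # don't descend into a value of the wrong type
    errors: list[tuple[str, str]] = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append((path, "invalid_value"))
    if "const" in schema and value != schema["const"]:
        errors.append((path, "invalid_value"))
    if isinstance(value, dict):
        properties = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                errors.append((f"{path}.{key}", "missing"))
        for key, item in value.items():
            if key in properties:
                errors.extend(validate(item, properties[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append((f"{path}.{key}", "extra"))
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


schema = {
    "type": "object",
    "required": ["status", "count", "tracking"],
    "properties": {
        "status": {"type": "string", "enum": ["approved", "rejected"]},
        "count": {"type": "integer"},
        "tracking": {"type": "string"},
    },
    "additionalProperties": False,
}
doc = json.loads('{"status": "pending", "count": true, "note": "x"}')
print(validate(doc, schema))
# [('$.tracking', 'missing'), ('$.status', 'invalid_value'),
#  ('$.count', 'invalid_type'), ('$.note', 'extra')]
```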
Result Type

```python
@dataclass
class StructuredVerificationResult:
    valid_json: bool
    schema_valid: bool | None  # None if no schema provided
    field_verdicts: list[FieldVerdict]
    error_count: int
    parse_error: str = ""
```

Tool Call Verification
Verify that an agent's function call is legitimate: the function exists, the arguments are correct, and the result wasn't fabricated.

```python
from director_ai import verify_tool_call

manifest = {
    "get_weather": {
        "description": "Get current weather for a city",
        "parameters": {"city": {"type": "string"}},
        "returns": "Weather data with temperature and conditions",
    },
    "search_database": {
        "description": "Search customer database",
        "parameters": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "required": False},
        },
    },
}

# Valid call
result = verify_tool_call(
    function_name="get_weather",
    arguments={"city": "Prague"},
    manifest=manifest,
)
assert result.function_exists is True
assert result.arguments_valid is True

# Nonexistent function
result = verify_tool_call(
    function_name="get_stock_price",
    arguments={"symbol": "AAPL"},
    manifest=manifest,
)
assert result.function_exists is False

# Fabrication detection via execution log
execution_log = [
    {"function": "get_weather", "arguments": {"city": "Berlin"}, "result": "cloudy 10C"},
]
result = verify_tool_call(
    function_name="get_weather",
    arguments={"city": "Prague"},
    claimed_result="sunny 22C",
    manifest=manifest,
    execution_log=execution_log,
)
# The agent claims Prague weather, but only a Berlin call was logged
assert "different arguments" in result.reason
```

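The execution-log comparison can be pictured as three successive questions: was the function called at all, was it called with these arguments, and did it return what the agent claims? The sketch below encodes that order. The FabricationCheck class and check_against_log function are hypothetical, and the reason strings are only chosen to echo the assertion above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FabricationCheck:
    fabrication_suspected: bool
    reason: str


def check_against_log(
    function_name: str,
    arguments: dict[str, Any],
    claimed_result: str,
    execution_log: list[dict[str, Any]],
) -> FabricationCheck:
    """Compare a claimed tool result with what the execution log recorded."""
    calls = [entry for entry in execution_log if entry["function"] == function_name]
    if not calls:
        return FabricationCheck(True, f"no execution log entry for {function_name}")
    same_args = [entry for entry in calls if entry["arguments"] == arguments]
    if not same_args:
        return FabricationCheck(True, f"{function_name} was only called with different arguments")
    # Exact string comparison; real tool outputs may need normalizing first.
    if all(entry["result"] != claimed_result for entry in same_args):
        return FabricationCheck(True, "claimed result does not match the logged result")
    return FabricationCheck(False, "claimed result matches the execution log")


log = [{"function": "get_weather", "arguments": {"city": "Berlin"}, "result": "cloudy 10C"}]
print(check_against_log("get_weather", {"city": "Prague"}, "sunny 22C", log))
```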
What It Catches

| Check | Example | Detection |
|---|---|---|
| Nonexistent function | Agent calls get_stock_price, which is not in the manifest | function_exists=False |
| Wrong argument type | city=123 where a string is expected | arguments_valid=False |
| Missing required arg | No query for search_database | verdict="missing" |
| Fabricated result | No execution log entry for the call | fabrication_suspected=True |
| Mismatched result | Log says "rainy", agent claims "sunny" | fabrication_suspected=True |
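Argument checking reduces to walking the manifest's parameters. This sketch assumes, from the search_database example, that a parameter is required unless it is marked "required": False. The check_arguments function, its return dictionary, and the "extra" verdict for undeclared arguments are hypothetical and not documented for verify_tool_call.

```python
from typing import Any

# JSON Schema type names mapped to Python types.
_TYPES: dict[str, Any] = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_arguments(function_name: str, arguments: dict[str, Any], manifest: dict) -> dict[str, Any]:
    """Check a tool call's arguments against a manifest shaped like the one above."""
    if function_name not in manifest:
        return {"function_exists": False, "arguments_valid": False, "verdicts": {}}
    params = manifest[function_name].get("parameters", {})
    verdicts: dict[str, str] = {}
    for name, spec in params.items():
        if name not in arguments:
            # Parameters are treated as required unless marked "required": False.
            if spec.get("required", True):
                verdicts[name] = "missing"
            continue
        value, expected = arguments[name], spec.get("type")
        if expected is None:
            continue
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if bool_as_number or not isinstance(value, _TYPES[expected]):
            verdicts[name] = "invalid_type"
    for name in arguments:
        if name not in params:
            verdicts[name] = "extra"  # hypothetical: reject undeclared arguments
    return {"function_exists": True, "arguments_valid": not verdicts, "verdicts": verdicts}


manifest = {
    "get_weather": {"parameters": {"city": {"type": "string"}}},
    "search_database": {
        "parameters": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "required": False},
        }
    },
}
print(check_arguments("get_weather", {"city": 123}, manifest))
print(check_arguments("search_database", {"limit": 5}, manifest))
```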
Code Verification
Verify generated Python or JSON code for syntax errors, unknown imports, and hallucinated APIs.

```python
from director_ai import verify_code

# Valid Python
result = verify_code("import os\nfiles = os.listdir('.')")
assert result.syntax_valid is True
assert result.unknown_imports == []

# Syntax error
result = verify_code("def broken(:\n    pass")
assert result.syntax_valid is False
assert "SyntaxError" in result.parse_error

# Unknown import
result = verify_code("import nonexistent_quantum_ml")
assert "nonexistent_quantum_ml" in result.unknown_imports

# Hallucinated API detection
manifest = {"pd": {"read_csv", "DataFrame", "merge", "groupby"}}
result = verify_code(
    "import pandas as pd\ndf = pd.read_quantum_csv('data.csv')",
    api_manifest=manifest,
)
assert "pd.read_quantum_csv" in result.hallucinated_apis
```

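Both import and API checks can be done on the syntax tree without running the code. The sketch below uses ast to collect imports and attribute accesses, and importlib.util.find_spec to ask the current interpreter whether a top-level module exists, so its answer depends on what is installed locally; director_ai may consult a different registry. The function names are hypothetical, and the manifest is keyed by the name as written in the code, mirroring the {"pd": {...}} example above.

```python
import ast
import importlib.util


def find_unknown_imports(source: str, known_modules: frozenset[str] = frozenset()) -> list[str]:
    """List imported modules that are neither registered nor importable here."""
    unknown = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import
        for name in names:
            top_level = name.split(".")[0]
            # find_spec on a top-level name locates the module without importing it.
            if top_level not in known_modules and importlib.util.find_spec(top_level) is None:
                unknown.append(name)
    return unknown


def find_hallucinated_apis(source: str, api_manifest: dict[str, set[str]]) -> list[str]:
    """Flag attribute accesses like pd.read_quantum_csv that the manifest doesn't list."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            owner = node.value.id  # the name as written in the code, e.g. "pd"
            if owner in api_manifest and node.attr not in api_manifest[owner]:
                flagged.append(f"{owner}.{node.attr}")
    return flagged


code = "import pandas as pd\ndf = pd.read_quantum_csv('data.csv')"
print(find_hallucinated_apis(code, {"pd": {"read_csv", "DataFrame", "merge"}}))  # ['pd.read_quantum_csv']
```

Because nothing is executed, pandas does not even need to be installed for the API check to run.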
What It Catches

| Check | Example | Detection |
|---|---|---|
| Syntax error | def foo(: | syntax_valid=False |
| Unknown import | import fake_lib | Listed in unknown_imports |
| Hallucinated API | pd.read_quantum_csv() | Listed in hallucinated_apis |
| JSON syntax | {key: value} | syntax_valid=False (language="json") |
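The last row matters because the same text can be valid in one language and invalid in another: {key: value} is a legal Python dict display but malformed JSON. Here is a minimal sketch of a language-aware syntax check using only ast and json; check_syntax and its error-string format are hypothetical.

```python
import ast
import json


def check_syntax(code: str, language: str = "python") -> tuple[bool, str]:
    """Return (syntax_valid, parse_error) without executing anything."""
    try:
        if language == "python":
            ast.parse(code)
        elif language == "json":
            json.loads(code)
        else:
            raise ValueError(f"unsupported language: {language!r}")
    except SyntaxError as exc:
        return False, f"SyntaxError: {exc.msg} (line {exc.lineno})"
    except json.JSONDecodeError as exc:
        return False, f"JSONDecodeError: {exc.msg} (line {exc.lineno})"
    return True, ""


print(check_syntax("def foo(:"))
print(check_syntax("{key: value}", language="json"))
print(check_syntax("{key: value}", language="python"))  # (True, ''): a valid dict display
```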
Multimodal Checks
Use MultimodalVerifierAdapter when image, audio, or video evidence needs to
enter the same guard-control path as text, JSON, tool, and code verification.
The adapter is opt-in per modality and returns a tenant-safe GuardDecision.

```python
from director_ai.core.guard_control import RiskEnvelope
from director_ai.core.multimodal_guard import (
    MultimodalCheckRequest,
    MultimodalVerifierAdapter,
)

# image_guard, caption_grounder, metadata_grounder and image_bytes are
# supplied by your application; they are not defined in this snippet.
adapter = MultimodalVerifierAdapter(
    image_guard=image_guard,
    caption_score_fn=caption_grounder,
    metadata_score_fn=metadata_grounder,
    enabled_modalities=("image",),
    benchmarked_modalities=("image",),
)
result = adapter.check(
    MultimodalCheckRequest(
        modality="image",
        claim_text="The image shows a labelled package.",
        media_ref="media://image-42",
        image_bytes=image_bytes,
        caption_text="Package label is absent.",
        metadata={"captured_at": "2026-05-13", "source": "inspection-rig"},
    ),
    risk_envelope=RiskEnvelope(
        action_category="multimodal",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.85,
    ),
    policy_id="policy.multimodal.regulated",
)
```

Uncertain evidence is a warning, not an allow decision. Disabled modalities raise errors instead of silently passing. Safety events include media references and verifier scores; they do not include raw media, transcript text, frame data, caption text, metadata values, or claim text. Caption and metadata grounding callbacks are optional score functions that can lower an otherwise consistent image, audio, or video decision when auxiliary evidence contradicts the claim. The resulting audit references identify only the grounding channel and metadata keys.
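The decision rules in the paragraph above can be sketched as a small pure function. Everything in it is an assumption made for illustration: scores are taken as consistency values in [0, 1] where higher means better supported, the allow_at and block_below thresholds are made up (they are not the RiskEnvelope thresholds), and decide and SketchDecision are not part of director_ai.

```python
from dataclasses import dataclass, field
from typing import Iterable, Mapping, Optional


@dataclass
class SketchDecision:
    action: str  # "allow", "warn", or "block"
    score: float
    audit: dict = field(default_factory=dict)


def decide(
    modality: str,
    primary_score: float,
    enabled: Iterable[str],
    aux_scores: Optional[Mapping[str, float]] = None,
    metadata_keys: Iterable[str] = (),
    media_ref: str = "",
    allow_at: float = 0.7,
    block_below: float = 0.3,
) -> SketchDecision:
    """Combine a primary consistency score with optional caption/metadata scores."""
    if modality not in set(enabled):
        raise ValueError(f"modality {modality!r} is not enabled")  # never silently pass
    aux = dict(aux_scores or {})
    # Auxiliary channels can only pull the score down, never raise it.
    score = min([primary_score, *aux.values()])
    if score >= allow_at:
        action = "allow"
    elif score < block_below:
        action = "block"
    else:
        action = "warn"  # uncertain evidence warns; it is never an allow
    # The audit record keeps the reference, score, channel names and metadata keys only.
    audit = {
        "media_ref": media_ref,
        "score": score,
        "channels": sorted(aux),
        "metadata_keys": sorted(metadata_keys),
    }
    return SketchDecision(action, score, audit)


d = decide("image", 0.92, enabled=("image",), aux_scores={"caption": 0.2},
           metadata_keys=["captured_at", "source"], media_ref="media://image-42")
print(d.action, d.audit)
```

Taking the minimum is one simple way to let a contradicting caption override a confident image score; a weighted combination would be another.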
Formal and Code Verifier Routing
Use FormalCodeVerifierAdapter when symbolic claims or generated code need to
enter the shared guard-control path. The adapter delegates formal formulae to
the configured reasoning backend and generated code to verify_code() by
default.

```python
from director_ai.core.formal_verification import (
    And,
    FormalCodeVerifierAdapter,
    Not,
    Variable,
)
from director_ai.core.guard_control import RiskEnvelope

adapter = FormalCodeVerifierAdapter(timeout_ms=1000.0)

# "approved AND NOT approved" is self-contradictory.
result = adapter.verify_formula(
    formula=And(Variable("approved"), Not(Variable("approved"))),
    risk_envelope=RiskEnvelope(
        action_category="code",
        reversibility="reversible",
        domain="regulated",
        calibrated_threshold=0.5,
        no_go_threshold=0.85,
    ),
    policy_id="policy.formal.regulated",
    evidence_ref="formal://claim-1",
)
```

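For small propositional formulae, what a reasoning backend has to establish can be shown by brute force: enumerate every truth assignment and look for one that makes the formula true. The sketch uses local stand-in classes Var, Neg, and Conj rather than director_ai's Variable, Not, and And, and a truth table is only practical for a handful of variables; real backends use SAT or SMT solvers.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Neg:
    operand: "Formula"


@dataclass(frozen=True)
class Conj:
    left: "Formula"
    right: "Formula"


Formula = Union[Var, Neg, Conj]


def variables(f: Formula) -> set[str]:
    """Collect the variable names that occur in a formula."""
    if isinstance(f, Var):
        return {f.name}
    if isinstance(f, Neg):
        return variables(f.operand)
    return variables(f.left) | variables(f.right)


def evaluate(f: Formula, env: dict[str, bool]) -> bool:
    """Evaluate a formula under one truth assignment."""
    if isinstance(f, Var):
        return env[f.name]
    if isinstance(f, Neg):
        return not evaluate(f.operand, env)
    return evaluate(f.left, env) and evaluate(f.right, env)


def satisfiable(f: Formula) -> bool:
    """True if some assignment makes f true (2**n rows for n variables)."""
    names = sorted(variables(f))
    return any(
        evaluate(f, dict(zip(names, values)))
        for values in product((False, True), repeat=len(names))
    )


contradiction = Conj(Var("approved"), Neg(Var("approved")))
print(satisfiable(contradiction))  # False: no assignment satisfies it
```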
Named theorem backends can be selected explicitly:
For generated code with a formal contract, use verify_code_contract(). The
adapter checks code structure first and only then routes the contract to the
configured theorem backend.
Verifier failure is not proof of correctness. Missing backends, exceptions, and timeouts become warning decisions. Generated code is checked structurally and against manifests without executing it, and audit payloads carry evidence references rather than raw formulae or source code.
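The rule that verifier failure never counts as success can be sketched as a wrapper around any backend call. The run_backend function below is hypothetical: it runs a zero-argument check under a timeout and maps only an explicit True to allow and an explicit False to block, while a missing backend, an exception, or a timeout becomes a warning.

```python
import concurrent.futures as cf
import time
from typing import Callable, Optional


def run_backend(check: Optional[Callable[[], bool]], timeout_s: float) -> str:
    """Map a verifier backend's outcome to "allow", "block", or "warn"."""
    if check is None:
        return "warn"  # no backend configured
    pool = cf.ThreadPoolExecutor(max_workers=1)
    try:
        outcome = pool.submit(check).result(timeout=timeout_s)
    except Exception:
        # Covers the timeout error as well as anything the backend raised.
        return "warn"
    finally:
        # Don't wait for an overrunning backend; Python threads cannot be killed,
        # so it finishes in the background.
        pool.shutdown(wait=False)
    if outcome is True:
        return "allow"
    if outcome is False:
        return "block"
    return "warn"  # anything else is not a verdict


def slow_backend() -> bool:
    time.sleep(0.5)  # simulate a solver that overruns its budget
    return True


print(run_backend(lambda: True, timeout_s=1.0))   # allow
print(run_backend(slow_backend, timeout_s=0.05))  # warn
```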
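Custom Module Registry
Pass in-house packages through known_modules so their imports are not reported as unknown, and list their callable functions in api_manifest so that only calls outside that list are reported as hallucinated APIs:

```python
from director_ai import verify_code

result = verify_code(
    "import my_company_sdk\nmy_company_sdk.query('x')",
    known_modules={"my_company_sdk"},
    api_manifest={"my_company_sdk": {"query", "ingest", "delete"}},
)
assert result.unknown_imports == []
assert result.hallucinated_apis == []
```
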
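Zero Dependencies
The three core verifiers (verify_json, verify_tool_call, and verify_code) use only the Python standard library (json, ast, re): no torch, no transformers, no model downloads. They work on the lite install path, and their deterministic checks run in under a millisecond. Optional extras bring their own dependencies: pydantic_model requires Pydantic, and an NLI-backed score_fn built on CoherenceScorer(use_nli=True) loads a model.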