Structured Output Verification
Verify JSON, tool calls, and code generated by LLMs, catching hallucinations that natural language inference (NLI) cannot detect.
Why Structured Verification?
NLI scores natural language against natural language, but agentic LLMs also produce JSON, function calls, SQL, and code. NLI cannot catch a JSON field with the wrong type, a call to a function that does not exist, or a Python import of a nonexistent module. Structured verification uses deterministic checks (schema validation, AST parsing, manifest lookup). For format violations these checks produce no false positives, run in under a millisecond, and require no model.
JSON Verification
Validate JSON output against a schema and optionally ground string values against a knowledge base.
from director_ai import verify_json
# Parse check only
result = verify_json('{"status": "shipped", "tracking": "UPS1234"}')
assert result.valid_json is True
# Schema validation
schema = {
    "type": "object",
    "required": ["status", "tracking"],
    "properties": {
        "status": {"type": "string"},
        "tracking": {"type": "string"},
        "count": {"type": "integer"},
    },
}
result = verify_json('{"status": "shipped"}', schema=schema)
assert result.schema_valid is False  # missing required "tracking"
# Value grounding against a knowledge base
from director_ai import CoherenceScorer
scorer = CoherenceScorer(use_nli=True)
result = verify_json(
    '{"status": "shipped", "carrier": "FedEx"}',
    score_fn=lambda claim: scorer.calculate_logical_divergence(claim, claim),
)
for v in result.field_verdicts:
    print(f"{v.path}: {v.verdict} ({v.reason})")
What It Catches
| Check | Example | Detection |
|---|---|---|
| Malformed JSON | {"key": value} | valid_json=False |
| Missing required field | Schema requires name, JSON has only age | verdict="missing" |
| Wrong type | Schema says integer, value is "thirty" | verdict="invalid_type" |
| Extra field | additionalProperties: false and unknown key present | verdict="extra" |
| Bool as integer | {"count": true} where schema says integer | verdict="invalid_type" |
| Ungrounded value | String value contradicts knowledge base | verdict="invalid_value" |
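The deterministic rows of the table above can be sketched with the standard library alone. This is a minimal illustration, not the director_ai implementation: the function name `check_fields`, the `_JSON_TYPES` mapping, and the `"malformed"` verdict are hypothetical, and only flat object schemas are handled. It shows why the "Bool as integer" case needs an explicit check: in Python, `bool` is a subclass of `int`, so a plain `isinstance` test would accept `true` as an integer.

```python
import json

# Hypothetical mapping from JSON Schema type names to Python types.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_fields(raw: str, schema: dict) -> list[tuple[str, str]]:
    """Return (path, verdict) pairs for problems found in a flat JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return [("$", "malformed")]
    if not isinstance(data, dict):
        return [("$", "invalid_type")]

    verdicts = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in data:
            verdicts.append((name, "missing"))
    for name, value in data.items():
        if name not in props:
            if schema.get("additionalProperties") is False:
                verdicts.append((name, "extra"))
            continue
        expected = props[name].get("type")
        py_type = _JSON_TYPES.get(expected)
        if py_type is None:
            continue  # no type constraint, or a type this sketch ignores
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the schema actually asks for a boolean.
        bool_mismatch = isinstance(value, bool) and expected != "boolean"
        if bool_mismatch or not isinstance(value, py_type):
            verdicts.append((name, "invalid_type"))
    return verdicts
```

Only problems are reported here, so an empty list means the object passed every check the sketch knows about.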
Result Type
@dataclass
class StructuredVerificationResult:
    valid_json: bool
    schema_valid: bool | None  # None if no schema provided
    field_verdicts: list[FieldVerdict]
    error_count: int
    parse_error: str = ""
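Because `schema_valid` has three states, callers should compare it with `is None` or `is False` rather than rely on truthiness. The helper below is a hypothetical sketch of how a result might be turned into a short report. It uses only the fields listed above and assumes each field verdict exposes `path` and `verdict`, as in the earlier loop. `SimpleNamespace` stands in for a real result so the example runs without the library.

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Turn a verification result into a one-line report (hypothetical helper)."""
    if not result.valid_json:
        return f"unparseable JSON: {result.parse_error}"
    if result.error_count == 0:
        # schema_valid is None when no schema was supplied, which is
        # different from a schema check that failed (False).
        if result.schema_valid is None:
            return "parsed, no schema supplied"
        return "ok"
    details = "; ".join(f"{v.path}={v.verdict}" for v in result.field_verdicts)
    return f"{result.error_count} error(s): {details}"


# Stand-in object with the same attribute names as the result type.
failed = SimpleNamespace(
    valid_json=True,
    schema_valid=False,
    field_verdicts=[SimpleNamespace(path="tracking", verdict="missing", reason="")],
    error_count=1,
    parse_error="",
)
print(summarize(failed))
```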
Tool Call Verification
Verify that an agent's function call is legitimate: the function exists, the arguments match the manifest, and the claimed result was not fabricated.
from director_ai import verify_tool_call
manifest = {
    "get_weather": {
        "description": "Get current weather for a city",
        "parameters": {"city": {"type": "string"}},
        "returns": "Weather data with temperature and conditions",
    },
    "search_database": {
        "description": "Search customer database",
        "parameters": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "required": False},
        },
    },
}
# Valid call
result = verify_tool_call(
    function_name="get_weather",
    arguments={"city": "Prague"},
    manifest=manifest,
)
assert result.function_exists is True
assert result.arguments_valid is True
# Nonexistent function
result = verify_tool_call(
    function_name="get_stock_price",
    arguments={"symbol": "AAPL"},
    manifest=manifest,
)
assert result.function_exists is False
# Fabrication detection via execution log
execution_log = [
    {"function": "get_weather", "arguments": {"city": "Berlin"}, "result": "cloudy 10C"}
]
result = verify_tool_call(
    function_name="get_weather",
    arguments={"city": "Prague"},
    claimed_result="sunny 22C",
    manifest=manifest,
    execution_log=execution_log,
)
# Agent claims Prague weather but only Berlin was actually called
assert "different arguments" in result.reason
What It Catches
| Check | Example | Detection |
|---|---|---|
| Nonexistent function | Agent calls get_stock_price not in manifest | function_exists=False |
| Wrong argument type | city=123 where string expected | arguments_valid=False |
| Missing required arg | No query for search_database | verdict="missing" |
| Fabricated result | No execution log entry for the call | fabrication_suspected=True |
| Mismatched result | Log says "rainy", agent claims "sunny" | fabrication_suspected=True |
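The checks in this table amount to a dictionary lookup, a per-argument type test, and a comparison against the execution log. The sketch below shows one way to do that with plain Python. It is not the library's code: `check_tool_call`, the returned dict, and the reason strings are hypothetical, and it assumes a parameter is required unless the manifest marks it `"required": False`.

```python
_TYPES = {"string": str, "integer": int}


def check_tool_call(name, arguments, manifest, claimed_result=None, execution_log=None):
    """Check one claimed tool call against a manifest and an optional log."""
    report = {
        "function_exists": name in manifest,
        "arguments_valid": False,
        "fabrication_suspected": False,
        "reason": "",
    }
    if not report["function_exists"]:
        report["reason"] = f"function {name!r} is not in the manifest"
        return report

    params = manifest[name].get("parameters", {})
    problems = []
    for pname, spec in params.items():
        if pname not in arguments:
            # Assumption: required unless explicitly marked "required": False.
            if spec.get("required", True):
                problems.append(f"missing argument {pname!r}")
            continue
        expected = _TYPES.get(spec.get("type"))
        value = arguments[pname]
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            problems.append(f"argument {pname!r} has the wrong type")
    problems += [f"unexpected argument {a!r}" for a in arguments if a not in params]
    report["arguments_valid"] = not problems

    if claimed_result is not None and execution_log is not None:
        same_fn = [e for e in execution_log if e["function"] == name]
        same_call = [e for e in same_fn if e["arguments"] == arguments]
        if not same_fn:
            fabrication = "no execution log entry for this function"
        elif not same_call:
            fabrication = "function was logged with different arguments"
        elif all(e["result"] != claimed_result for e in same_call):
            fabrication = "claimed result does not match the logged result"
        else:
            fabrication = ""
        if fabrication:
            report["fabrication_suspected"] = True
            problems.append(fabrication)

    report["reason"] = "; ".join(problems)
    return report
```

The log comparison runs only when both a claimed result and a log are supplied, so a plain manifest check never raises a fabrication flag on its own.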
Code Verification
Verify generated Python or JSON code for syntax errors, unknown imports, and hallucinated APIs.
from director_ai import verify_code
# Valid Python
result = verify_code("import os\nfiles = os.listdir('.')")
assert result.syntax_valid is True
assert result.unknown_imports == []
# Syntax error
result = verify_code("def broken(:\n pass")
assert result.syntax_valid is False
assert "SyntaxError" in result.parse_error
# Unknown import
result = verify_code("import nonexistent_quantum_ml")
assert "nonexistent_quantum_ml" in result.unknown_imports
# Hallucinated API detection
manifest = {"pd": {"read_csv", "DataFrame", "merge", "groupby"}}
result = verify_code(
    "import pandas as pd\ndf = pd.read_quantum_csv('data.csv')",
    api_manifest=manifest,
)
assert "pd.read_quantum_csv" in result.hallucinated_apis
What It Catches
| Check | Example | Detection |
|---|---|---|
| Syntax error | def foo(: | syntax_valid=False |
| Unknown import | import fake_lib | In unknown_imports |
| Hallucinated API | pd.read_quantum_csv() | In hallucinated_apis |
| JSON syntax | {key: value} | syntax_valid=False (language="json") |
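The Python rows of this table map onto a single pass over the syntax tree. The sketch below uses `ast` for parsing and `importlib.util.find_spec` to test whether a top-level module can be found. That second choice is an assumption made for illustration: it depends on what is installed locally, and the library may resolve imports differently. The function name `inspect_python` and the shape of the returned dict are hypothetical.

```python
import ast
import importlib.util


def _is_importable(root: str) -> bool:
    """True if the top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(root) is not None
    except (ImportError, ValueError):
        return False


def inspect_python(source, api_manifest=None, known_modules=None):
    """Report syntax errors, unresolvable imports, and off-manifest attributes."""
    report = {"syntax_valid": True, "parse_error": "",
              "unknown_imports": [], "hallucinated_apis": []}
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        report["syntax_valid"] = False
        report["parse_error"] = f"SyntaxError: {exc.msg}"
        return report

    manifest = api_manifest or {}
    known = set(known_modules or ())
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots = [node.module.split(".")[0]]
        else:
            roots = []
        for root in roots:
            if root in known or root in report["unknown_imports"]:
                continue
            if not _is_importable(root):
                report["unknown_imports"].append(root)
        # Flag name.attr only when the manifest lists that name
        # and the attribute is not among its allowed members.
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            allowed = manifest.get(node.value.id)
            if allowed is not None and node.attr not in allowed:
                report["hallucinated_apis"].append(f"{node.value.id}.{node.attr}")
    return report
```

Names that are absent from the manifest are never flagged, so the manifest acts as an allowlist only for the libraries you choose to describe.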
Custom Module Registry
result = verify_code(
    "import my_company_sdk\nmy_company_sdk.query('x')",
    known_modules={"my_company_sdk"},
    api_manifest={"my_company_sdk": {"query", "ingest", "delete"}},
)
assert result.unknown_imports == []
assert result.hallucinated_apis == []
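Zero Dependencies
All three verifiers use the Python standard library only (json, ast, re). No torch, no transformers, no model downloads. Works on the lite install path. Latency is sub-millisecond. Optional value grounding through score_fn runs whatever scorer you pass in, so a scorer created with use_nli=True brings its own model and latency.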
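A rough way to sanity-check the latency claim on your own hardware is to time the standard-library calls that this kind of verification rests on. The sketch below measures `json.loads` and `ast.parse` directly rather than the director_ai verifiers, so treat the numbers as a lower bound. The helper `per_call_ms` and the sample inputs are illustrative.

```python
import ast
import json
import timeit

payload = '{"status": "shipped", "tracking": "UPS1234"}'
snippet = "import os\nfiles = os.listdir('.')"


def per_call_ms(fn, number=2000):
    """Average wall-clock time per call, in milliseconds."""
    return timeit.timeit(fn, number=number) / number * 1000


json_ms = per_call_ms(lambda: json.loads(payload))
ast_ms = per_call_ms(lambda: ast.parse(snippet))
print(f"json.loads: {json_ms:.4f} ms per call")
print(f"ast.parse:  {ast_ms:.4f} ms per call")
```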