Zero-Trust Output Handling

Treat every model output as untrusted until it has been encoded for the exact context it is about to enter. The same string can be harmless as a JSON value yet an XSS vector in HTML, a command injection in a shell, or a path traversal on a filesystem: the danger lies in the sink, not the text. This is the OWASP-LLM05 posture: nothing the model produces is rendered, executed, or persisted raw.

Quick start

```python
from director_ai import ProductionGuard
from director_ai.core.config import DirectorConfig
from director_ai.core.output_trust import OutputSink

guard = ProductionGuard(DirectorConfig()).output_trust

# Encode the same model output for two different destinations.
html = guard.encode("<script>alert(1)</script>", OutputSink.HTML_TEXT)
print(html.encoded)   # &lt;script&gt;alert(1)&lt;/script&gt;

arg = guard.encode("file; rm -rf /", OutputSink.SHELL_ARGUMENT)
print(arg.encoded)    # 'file; rm -rf /'  (a single, inert shell argument)
```

`encode()` returns an `EncodedOutput`:

| Field | Meaning |
| --- | --- |
| `sink` | The `OutputSink` the value was encoded for. |
| `encoded` | The safe rendering. |
| `modified` | Whether anything had to be neutralised. |
| `note` | A short tenant-safe description of what was neutralised. |

`to_dict()` is tenant-safe: it contains only the sink, the encoded value, the `modified` flag, and the note, never the prompt or retrieval context.
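To make the shape of that result concrete, here is a minimal sketch of a value carrying the four documented fields. `EncodedOutputSketch` and `encode_html_text` are hypothetical names, the field types and the note wording are assumptions, and none of this is the library's own implementation.

```python
import html
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EncodedOutputSketch:
    """Hypothetical stand-in for EncodedOutput; field types are assumed."""

    sink: str
    encoded: str
    modified: bool
    note: str

    def to_dict(self) -> dict:
        # Only the four documented fields are exposed; there is no slot
        # for the prompt or retrieval context to leak through.
        return asdict(self)


def encode_html_text(value: str) -> EncodedOutputSketch:
    """Escape HTML metacharacters and record whether anything changed."""
    encoded = html.escape(value, quote=True)
    modified = encoded != value
    note = "html metacharacters escaped" if modified else ""
    return EncodedOutputSketch("html_text", encoded, modified, note)


result = encode_html_text("<script>alert(1)</script>")
print(result.to_dict())
```

Keeping the record to a fixed, small set of fields is what makes it safe to log or return across tenant boundaries: there is nothing else on the object to serialise by accident.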

Sinks

| `OutputSink` | Encoding |
| --- | --- |
| `HTML_TEXT` / `HTML_ATTRIBUTE` | HTML-escape `& < > " '`. |
| `SHELL_ARGUMENT` | POSIX-quote into one inert argument. |
| `SQL_IDENTIFIER` | Validated against `[A-Za-z_][A-Za-z0-9_]*`, never escaped. |
| `SQL_STRING_LITERAL` | ANSI single-quote doubling (a bound parameter is still preferred). |
| `FILESYSTEM_PATH` | Normalised to a relative path; absolute paths and `..` traversal are refused. |
| `JSON_VALUE` | Serialised as a JSON string literal. |
| `URL_QUERY` | Percent-encoded. |
| `EMAIL_HEADER` | CR/LF header injection stripped. |
| `LOG_LINE` | CR/LF and control characters stripped. |
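For intuition, most of these encodings have close Python standard-library counterparts. The sketch below shows one plausible reading of each row using only the standard library; the function names are hypothetical and the library's actual behaviour may differ in detail (for example, in exactly which control characters it strips).

```python
import html
import json
import re
import shlex
from urllib.parse import quote


def html_text(value: str) -> str:
    # Escapes & < > " ' so the value renders as text, not markup.
    return html.escape(value, quote=True)


def shell_argument(value: str) -> str:
    # Wraps the value in POSIX single quotes so it stays one argument.
    return shlex.quote(value)


def sql_string_literal(value: str) -> str:
    # ANSI quoting: double every single quote, then wrap in quotes.
    return "'" + value.replace("'", "''") + "'"


def json_value(value: str) -> str:
    # Produces a JSON string literal, including the surrounding quotes.
    return json.dumps(value)


def url_query(value: str) -> str:
    # Percent-encodes every reserved character, including "/" and "&".
    return quote(value, safe="")


def email_header(value: str) -> str:
    # Removes CR and LF so no extra header line can be smuggled in.
    return value.replace("\r", "").replace("\n", "")


def log_line(value: str) -> str:
    # Removes ASCII control characters (CR, LF, ESC, ...) and DEL.
    return re.sub(r"[\x00-\x1f\x7f]", "", value)


print(shell_argument("file; rm -rf /"))
print(email_header("Hi\r\nBcc: x@example.com"))
```

Each function is only correct for its own sink: the output of `html_text` is not safe to pass to a shell, and the output of `shell_argument` is not safe to embed in HTML.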

Values that cannot be made safe (an invalid SQL identifier, an absolute or traversing path, a NUL byte) raise `UnsafeOutputError` rather than emitting something that only looks safe. Use `encode_or_none()` to drop such a field instead of handling the exception at every call site.
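The refuse-rather-than-escape behaviour can be sketched with the standard library. Everything named here is hypothetical: `UnsafeValueError` stands in for `UnsafeOutputError`, `or_none` stands in for the idea behind `encode_or_none()`, and the path check assumes POSIX-style paths and rejects any `..` segment, which may be stricter or looser than the library.

```python
import posixpath
import re
from typing import Callable, Optional


class UnsafeValueError(ValueError):
    """Hypothetical stand-in for UnsafeOutputError."""


_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def sql_identifier(value: str) -> str:
    # Validate, never escape: anything outside the pattern is refused.
    if not _IDENTIFIER.fullmatch(value):
        raise UnsafeValueError("invalid SQL identifier")
    return value


def filesystem_path(value: str) -> str:
    # Refuse inputs that no amount of normalisation can make safe.
    if "\x00" in value:
        raise UnsafeValueError("NUL byte in path")
    if posixpath.isabs(value):
        raise UnsafeValueError("absolute path")
    if ".." in value.split("/"):
        raise UnsafeValueError("path traversal")
    # Collapse duplicate slashes and "." segments.
    return posixpath.normpath(value)


def or_none(encoder: Callable[[str], str], value: str) -> Optional[str]:
    # Drop the field instead of raising at every call site.
    try:
        return encoder(value)
    except UnsafeValueError:
        return None


print(filesystem_path("reports//2024/./q1.txt"))
print(or_none(sql_identifier, "users; DROP TABLE users"))
```

Raising by default keeps the failure visible; the `or_none` wrapper is a convenience for cases where silently omitting one field is an acceptable outcome.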

Refusing code execution

Generated text is data, not code. `assess()` flags constructs that must never be handed to `exec`/`eval` or an unsandboxed deserialiser:

```python
risk = guard.assess("__import__('os').system('rm -rf /')")
print(risk.safe_to_execute)   # False
print(risk.constructs)        # ('dynamic_import', 'shell_pipe_redirect')
```

`safe_to_execute` is `True` only when no dangerous category matches. The flagged categories are dynamic import, `eval`/`exec`, OS commands, `pickle`/`yaml.load` deserialisation, reflective `globals`/`getattr`, and shell pipe/redirect metacharacters. `assess()` never executes or transforms the text; it only reports, so it composes with the per-sink `encode()` step.
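A report-only scanner of this kind can be approximated with a table of regular expressions. The sketch below is an illustration under assumptions, not the library's detector: `assess_sketch`, `RiskSketch`, the patterns, and most category names are invented here, and a real detector would need to be considerably more thorough.

```python
import re
from typing import NamedTuple, Tuple

# Hypothetical patterns, one per category listed above.
_PATTERNS = {
    "dynamic_import": re.compile(r"__import__\s*\(|\bimportlib\b"),
    "eval_exec": re.compile(r"\b(?:eval|exec)\s*\("),
    "os_command": re.compile(r"\bos\.(?:system|popen)\s*\(|\bsubprocess\."),
    "unsafe_deserialise": re.compile(r"\bpickle\.loads?\s*\(|\byaml\.load\s*\("),
    "reflection": re.compile(r"\b(?:globals|getattr)\s*\("),
    "shell_pipe_redirect": re.compile(r"[|<>]"),
}


class RiskSketch(NamedTuple):
    safe_to_execute: bool
    constructs: Tuple[str, ...]


def assess_sketch(text: str) -> RiskSketch:
    # Only search the text; never run, import, or rewrite it.
    found = tuple(name for name, pat in _PATTERNS.items() if pat.search(text))
    return RiskSketch(safe_to_execute=not found, constructs=found)


print(assess_sketch("The total is 42."))
print(assess_sketch("pickle.loads(blob) | tee out"))
```

Because the function returns a report and leaves the input untouched, the caller can still pass the same text to a per-sink encoder afterwards.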