OpenAI / Anthropic SDK Guard¶
2-line integration that wraps your existing SDK client with coherence scoring.
```mermaid
sequenceDiagram
    participant App as Your Code
    participant Guard as guard(client)
    participant SDK as OpenAI / Anthropic SDK
    participant LLM as LLM API
    participant Scorer as CoherenceScorer
    App->>Guard: client.chat.completions.create(...)
    Guard->>SDK: Forward request
    SDK->>LLM: API call
    LLM-->>SDK: Response
    SDK-->>Guard: Response object
    Guard->>Scorer: review(prompt, response.text)
    Scorer-->>Guard: (approved, CoherenceScore)
    alt approved
        Guard-->>App: Original response
    else rejected (on_fail="raise")
        Guard-->>App: HallucinationError
    else rejected (on_fail="log")
        Guard-->>App: Response + warning log
    else rejected (on_fail="metadata")
        Guard-->>App: Response + score in context var
    end
```
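The flow above amounts to a thin proxy around the SDK client. A minimal self-contained sketch of that pattern, using a placeholder scorer and simplified names (`review`, `GuardedCompletions` are illustrative, not director_ai's actual internals):

```python
from dataclasses import dataclass

@dataclass
class CoherenceScore:
    score: float
    approved: bool

class HallucinationError(Exception):
    pass

def review(prompt: str, text: str, threshold: float = 0.6) -> CoherenceScore:
    # Placeholder scorer: approves any non-empty response.
    # The real CoherenceScorer compares the response against facts.
    s = 1.0 if text.strip() else 0.0
    return CoherenceScore(score=s, approved=s >= threshold)

class GuardedCompletions:
    """Stand-in for guard(client).chat.completions: forward, score, dispatch."""
    def __init__(self, inner, on_fail: str = "raise"):
        self._inner = inner
        self._on_fail = on_fail

    def create(self, **kwargs):
        response = self._inner.create(**kwargs)       # forward to the SDK
        prompt = kwargs["messages"][-1]["content"]
        cs = review(prompt, response["text"])         # score prompt/response pair
        if not cs.approved and self._on_fail == "raise":
            raise HallucinationError(f"coherence {cs.score:.3f}")
        return response                               # approved: pass through unchanged
```

The key property is that the wrapper preserves the SDK's call signature, so existing application code needs no changes.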
OpenAI¶
```python
from director_ai import guard
from openai import OpenAI

client = guard(
    OpenAI(),
    facts={"refund": "within 30 days", "hours": "9am-5pm EST"},
    threshold=0.6,
    on_fail="raise",  # "raise" | "log" | "metadata"
)

# Works exactly like normal — hallucinations are caught transparently
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the refund policy?"}],
)
```
Anthropic¶
```python
from director_ai import guard
import anthropic

client = guard(
    anthropic.Anthropic(),
    facts={"refund": "within 30 days"},
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is the refund policy?"}],
)
```
Failure Modes¶
| Mode | Behavior |
|---|---|
| `on_fail="raise"` | Raises `HallucinationError` |
| `on_fail="log"` | Logs warning, returns response |
| `on_fail="metadata"` | Stores score in context var, returns response |
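The three modes can be pictured as a single dispatch on rejection. The helper below is a hypothetical sketch of that branching, not director_ai's code (`handle_rejection` and `metadata_store` are invented names):

```python
import logging

def handle_rejection(response, score: float, on_fail: str, metadata_store: dict):
    """Illustrative dispatch over the three failure modes."""
    if on_fail == "raise":
        # Stand-in for HallucinationError
        raise RuntimeError(f"Hallucination: coherence {score:.3f}")
    if on_fail == "log":
        logging.warning("Low coherence: %.3f", score)
        return response              # response still reaches the caller
    if on_fail == "metadata":
        metadata_store["score"] = score  # later retrievable, cf. get_score()
        return response
    raise ValueError(f"unknown on_fail mode: {on_fail!r}")
```

Only `"raise"` interrupts the caller; the other two modes degrade gracefully and keep the response flowing.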
Retrieving Scores¶
```python
from director_ai import guard, get_score
from openai import OpenAI

client = guard(OpenAI(), facts={...}, on_fail="metadata")
response = client.chat.completions.create(...)

score = get_score()
if score and not score.approved:
    print(f"Low coherence: {score.score:.3f}")
```
Injection Detection¶
Enable output-side prompt injection detection on any guarded client:
```python
client = guard(
    OpenAI(),
    facts={"refund": "within 30 days"},
    injection_detection=True,
    injection_threshold=0.7,
    on_fail="raise",
)
```
When injection is detected, the failure mode mirrors hallucination handling:
| Mode | Behavior |
|---|---|
| `on_fail="raise"` | Raises `InjectionDetectedError` |
| `on_fail="log"` | Logs `Injection detected (risk=0.xxx)` warning |
| `on_fail="metadata"` | Stores score in context var (check `cs.injection_risk`) |
The `score()` function also supports injection detection:

```python
from director_ai import score

cs = score(prompt, response, injection_detection=True)
print(cs.injection_risk)  # 0.0–1.0 or None
```
Streaming Support¶
Streaming is automatically guarded with periodic coherence checks every 8 tokens:
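A periodic check over a token stream can be sketched as a generator that buffers tokens and re-scores the accumulated text on every 8th token. This is an illustrative stand-in (the `check` callable here is a trivial placeholder for the coherence scorer, and `guarded_stream` is an invented name, not the library's API):

```python
def guarded_stream(tokens, check, interval: int = 8):
    """Yield tokens, running `check` on the accumulated text every `interval` tokens."""
    buffer = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) % interval == 0:
            # Periodic coherence check on everything generated so far.
            if not check("".join(buffer)):
                raise RuntimeError("stream rejected mid-generation")
        yield token
    # Final check on the complete text once the stream ends.
    if buffer and not check("".join(buffer)):
        raise RuntimeError("final coherence check failed")
```

Checking every N tokens trades latency for safety: smaller intervals catch drift sooner but cost more scorer calls per stream.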