OpenTelemetry Eval Trace Standard
Director's guard decision is most useful when it lands in the evaluation tracer a
team already runs. The eval trace standard emits each guard decision as an
OpenTelemetry span carrying a stable attribute schema — Director's own
director.eval.* fields plus the gen_ai.* semantic-convention keys — so an
OTLP-native eval tracer (Phoenix / Arize) ingests it directly, and a
metadata-based tracer (LangSmith, Ragas) consumes the same attribute dict.
The attributes are tenant-safe and OTel-primitive: no raw prompt, answer, or chunk text, only ids, counts, scores, and labels.
The attribute schema (director.eval.v1)
| Attribute | Meaning |
|---|---|
| director.eval.schema_version | schema version (director.eval.v1) |
| director.eval.decision | allow / halt |
| director.eval.approved | guard approval (bool) |
| director.eval.score / director.eval.threshold | coherence score and threshold |
| director.eval.evidence_count | evidence chunks the decision rested on |
| director.eval.unsupported_claims | unsupported claims (from the Answer BOM) |
| director.eval.model / director.eval.scorer | model and scorer ids |
| director.eval.tenant_id / director.eval.domain | tenant and domain |
| director.eval.answer_id | links the span to the Answer BOM |
| gen_ai.system / gen_ai.operation.name / gen_ai.request.model | OpenTelemetry GenAI semantic-convention keys |
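To make the "OTel-primitive" constraint concrete, the sketch below checks that every value in an attribute dict is a type OpenTelemetry accepts for span attributes: a str, bool, int, or float, or a homogeneous sequence of one of those. The helper names (is_otel_primitive, invalid_keys) and the sample values are hypothetical and not part of Director; the sample also assumes director.eval.unsupported_claims is carried as a count.

```python
from typing import Any, List, Mapping

# Scalar types OpenTelemetry accepts as span attribute values.
PRIMITIVES = (str, bool, int, float)


def is_otel_primitive(value: Any) -> bool:
    """True for a primitive scalar or a homogeneous list/tuple of one primitive type."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, (list, tuple)):
        if not value:
            return True
        first = type(value[0])
        return first in PRIMITIVES and all(type(v) is first for v in value)
    return False


def invalid_keys(attrs: Mapping[str, Any]) -> List[str]:
    """Return the sorted keys whose values are not valid attribute values."""
    return sorted(k for k, v in attrs.items() if not is_otel_primitive(v))


# Illustrative record only; values are made up for the example.
sample = {
    "director.eval.schema_version": "director.eval.v1",
    "director.eval.decision": "halt",
    "director.eval.approved": False,
    "director.eval.score": 0.4,
    "director.eval.threshold": 0.6,
    "director.eval.evidence_count": 3,
    "director.eval.unsupported_claims": 2,  # assumed to be a count
    "director.eval.model": "gpt-4o",
    "director.eval.tenant_id": "acme",
    "director.eval.domain": "finance",
    "gen_ai.request.model": "gpt-4o",
}

print(invalid_keys(sample))
```

A check like this can run in a unit test against whatever dict eval_trace returns, so that a nested object or a None value is caught before it reaches an exporter.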
Emitting a decision
from director_ai.guard import ProductionGuard
guard = ProductionGuard.from_profile("finance")
result = guard.check(prompt, response)
# Emit an OTel span and get the record back for non-OTLP tracers
record = guard.eval_trace(result, model="gpt-4o", tenant_id="acme", domain="finance")
record is the attribute dict; the span is emitted under the
director_ai.eval.guard_decision name. Configure an exporter once with
setup_otel() (see the OpenTelemetry bridge) and the spans flow to your
collector.
For direct control:
from director_ai.core.eval_trace import (
    guard_decision_attributes,
    record_guard_decision,
)

attrs = guard_decision_attributes(
    decision="halt", approved=False, score=0.4, threshold=0.6, model="gpt-4o"
)
with record_guard_decision(attrs) as span:
    ...  # span carries the attributes; a no-op sink when the SDK is absent
Ingesting downstream
- Phoenix / Arize ingest OTLP spans directly: point an OTLP exporter at the collector and the gen_ai.* and director.eval.* attributes appear on each span.
- LangSmith / Ragas take metadata records: pass the dict returned by eval_trace / eval_record_from_guard as the run metadata.
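For the metadata route, the useful property is that a record made only of primitives survives JSON serialisation unchanged. The sketch below illustrates that with the standard library alone; to_run_metadata and the record values are hypothetical, and the call that actually attaches metadata to a run depends on the tracer you use.

```python
import json
from typing import Any, Dict, Mapping


def to_run_metadata(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Round-trip the record through JSON; raises TypeError on non-serialisable values."""
    payload = json.dumps(dict(record), sort_keys=True)
    return json.loads(payload)


# Illustrative record; in practice this would be the dict returned by eval_trace.
record = {
    "director.eval.schema_version": "director.eval.v1",
    "director.eval.decision": "allow",
    "director.eval.approved": True,
    "director.eval.score": 0.82,
    "director.eval.threshold": 0.6,
    "director.eval.tenant_id": "acme",
    "gen_ai.request.model": "gpt-4o",
}

metadata = to_run_metadata(record)
print(metadata == record)
```

Because the keys are flat dotted strings, the same dict can be filtered by prefix (director.eval. or gen_ai.) on the tracer side without any schema translation.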
The emitter is a no-op when the OpenTelemetry SDK is not installed, so calling
eval_trace is always safe.
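The no-op behaviour can be pictured with the usual optional-import pattern. The following is a minimal sketch of that pattern, not Director's implementation: record_decision_sketch, emit_sketch, and the tracer name eval_trace_sketch are hypothetical, and the only OpenTelemetry calls used are trace.get_tracer, start_as_current_span, and set_attribute.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Mapping

try:
    from opentelemetry import trace  # optional dependency
except ImportError:
    trace = None


class _NoOpSpan:
    """Stand-in span used when OpenTelemetry is not installed."""

    def set_attribute(self, key: str, value: Any) -> None:
        pass


@contextmanager
def record_decision_sketch(
    attrs: Mapping[str, Any],
    name: str = "director_ai.eval.guard_decision",
) -> Iterator[Any]:
    """Yield a span carrying attrs, or a no-op stand-in if OTel is missing."""
    if trace is None:
        yield _NoOpSpan()
        return
    tracer = trace.get_tracer("eval_trace_sketch")
    with tracer.start_as_current_span(name) as span:
        for key, value in attrs.items():
            span.set_attribute(key, value)
        yield span


def emit_sketch(attrs: Mapping[str, Any]) -> Dict[str, Any]:
    """Record the decision (if possible) and always return the attribute dict."""
    with record_decision_sketch(attrs):
        pass
    return dict(attrs)


result = emit_sketch({"director.eval.decision": "halt", "director.eval.score": 0.4})
print(result)
```

The caller gets the same dict back whether or not a span was emitted, which is what lets metadata-based tracers work in environments with no OpenTelemetry packages at all.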