Metrics & Observability

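Director-AI ships a zero-dependency, Prometheus-compatible metrics collector. All exported metrics use the director_ai_ prefix. The tables below list metric names without the prefix, so `reviews_total` is exported as `director_ai_reviews_total`.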

Metric Reference

Counters

Metric	Labels	Description
`reviews_total`	—	Total review requests processed
`reviews_approved`	—	Reviews that passed coherence threshold
`reviews_rejected`	—	Reviews that failed coherence threshold
`halts_total`	`reason`	Safety kernel halt events
`feedback_total`	`outcome`	Human feedback events used for calibration and false-positive tracking
`retune_recommendations_total`	—	Retune recommendations emitted by safety operations
`http_requests_total`	`method`, `endpoint`, `status`	HTTP requests by method/endpoint/status

Histograms

Metric	Buckets	Description
`coherence_score`	0.1–1.0 (step 0.1)	Coherence score distribution
`review_duration_seconds`	0.01–10s	End-to-end review latency
`batch_size`	1–1000	Batch request sizes
`nli_inference_seconds`	0.005–5s	Single NLI inference latency
`factual_retrieval_seconds`	0.001–1s	RAG retrieval latency
`chunked_nli_seconds`	0.01–30s	Chunked NLI scoring latency
`nli_premise_chunks`	1–20	Premise chunk count per scoring call
`nli_hypothesis_chunks`	1–20	Hypothesis chunk count per scoring call
`http_request_duration_seconds`	0.005–10s	HTTP request duration

HTTP request metrics use route templates for endpoint labels, such as /v1/sessions/{session_id} and /v1/tenants/{tenant_id}/facts, not raw path values. This keeps Prometheus cardinality bounded and prevents tenant, session, or document identifiers from entering metric labels. Authentication failures are also counted with the same route-template label contract.
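The route-template rule above can be sketched in a few lines. This is an illustration only, not Director-AI's implementation: the two templates come from the paragraph above, while the regular expressions, the `endpoint_label` helper, and the `unmatched` fallback label are assumptions made for the example. Most web frameworks expose the matched route template directly, which is usually preferable to pattern matching.

```python
import re

# Hypothetical route table: each pattern maps to the template used as the label.
# The templates are taken from the text above; the regexes are illustrative.
ROUTE_TEMPLATES = [
    (re.compile(r"^/v1/sessions/[^/]+$"), "/v1/sessions/{session_id}"),
    (re.compile(r"^/v1/tenants/[^/]+/facts$"), "/v1/tenants/{tenant_id}/facts"),
]

# Assumed fallback: one shared label so unknown paths cannot grow cardinality.
UNMATCHED = "unmatched"


def endpoint_label(raw_path: str) -> str:
    """Return the route template for a raw request path, never the raw path."""
    path = raw_path.split("?", 1)[0]  # drop the query string before matching
    for pattern, template in ROUTE_TEMPLATES:
        if pattern.match(path):
            return template
    return UNMATCHED
```

With this shape, every session or tenant identifier collapses into one label value per route, so the number of `endpoint` series is bounded by the number of routes rather than the number of identifiers seen.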

Gauges

Metric	Description
`active_requests`	In-flight requests
`nli_model_loaded`	1 if NLI model is loaded
`kb_stale_sources`	Knowledge sources that need refresh or review
`retune_recommended`	1 when recent feedback indicates retuning is due

Prometheus Endpoint

GET /v1/metrics/prometheus

The output includes # HELP and # TYPE headers for each metric family, an le="+Inf" overflow bucket on every histogram, and one line per label combination for labeled counters.
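To make the histogram part of that format concrete, here is a minimal sketch of how one histogram family could be rendered in the Prometheus text format. It is not the collector's own code: `render_histogram` and its parameters are hypothetical names. The one detail that is fixed by the format is that bucket counts are cumulative, so the le="+Inf" bucket always equals the total observation count.

```python
from bisect import bisect_left


def render_histogram(name, help_text, bounds, observations):
    """Render one histogram family as Prometheus text-format lines."""
    bounds = sorted(bounds)
    per_bucket = [0] * (len(bounds) + 1)  # last slot holds values above every bound
    for value in observations:
        # bisect_left finds the first bound >= value, matching the "le" semantics
        per_bucket[bisect_left(bounds, value)] += 1

    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} histogram"]
    cumulative = 0
    for bound, count in zip(bounds, per_bucket):
        cumulative += count  # bucket counts are cumulative in the text format
        lines.append(f'{name}_bucket{{le="{bound}"}} {cumulative}')
    cumulative += per_bucket[-1]
    lines.append(f'{name}_bucket{{le="+Inf"}} {cumulative}')
    lines.append(f"{name}_sum {sum(observations)}")
    lines.append(f"{name}_count {len(observations)}")
    return lines
```

Calling `render_histogram("director_ai_coherence_score", "Coherence score distribution", [0.5, 1.0], [0.2, 0.5, 0.9, 2.0])` yields the two headers, three `_bucket` lines, and the `_sum` and `_count` lines for that family.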

Kubernetes Scrape Config

apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/v1/metrics/prometheus"

Grafana PromQL Examples

# Request rate (5m window)
rate(director_ai_http_requests_total[5m])

# p99 review latency
histogram_quantile(0.99, rate(director_ai_review_duration_seconds_bucket[5m]))

# Error rate by endpoint
sum by (endpoint) (rate(director_ai_http_requests_total{status=~"5.."}[5m]))
  / sum by (endpoint) (rate(director_ai_http_requests_total[5m]))

# Median coherence score
histogram_quantile(0.5, rate(director_ai_coherence_score_bucket[5m]))

# Average premise chunks per call
rate(director_ai_nli_premise_chunks_sum[5m])
  / rate(director_ai_nli_premise_chunks_count[5m])
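The quantile queries above return estimates, not exact values, because Prometheus only sees bucket counts. The sketch below is a simplified Python model of the linear interpolation that histogram_quantile performs on classic histograms; it omits cases the real function handles and assumes 0 < q <= 1 and at least one finite bucket. It is included to show why the result can never be more precise than the bucket width, for example 0.1 for `coherence_score`.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) pairs.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile falls in the overflow bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0  # first bucket is assumed to start at 0
    below = buckets[i - 1][1] if i > 0 else 0
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)
```

One practical consequence: when the p99 lands in the le="+Inf" bucket, the reported value is capped at the largest finite bound, so a p99 that sits exactly at the top bucket boundary usually means the bucket range is too narrow for the observed latencies.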

Docker Compose Verification

# Start the server
docker compose up -d director-ai

# Verify Prometheus output
curl -s http://localhost:8080/v1/metrics/prometheus | head -20
# Expected: lines starting with # HELP, # TYPE, director_ai_*
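The same check can be scripted, for instance in a smoke test. The following is a sketch rather than a tool shipped with Director-AI: `fetch_metrics` and `unexpected_lines` are hypothetical helpers, and the URL is the one used in the curl command above. It encodes the stated expectation that every non-empty line is a # HELP header, a # TYPE header, or a sample starting with director_ai_.

```python
import urllib.request

# Endpoint taken from the curl command above.
URL = "http://localhost:8080/v1/metrics/prometheus"


def fetch_metrics(url: str = URL, timeout: float = 5.0) -> str:
    """Download the Prometheus text output from a running server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def unexpected_lines(text: str, prefix: str = "director_ai_") -> list[str]:
    """Return lines that are neither HELP/TYPE headers nor prefixed samples."""
    bad = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are harmless
        if line.startswith("# HELP ") or line.startswith("# TYPE "):
            continue
        if line.startswith(prefix):
            continue
        bad.append(line)
    return bad
```

With the server running, `unexpected_lines(fetch_metrics())` should return an empty list if the output matches the expectation stated above.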

JSON Metrics

GET /v1/metrics

Returns all counters, histograms (count/total/mean/p50/p90/p99), and gauges as JSON.
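As an illustration of the histogram fields named above, the sketch below computes count, total, mean, p50, p90, and p99 from a list of raw observations. The percentile method used by the collector is not documented here, so the nearest-rank method, the `summarize` helper, and the shape of the returned dictionary are all assumptions.

```python
import math


def summarize(values):
    """Summarize observations using nearest-rank percentiles (an assumed method)."""
    if not values:
        return {"count": 0, "total": 0.0, "mean": 0.0, "p50": 0.0, "p90": 0.0, "p99": 0.0}
    ordered = sorted(values)
    count = len(ordered)
    total = sum(ordered)

    def percentile(p):
        # Nearest rank: the smallest value with at least p percent of samples at or below it.
        rank = max(1, math.ceil(p * count / 100))
        return ordered[rank - 1]

    return {
        "count": count,
        "total": total,
        "mean": total / count,
        "p50": percentile(50),
        "p90": percentile(90),
        "p99": percentile(99),
    }
```

Percentiles computed this way come from raw observations, so they can differ from the bucket-based estimates that histogram_quantile produces in PromQL for the same metric.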

Python API

from director_ai.core.metrics import metrics

metrics.inc("reviews_total")
metrics.inc("halts_total", label="hard_limit")
metrics.inc_labeled("feedback_total", {"outcome": "false_positive"})
metrics.inc("retune_recommendations_total")
metrics.inc_labeled("http_requests_total", {"method": "GET", "endpoint": "/v1/sessions/{session_id}", "status": "200"})
metrics.observe("coherence_score", 0.87)
metrics.gauge_set("nli_model_loaded", 1.0)
metrics.gauge_set("kb_stale_sources", 2)
metrics.gauge_set("retune_recommended", 1)

# scorer, query, and response are assumed to be defined by the caller
with metrics.timer("review_duration_seconds"):
    approved, score = scorer.review(query, response)

Sustainability Policy Events

The sustainability scoring adapter returns tenant-safe GuardDecision and SafetyEvent payloads rather than raw metrics. Deployments that mirror these events into Prometheus, OpenTelemetry, or a billing warehouse should export only aggregate fields:

decision, reason, and policy id
request-count and total-unit summaries
energy, carbon, and cost estimates with provenance
threshold alert names
hardware profile id

Do not export raw prompts, completions, media, credentials, API keys, or access tokens through sustainability telemetry. Hardware profile values must be marked as measured, configured, or projected so operators can distinguish instrumented energy measurements from planning estimates.
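One way to enforce the export rules above is an allowlist applied just before an event leaves the process. The sketch below is illustrative only: the field names, `ALLOWED_FIELDS`, `PROFILE_SOURCES`, and `export_view` are hypothetical and do not describe the actual GuardDecision or SafetyEvent schema. An allowlist is used instead of a blocklist so that a newly added sensitive field is dropped by default.

```python
# Hypothetical field names that mirror the aggregate fields listed above.
ALLOWED_FIELDS = {
    "decision", "reason", "policy_id",
    "request_count", "total_units",
    "energy_estimate", "carbon_estimate", "cost_estimate", "provenance",
    "threshold_alerts", "hardware_profile_id", "hardware_profile_source",
}

# The three markings named in the text above.
PROFILE_SOURCES = {"measured", "configured", "projected"}


def export_view(event: dict) -> dict:
    """Keep only allowlisted aggregate fields; everything else is dropped."""
    source = event.get("hardware_profile_source")
    if source not in PROFILE_SOURCES:
        raise ValueError(
            f"hardware profile source must be one of {sorted(PROFILE_SOURCES)}"
        )
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}
```

Rejecting events without a valid source marking, rather than exporting them unmarked, keeps planning estimates from being mistaken for instrumented measurements downstream.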