Safety Operations Dashboard¶
The safety operations dashboard turns tenant-safe SafetyEvent JSONL and
calibration feedback into an immediate halt-rate view.
It is intended as the first response to drift, stale knowledge, and repeated false-positive halts:
- per-tenant event count, halt count, halt rate, false-positive count, and alert state
- top contradiction sources across recent halts
- recent halt evidence with score, reason, source, and suggested operator action
- a ready-to-run retune command for recent labelled feedback
- a one-click retune action in the wizard that turns labelled feedback into a tuned profile overlay with threshold guidance and confidence notes
Launch The UI¶
The same panel is also available inside the wizard UI.
Open the Safety Ops tab, paste SafetyEvent JSONL, paste optional feedback
JSONL, and render the tables. Use Retune from Feedback when the feedback
rows include prompt, response, and a human verdict.
Text Mode¶
Use text mode when running over SSH or in CI:
```shell
director-ai safety-dashboard \
  --text \
  --events safety_events.jsonl \
  --feedback recent_feedback.jsonl
```
The command prints tenant halt rates, top contradiction sources, recent evidence, and the ready-to-run retune command.
Alert Thresholds¶
The defaults are intentionally conservative:
- halt-rate alert: 0.15
- false-positive alert: 0.05
Override them when a deployment has a known review cadence:
```shell
director-ai safety-dashboard \
  --text \
  --events safety_events.jsonl \
  --halt-alert-threshold 0.25 \
  --false-positive-alert-threshold 0.10
```
Prometheus And Grafana Mixin¶
The deployable safety-operations mixin lives under deploy/observability/:
- `safety-ops-prometheus-rules.yml`
- `safety-ops-grafana-dashboard.json`
Load the rules file in Prometheus through its `rule_files` configuration.
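For example, a minimal `prometheus.yml` excerpt that loads the mixin's rules might look like this (the file path is an assumption; use wherever you deploy the mixin):

```yaml
# prometheus.yml (excerpt) -- rule file path is illustrative
rule_files:
  - /etc/prometheus/rules/safety-ops-prometheus-rules.yml
```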
Import safety-ops-grafana-dashboard.json into Grafana and select the same
Prometheus data source that scrapes /v1/metrics/prometheus.
The mixin uses these metric families:
| Metric | Purpose |
|---|---|
| `director_ai_halts_total` | Halt-rate numerator |
| `director_ai_reviews_total` | Halt-rate denominator |
| `director_ai_feedback_total{outcome="false_positive"}` | False-positive feedback |
| `director_ai_kb_stale_sources` | Stale source count |
| `director_ai_retune_recommended` | Retune recommendation state |
| `director_ai_retune_recommendations_total` | Retune recommendation count |
The alert rules include denominator guards for low-traffic services, so a single halt during startup neither divides by zero nor fires a spurious halt-rate alert.
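To illustrate the guard pattern (this is a sketch, not the shipped rule: the 10-minute window, the 0.15 threshold, and the `tenant` label are all assumptions), a guarded halt-rate expression combines the ratio with a check that the denominator saw traffic:

```promql
# Halt rate per tenant, alerting only when reviews actually occurred.
(
    sum by (tenant) (rate(director_ai_halts_total[10m]))
  /
    sum by (tenant) (rate(director_ai_reviews_total[10m]))
) > 0.15
and
sum by (tenant) (rate(director_ai_reviews_total[10m])) > 0
```

The `and` clause drops series whose review rate is zero, which is what keeps a lone startup halt from producing an undefined ratio.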
Input Shape¶
Each event should be one tenant-safe JSON object per line. Native
`SafetyEvent.to_dict()` records work directly:

```json
{"event_id":"e1","tenant_id":"tenant-a","policy_decision":"halt","halt_reason":"contradiction","observed_score":0.22,"trace_attribution":{"fact_source":"kb://physics"},"tenant_safe_explanation":"Refresh the cited fact."}
```
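As a sketch of how a consumer could aggregate these rows into the per-tenant halt-rate view (an illustrative helper, not part of the director-ai package; it assumes only the fields shown above):

```python
import json

def halt_rates(jsonl_text):
    """Compute per-tenant event count, halt count, and halt rate
    from tenant-safe SafetyEvent JSONL. Illustrative only."""
    stats = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        tenant = stats.setdefault(event["tenant_id"], {"events": 0, "halts": 0})
        tenant["events"] += 1
        if event.get("policy_decision") == "halt":
            tenant["halts"] += 1
    for tenant in stats.values():
        tenant["halt_rate"] = tenant["halts"] / tenant["events"]
    return stats

events = "\n".join([
    '{"event_id":"e1","tenant_id":"tenant-a","policy_decision":"halt"}',
    '{"event_id":"e2","tenant_id":"tenant-a","policy_decision":"pass"}',
])
print(halt_rates(events)["tenant-a"]["halt_rate"])  # 0.5
```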
Feedback rows can use the calibration format:

```json
{"event_id":"e1","tenant_id":"tenant-a","guardrail_approved":false,"human_approved":true,"source":"kb://physics"}
```
Rows where the guard rejected an answer but the human accepted it count as false positives for the tenant.
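That rule can be expressed as a small filter (again an illustrative helper mirroring the documented rule, not the shipped code):

```python
import json

def false_positive_counts(feedback_jsonl):
    """Count rows per tenant where the guardrail rejected an answer
    (guardrail_approved false) but a human approved it. Illustrative only."""
    counts = {}
    for line in feedback_jsonl.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        if row.get("guardrail_approved") is False and row.get("human_approved") is True:
            counts[row["tenant_id"]] = counts.get(row["tenant_id"], 0) + 1
    return counts

feedback = '{"event_id":"e1","tenant_id":"tenant-a","guardrail_approved":false,"human_approved":true,"source":"kb://physics"}'
print(false_positive_counts(feedback))  # {'tenant-a': 1}
```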
Rows used for one-click retuning also need the reviewed prompt and response:

```json
{"prompt":"What is the refund window?","response":"Refunds are available for 30 days.","guardrail_approved":true,"human_approved":true,"domain":"support"}
```
The UI requires at least four labelled rows before it emits an overlay. Mixed approved and rejected examples produce stronger threshold guidance; single-class feedback is allowed but marked provisional.
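The preconditions above can be sketched as a readiness check (illustrative of the documented rules only; `retune_readiness` is a hypothetical helper, not the shipped implementation):

```python
def retune_readiness(rows):
    """At least four rows with prompt, response, and a human verdict
    are required; single-class feedback is allowed but provisional."""
    labelled = [r for r in rows
                if r.get("prompt") and r.get("response") and "human_approved" in r]
    if len(labelled) < 4:
        return {"ready": False, "provisional": False}
    verdicts = {bool(r["human_approved"]) for r in labelled}
    # Only one verdict class seen: overlay is emitted but marked provisional.
    return {"ready": True, "provisional": len(verdicts) < 2}

rows = [{"prompt": "p", "response": "r", "human_approved": True} for _ in range(4)]
print(retune_readiness(rows))  # {'ready': True, 'provisional': True}
```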