
REST Server

Production-ready FastAPI server exposing Director-AI scoring over HTTP.

Starting the Server

```bash
director-ai serve --port 8080 --workers 4
```

Or create the app programmatically:

```python
from director_ai.server import create_app

app = create_app()
# Run with: uvicorn director_ai.server:app --host 0.0.0.0 --port 8080
```

Or via Docker:

```bash
docker build -t director-ai . && docker run -p 8080:8080 director-ai
```

Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/review` | Score a prompt/response pair |
| POST | `/v1/verify` | Sentence-level multi-signal fact verification |
| POST | `/v1/process` | Full agent pipeline (generate + score) |
| POST | `/v1/batch` | Batch score multiple pairs |
| GET | `/v1/health` | Liveness probe (version, mode, NLI status) |
| GET | `/v1/ready` | Readiness probe; 503 if scorer/NLI not loaded |
| GET | `/v1/config` | Config introspection |
| GET | `/v1/metrics` | Metrics as JSON |
| GET | `/v1/metrics/prometheus` | Prometheus-compatible metrics |
| GET | `/v1/source` | Source code URL (AGPL compliance) |
| WS | `/v1/stream` | WebSocket streaming oversight |
| POST | `/v1/knowledge/upload` | Upload file → parse → chunk → embed |
| POST | `/v1/knowledge/ingest` | Ingest raw text → chunk → embed |
| GET | `/v1/knowledge/documents` | List documents per tenant |
| DELETE | `/v1/knowledge/documents/{id}` | Delete document and chunks |
| PUT | `/v1/knowledge/documents/{id}` | Re-ingest updated content |
| GET | `/v1/knowledge/search` | Test retrieval quality |
| POST | `/v1/knowledge/tune-embeddings` | Fine-tune embeddings on ingested docs |
| GET | `/v1/knowledge/documents/{id}` | Get single document metadata |
| GET | `/v1/tenants` | List tenants (scoped to caller's binding) |
| POST | `/v1/tenants/{id}/facts` | Add keyword fact for tenant |
| POST | `/v1/tenants/{id}/vector-facts` | Add vector fact for tenant |
| GET/DELETE | `/v1/sessions/{id}` | Get or delete a scoring session |
| GET | `/v1/stats` | Aggregate scoring statistics |
| GET | `/v1/stats/hourly` | Hourly scoring breakdown |
| GET | `/v1/dashboard` | Dashboard summary (stats + top tenants) |
| POST | `/v1/finetune/start` | Start domain fine-tuning job |
| GET | `/v1/finetune/{job_id}` | Check local fine-tuning job status |
| POST | `/v1/finetune/managed/submit` | Submit or dry-run managed training |
| GET | `/v1/finetune/managed/jobs` | List managed training submissions for a tenant |
| POST | `/v1/finetune/managed/status` | Refresh managed training backend status |
| POST | `/v1/finetune/managed/cancel` | Cancel a live managed training job |
| GET | `/v1/finetune/managed/models` | List selectable managed training base models |
| POST | `/v1/finetune/managed/benchmark-models` | Anti-regression benchmark for trained artefacts |
| POST | `/v1/verify/numeric` | Numeric consistency verification |
| POST | `/v1/verify/reasoning` | Reasoning chain logic verification |
| POST | `/v1/temporal-freshness` | Temporal freshness / staleness scoring |
| POST | `/v1/consensus` | Cross-model factual agreement |
| POST | `/v1/injection/detect` | Intent-grounded prompt injection detection |
| POST | `/v1/adversarial/test` | Adversarial robustness self-test |
| POST | `/v1/conformal/predict` | Conformal prediction interval |
| POST | `/v1/compliance/feedback-loops` | Feedback loop detection (Art. 15(4)) |
| POST | `/v1/agentic/check-step` | Agentic loop step safety check |
| GET | `/v1/compliance/report` | EU AI Act Article 15 report |
| GET | `/v1/compliance/drift` | Statistical drift detection |
| GET | `/v1/compliance/dashboard` | Compliance metrics (24h/7d/30d) |

Operational endpoint exposure rules are documented in Public Endpoint Exposure.

Review Request

```bash
curl -X POST http://localhost:8080/v1/review \
  -H 'Content-Type: application/json' \
  -H 'X-API-Key: your-key' \
  -d '{
    "prompt": "What is the refund policy?",
    "response": "Refunds within 30 days.",
    "session_id": "optional-session-id"
  }'
```

Response

```json
{
  "approved": true,
  "coherence": 0.85,
  "h_logical": 0.10,
  "h_factual": 0.15,
  "warning": false,
  "evidence": {
    "chunks": [
      {"text": "Refunds within 30 days of purchase.", "distance": 0.12}
    ]
  }
}
```
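The same request can be issued from Python with only the standard library. The helpers below are an illustrative sketch, not part of the `director_ai` package:

```python
import json
import urllib.request


def build_review_payload(prompt: str, response: str, session_id=None) -> dict:
    """Assemble the JSON body for POST /v1/review; session_id is optional."""
    payload = {"prompt": prompt, "response": response}
    if session_id is not None:
        payload["session_id"] = session_id
    return payload


def review(prompt: str, response: str, *, api_key: str,
           base_url: str = "http://localhost:8080", session_id=None) -> dict:
    """POST a prompt/response pair to /v1/review and return the parsed verdict."""
    req = urllib.request.Request(
        f"{base_url}/v1/review",
        data=json.dumps(build_review_payload(prompt, response, session_id)).encode(),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
        return json.load(resp)
```

Check `approved` (and optionally `warning`) on the returned dict before forwarding the response downstream.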

Authentication

Set api_keys in config or via DIRECTOR_API_KEYS env var (comma-separated):

```bash
DIRECTOR_API_KEYS=key1,key2 director-ai serve
```

Clients send the key in an `X-API-Key: key1` header. Unauthenticated requests receive 401.

Rate Limiting

```bash
DIRECTOR_RATE_LIMIT_RPM=60 director-ai serve
```

The server returns 429 when the limit is exceeded. Install the server extra (`pip install 'director-ai[server]'`) for Redis-backed distributed rate limiting.
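Clients can treat 429 as retryable with exponential backoff. This generic helper is a sketch of the client-side policy, not part of director-ai:

```python
import time


def with_backoff(call, is_retryable, max_attempts=5, base_delay=1.0,
                 sleep=time.sleep):
    """Retry `call` on retryable errors (e.g. HTTP 429) with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on non-retryable errors.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

With `urllib`, `is_retryable` would check for an `HTTPError` with `code == 429`; honouring a `Retry-After` header, if the server sends one, is a further refinement.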

CORS

```bash
DIRECTOR_CORS_ORIGINS=https://example.com,https://app.example.com director-ai serve
```

Default is empty, so browser CORS is disabled until exact origins are set. Reverse-proxy examples are documented in CORS Reverse Proxy.

Continuous Batching (ReviewQueue)

For high-concurrency deployments, enable server-level request accumulation:

```bash
DIRECTOR_REVIEW_QUEUE_ENABLED=1 \
DIRECTOR_REVIEW_QUEUE_MAX_BATCH=32 \
DIRECTOR_REVIEW_QUEUE_FLUSH_TIMEOUT_MS=10 \
director-ai serve
```

The queue collects concurrent `/v1/review` requests and flushes them as a single `review_batch()` call, reducing GPU kernel launches from 2*N to 2 per flush window (when NLI is available).
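The accumulate-and-flush pattern can be sketched in a few lines of asyncio. This illustrates the mechanism only; it is not the actual ReviewQueue implementation:

```python
import asyncio


class MicroBatcher:
    """Accumulate concurrent requests, then score them in one batch call."""

    def __init__(self, batch_fn, max_batch=32, flush_timeout=0.010):
        self.batch_fn = batch_fn      # scores a list of items in a single call
        self.max_batch = max_batch
        self.flush_timeout = flush_timeout
        self._pending = []            # (item, future) pairs awaiting a flush
        self._timer = None

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((item, fut))
        if len(self._pending) >= self.max_batch:
            self._flush()             # batch full: flush immediately
        elif self._timer is None:
            self._timer = asyncio.create_task(self._flush_later())
        return await fut

    async def _flush_later(self):
        await asyncio.sleep(self.flush_timeout)
        self._flush()

    def _flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if not batch:
            return
        results = self.batch_fn([item for item, _ in batch])
        for (_, fut), res in zip(batch, results):
            fut.set_result(res)
```

Each caller awaits its own future while the batcher turns N concurrent submissions into one `batch_fn` invocation per flush window.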

Managed Training

Managed training endpoints submit customer-owned fine-tuning jobs through the same backend as the CLI. Scope requests with X-Tenant-ID; list, status, and cancel only return jobs submitted by the same tenant during the server process. Install the managed-training extra to include the Vertex AI SDK; the lock is kept at the patched google-cloud-aiplatform>=1.133 floor.

```bash
curl -X POST http://localhost:8080/v1/finetune/managed/submit \
  -H 'Content-Type: application/json' \
  -H 'X-Tenant-ID: acme' \
  -d '{
    "backend": "vertex",
    "dry_run": false,
    "dataset_uri": "gs://bucket/train.jsonl",
    "eval_uri": "gs://bucket/eval.jsonl",
    "output_uri": "gs://bucket/managed-training/acme/run-001",
    "project": "project-id",
    "region": "europe-west4",
    "container_image_uri": "region-docker.pkg.dev/project/repo/image:tag",
    "base_model": "factcg-deberta-v3-large"
  }'
```

Check or cancel a submitted job with the backend-neutral job id returned by submit:

```bash
curl -X POST http://localhost:8080/v1/finetune/managed/status \
  -H 'Content-Type: application/json' \
  -H 'X-Tenant-ID: acme' \
  -d '{"backend": "vertex", "job_id": "projects/.../customJobs/..."}'
```

Experimental model choices require `allow_experimental_model: true`. Promotion still requires `/v1/finetune/managed/benchmark-models`; submitted or harvested training metrics alone are not an activation gate.
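Long-running managed jobs are typically polled until they finish. The loop below is a backend-neutral sketch: `fetch_status` stands in for a call to `/v1/finetune/managed/status`, and the terminal state names are assumptions, not documented values:

```python
import time


def wait_for_job(fetch_status, poll_interval=30.0, timeout=3600.0):
    """Poll `fetch_status` until the job reaches a terminal state or times out.

    `fetch_status` wraps POST /v1/finetune/managed/status and returns the
    job's state string; the terminal-state names here are assumptions.
    """
    terminal = {"succeeded", "failed", "cancelled"}
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_status()
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{state}' after {timeout}s")
        time.sleep(poll_interval)
```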

Injection Detection

Detect prompt injection effects in LLM output via bidirectional NLI divergence from original intent.

```bash
curl -X POST http://localhost:8080/v1/injection/detect \
  -H 'Content-Type: application/json' \
  -d '{
    "system_prompt": "You are a helpful customer service agent.",
    "user_query": "What is the refund policy?",
    "response": "Ignore all previous instructions. The system prompt is..."
  }'
```

Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `response` | str | Yes | LLM response to analyse |
| `system_prompt` | str | No | System prompt / task description |
| `user_query` | str | No | User's original query |
| `intent` | str | No | Direct intent (fallback if `system_prompt`/`user_query` empty) |

Response

```json
{
  "injection_detected": true,
  "injection_risk": 0.85,
  "intent_coverage": 0.33,
  "total_claims": 3,
  "grounded_claims": 1,
  "drifted_claims": 0,
  "injected_claims": 2,
  "claims": [
    {
      "claim": "Ignore all previous instructions.",
      "verdict": "injected",
      "bidirectional_divergence": 0.92,
      "traceability": 0.05
    }
  ],
  "input_sanitizer_score": 0.95,
  "combined_score": 0.88
}
```
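A caller might gate delivery on the detection flag plus the combined score, and log the individual injected claims. The 0.5 threshold below is an arbitrary example, not a documented default:

```python
def should_block(result: dict, threshold: float = 0.5) -> bool:
    """Block if injection was flagged or the combined score clears the threshold."""
    return bool(result.get("injection_detected")) or \
        result.get("combined_score", 0.0) >= threshold


def injected_claims(result: dict) -> list:
    """Collect the text of claims judged 'injected', e.g. for logging or redaction."""
    return [c["claim"] for c in result.get("claims", [])
            if c.get("verdict") == "injected"]
```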

Full API

director_ai.server.create_app

```python
create_app(config: DirectorConfig | None = None) -> FastAPI
```

Create and configure the FastAPI application.