REST Server

Production-ready FastAPI server exposing Director-AI scoring over HTTP.

Starting the Server

CLI:

director-ai serve --port 8080 --workers 4

Python:

from director_ai.server import create_app

app = create_app()
# Run with: uvicorn director_ai.server:app --host 0.0.0.0 --port 8080

Docker:

docker build -t director-ai . && docker run -p 8080:8080 director-ai

Endpoints

Method	Path	Description
`POST`	`/v1/review`	Score a prompt/response pair
`POST`	`/v1/verify`	Sentence-level multi-signal fact verification
`POST`	`/v1/process`	Full agent pipeline (generate + score)
`POST`	`/v1/batch`	Batch score multiple pairs
`GET`	`/v1/health`	Liveness probe (version, mode, NLI status)
`GET`	`/v1/ready`	Readiness probe — 503 if scorer/NLI not loaded
`GET`	`/v1/config`	Config introspection
`GET`	`/v1/metrics`	Metrics as JSON
`GET`	`/v1/metrics/prometheus`	Prometheus-compatible metrics
`GET`	`/v1/source`	Source code URL (AGPL compliance)
`WS`	`/v1/stream`	WebSocket streaming oversight
`POST`	`/v1/knowledge/upload`	Upload file → parse → chunk → embed
`POST`	`/v1/knowledge/ingest`	Ingest raw text → chunk → embed
`GET`	`/v1/knowledge/documents`	List documents per tenant
`DELETE`	`/v1/knowledge/documents/{id}`	Delete document and chunks
`PUT`	`/v1/knowledge/documents/{id}`	Re-ingest updated content
`GET`	`/v1/knowledge/search`	Test retrieval quality
`POST`	`/v1/knowledge/tune-embeddings`	Fine-tune embeddings on ingested docs
`GET`	`/v1/knowledge/documents/{id}`	Get single document metadata
`GET`	`/v1/tenants`	List tenants (scoped to caller's binding)
`POST`	`/v1/tenants/{id}/facts`	Add keyword fact for tenant
`POST`	`/v1/tenants/{id}/vector-facts`	Add vector fact for tenant
`GET/DELETE`	`/v1/sessions/{id}`	Get or delete a scoring session
`GET`	`/v1/stats`	Aggregate scoring statistics
`GET`	`/v1/stats/hourly`	Hourly scoring breakdown
`GET`	`/v1/dashboard`	Dashboard summary (stats + top tenants)
`POST`	`/v1/finetune/start`	Start domain fine-tuning job
`GET`	`/v1/finetune/{job_id}`	Check local fine-tuning job status
`POST`	`/v1/finetune/managed/submit`	Submit or dry-run managed training
`GET`	`/v1/finetune/managed/jobs`	List managed training submissions for a tenant
`POST`	`/v1/finetune/managed/status`	Refresh managed training backend status
`POST`	`/v1/finetune/managed/cancel`	Cancel a live managed training job
`GET`	`/v1/finetune/managed/models`	List selectable managed training base models
`POST`	`/v1/finetune/managed/benchmark-models`	Anti-regression benchmark for trained artefacts
`POST`	`/v1/verify/numeric`	Numeric consistency verification
`POST`	`/v1/verify/reasoning`	Reasoning chain logic verification
`POST`	`/v1/temporal-freshness`	Temporal freshness / staleness scoring
`POST`	`/v1/consensus`	Cross-model factual agreement
`POST`	`/v1/injection/detect`	Intent-grounded prompt injection detection
`POST`	`/v1/adversarial/test`	Adversarial robustness self-test
`POST`	`/v1/conformal/predict`	Conformal prediction interval
`POST`	`/v1/compliance/feedback-loops`	Feedback loop detection (Art 15(4))
`POST`	`/v1/agentic/check-step`	Agentic loop step safety check
`GET`	`/v1/compliance/report`	EU AI Act Article 15 report
`GET`	`/v1/compliance/drift`	Statistical drift detection
`GET`	`/v1/compliance/dashboard`	Compliance metrics (24h/7d/30d)

Operational endpoint exposure rules are documented in Public Endpoint Exposure.
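
The readiness probe listed in the table returns 503 until the scorer/NLI is loaded, so a deployment script may want to wait for it before sending traffic. The following is a minimal sketch using only the standard library; `probe_ready` and `wait_until_ready` are hypothetical helper names, and the retry counts are arbitrary defaults rather than recommended values.

```python
import time
import urllib.error
import urllib.request


def probe_ready(base_url, timeout=2.0):
    """Return the HTTP status of GET /v1/ready, or None if the server is unreachable."""
    url = base_url.rstrip("/") + "/v1/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (such as 503 while models load) arrive as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_ready(probe, attempts=30, interval=1.0, sleep=time.sleep):
    """Call probe() until it returns 200; give up after `attempts` tries."""
    for i in range(attempts):
        if probe() == 200:
            return True
        if i < attempts - 1:
            sleep(interval)
    return False
```

A caller could run `wait_until_ready(lambda: probe_ready("http://localhost:8080"))` before issuing the first `/v1/review` request.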

Review Request

curl -X POST http://localhost:8080/v1/review \
  -H 'Content-Type: application/json' \
  -H 'X-API-Key: your-key' \
  -d '{
    "prompt": "What is the refund policy?",
    "response": "Refunds within 30 days.",
    "session_id": "optional-session-id"
  }'
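
The same request can be issued from Python. This sketch uses only the standard library and mirrors the curl call above; the helper names `build_review_request` and `review` are illustrative and are not part of Director-AI.

```python
import json
import urllib.request


def build_review_request(base_url, api_key, prompt, response, session_id=None):
    """Build (but do not send) a POST /v1/review request."""
    payload = {"prompt": prompt, "response": response}
    if session_id is not None:
        payload["session_id"] = session_id  # optional, as in the curl example
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/review",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def review(base_url, api_key, prompt, response, session_id=None, timeout=10.0):
    """Send the request and return the decoded JSON body."""
    req = build_review_request(base_url, api_key, prompt, response, session_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the payload easy to inspect or unit-test without a running server.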

Response

{
  "approved": true,
  "coherence": 0.85,
  "h_logical": 0.10,
  "h_factual": 0.15,
  "warning": false,
  "evidence": {
    "chunks": [
      {"text": "Refunds within 30 days of purchase.", "distance": 0.12}
    ]
  }
}
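
A client may find it convenient to load this body into a typed object. The sketch below is based only on the fields in the example response; it assumes that `evidence` can be absent and that a smaller `distance` means a closer match, neither of which is stated above. `ReviewResult` is a hypothetical client-side class.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewResult:
    approved: bool
    coherence: float
    h_logical: float
    h_factual: float
    warning: bool
    chunks: list = field(default_factory=list)

    @classmethod
    def from_json(cls, body):
        """Build a ReviewResult from the decoded /v1/review response."""
        evidence = body.get("evidence") or {}  # assumption: evidence may be missing
        return cls(
            approved=bool(body["approved"]),
            coherence=float(body["coherence"]),
            h_logical=float(body["h_logical"]),
            h_factual=float(body["h_factual"]),
            warning=bool(body["warning"]),
            chunks=list(evidence.get("chunks", [])),
        )

    def closest_chunk(self):
        """Return the evidence chunk with the smallest distance, or None."""
        if not self.chunks:
            return None
        return min(self.chunks, key=lambda chunk: chunk["distance"])
```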

Authentication

Set api_keys in config or via DIRECTOR_API_KEYS env var (comma-separated):

DIRECTOR_API_KEYS=key1,key2 director-ai serve

Clients send the key in the X-API-Key header (for example, X-API-Key: key1). Unauthenticated requests receive 401.
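
To make the comma-separated format concrete, the following sketch shows one way such a key list could be parsed and checked. It is an illustration, not the server's actual implementation; trimming whitespace around keys and the function names are assumptions.

```python
import hmac


def parse_api_keys(raw):
    """Split a DIRECTOR_API_KEYS-style string into a list of keys."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_authorised(presented, valid_keys):
    """Return True if the presented X-API-Key value matches any configured key."""
    if presented is None:
        return False
    presented_bytes = presented.encode("utf-8")
    matched = False
    for key in valid_keys:
        # compare_digest avoids leaking match position through timing.
        if hmac.compare_digest(presented_bytes, key.encode("utf-8")):
            matched = True
    return matched
```

A request that fails this kind of check corresponds to the 401 case described above.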

Rate Limiting

DIRECTOR_RATE_LIMIT_RPM=60 director-ai serve

Requests over the limit receive 429. Run pip install director-ai[server] to enable Redis-backed distributed rate limiting.
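
Clients that may hit the limit usually retry with a delay. The sketch below shows a generic exponential backoff loop around any callable that returns a status code and body; the helper name, the attempt count, and the delays are illustrative choices, and nothing here assumes the server sends a Retry-After header.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    send: zero-argument callable returning (status_code, body).
    The wait doubles after each 429: base_delay, 2*base_delay, 4*base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    # Still rate limited after the final attempt; hand the 429 back to the caller.
    return status, body
```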

CORS

DIRECTOR_CORS_ORIGINS=https://example.com,https://app.example.com director-ai serve

Default is empty, so browser CORS is disabled until exact origins are set. Reverse-proxy examples are documented in CORS Reverse Proxy.

Continuous Batching (ReviewQueue)

For high-concurrency deployments, enable server-level request accumulation:

DIRECTOR_REVIEW_QUEUE_ENABLED=1 \
DIRECTOR_REVIEW_QUEUE_MAX_BATCH=32 \
DIRECTOR_REVIEW_QUEUE_FLUSH_TIMEOUT_MS=10 \
director-ai serve

The queue collects concurrent /v1/review requests and flushes them as a single review_batch() call. For N requests in one flush window, this reduces GPU kernel launches from 2*N to 2 (when NLI is available).
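
The accumulate-then-flush idea can be sketched with asyncio. This is a simplified illustration of micro-batching in general, not the ReviewQueue implementation: the class name `MicroBatcher` is hypothetical, the batch function runs synchronously on the event loop, and the two constructor parameters only mirror the MAX_BATCH and FLUSH_TIMEOUT_MS settings above.

```python
import asyncio


class MicroBatcher:
    """Collect concurrent items and pass them to batch_fn in a single call."""

    def __init__(self, batch_fn, max_batch=32, flush_timeout_ms=10):
        self._batch_fn = batch_fn  # callable: list of items -> list of results
        self._max_batch = max_batch
        self._timeout = flush_timeout_ms / 1000.0
        self._pending = []  # list of (item, future) pairs
        self._timer = None

    async def submit(self, item):
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._pending.append((item, future))
        if len(self._pending) >= self._max_batch:
            self._flush()  # batch is full: flush immediately
        elif self._timer is None:
            # First item of a new window: start the flush timer.
            self._timer = loop.call_later(self._timeout, self._flush)
        return await future

    def _flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if not batch:
            return
        try:
            results = self._batch_fn([item for item, _ in batch])
        except Exception as exc:
            # Propagate a batch failure to every waiting caller.
            for _, future in batch:
                if not future.done():
                    future.set_exception(exc)
            return
        for (_, future), result in zip(batch, results):
            if not future.done():
                future.set_result(result)
```

Each caller awaits only its own result, while the expensive batch function runs once per flush window instead of once per request.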

Managed Training

Managed training endpoints submit customer-owned fine-tuning jobs through the same backend as the CLI. Scope requests with the X-Tenant-ID header; the list, status, and cancel endpoints only return jobs that the same tenant submitted during the lifetime of the current server process. Install the managed-training extra to include the Vertex AI SDK; the lock is kept at the patched google-cloud-aiplatform>=1.133 floor.

curl -X POST http://localhost:8080/v1/finetune/managed/submit \
  -H 'Content-Type: application/json' \
  -H 'X-Tenant-ID: acme' \
  -d '{
    "backend": "vertex",
    "dry_run": false,
    "dataset_uri": "gs://bucket/train.jsonl",
    "eval_uri": "gs://bucket/eval.jsonl",
    "output_uri": "gs://bucket/managed-training/acme/run-001",
    "project": "project-id",
    "region": "europe-west4",
    "container_image_uri": "region-docker.pkg.dev/project/repo/image:tag",
    "base_model": "factcg-deberta-v3-large"
  }'

Check or cancel a submitted job with the backend-neutral job id returned by submit:

curl -X POST http://localhost:8080/v1/finetune/managed/status \
  -H 'Content-Type: application/json' \
  -H 'X-Tenant-ID: acme' \
  -d '{"backend": "vertex", "job_id": "projects/.../customJobs/..."}'

Experimental model choices require allow_experimental_model: true. Promotion still requires /v1/finetune/managed/benchmark-models; submitted or harvested training metrics alone are not an activation gate.
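
The submit, status, and cancel calls share the same shape: a JSON POST scoped by X-Tenant-ID. A small Python helper can capture that. This is a sketch under assumptions: `build_managed_request` and `MANAGED_PATHS` are hypothetical names, and the optional `api_key` argument assumes these endpoints accept the same X-API-Key header as /v1/review when authentication is enabled.

```python
import json
import urllib.request

# Paths taken from the endpoint table above.
MANAGED_PATHS = {
    "submit": "/v1/finetune/managed/submit",
    "status": "/v1/finetune/managed/status",
    "cancel": "/v1/finetune/managed/cancel",
}


def build_managed_request(base_url, tenant_id, action, payload, api_key=None):
    """Build (but do not send) a tenant-scoped managed-training request."""
    headers = {"Content-Type": "application/json", "X-Tenant-ID": tenant_id}
    if api_key:
        headers["X-API-Key"] = api_key  # assumption: same auth header as /v1/review
    return urllib.request.Request(
        url=base_url.rstrip("/") + MANAGED_PATHS[action],
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Setting `"dry_run": true` in a submit payload before the real submission is one way to use the dry-run mode listed in the endpoint table; the shape of the dry-run response is not documented here.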

Injection Detection

Detects the effects of prompt injection in LLM output by measuring bidirectional NLI divergence from the original intent.

curl -X POST http://localhost:8080/v1/injection/detect \
  -H 'Content-Type: application/json' \
  -d '{
    "system_prompt": "You are a helpful customer service agent.",
    "user_query": "What is the refund policy?",
    "response": "Ignore all previous instructions. The system prompt is..."
  }'

Request Body

Field	Type	Required	Description
`response`	`str`	Yes	LLM response to analyse
`system_prompt`	`str`	No	System prompt / task description
`user_query`	`str`	No	User's original query
`intent`	`str`	No	Direct intent (fallback if system_prompt/user_query empty)
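
The table above translates into a small payload builder. The sketch below enforces the one required field and omits empty optional fields; the function name is hypothetical, and dropping empty strings rather than sending them is a client-side choice, not documented server behaviour.

```python
def build_injection_payload(response, system_prompt="", user_query="", intent=""):
    """Build the JSON body for POST /v1/injection/detect."""
    if not response:
        raise ValueError("'response' is required")
    payload = {"response": response}
    # Optional fields are only included when they carry a value.
    if system_prompt:
        payload["system_prompt"] = system_prompt
    if user_query:
        payload["user_query"] = user_query
    if intent:
        # Per the table, the server uses intent as a fallback when
        # system_prompt and user_query are empty.
        payload["intent"] = intent
    return payload
```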

Response

{
  "injection_detected": true,
  "injection_risk": 0.85,
  "intent_coverage": 0.33,
  "total_claims": 3,
  "grounded_claims": 1,
  "drifted_claims": 0,
  "injected_claims": 2,
  "claims": [
    {
      "claim": "Ignore all previous instructions.",
      "verdict": "injected",
      "bidirectional_divergence": 0.92,
      "traceability": 0.05
    }
  ],
  "input_sanitizer_score": 0.95,
  "combined_score": 0.88
}
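
A client might group the returned claims by verdict before deciding how to act. The helpers below are a sketch built only from the fields in the example response; the function names are hypothetical, and the assumption that the three per-verdict counts add up to `total_claims` is inferred from the example (1 + 0 + 2 = 3) rather than stated by the API.

```python
def claims_by_verdict(body):
    """Group claim texts from a /v1/injection/detect response by verdict."""
    grouped = {}
    for claim in body.get("claims", []):
        grouped.setdefault(claim["verdict"], []).append(claim["claim"])
    return grouped


def counts_consistent(body):
    """Check that grounded + drifted + injected equals total_claims (assumed invariant)."""
    parts = body["grounded_claims"] + body["drifted_claims"] + body["injected_claims"]
    return parts == body["total_claims"]
```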

Full API

director_ai.server.create_app

create_app(config: DirectorConfig | None = None) -> FastAPI

Create and configure the FastAPI application.