# Architecture

## Component Overview
```mermaid
graph TD
    subgraph CoherenceScorer
        NLI["NLIScorer<br/>(DeBERTa / ONNX / MiniCheck)"]
        GTS["GroundTruthStore<br/>(keyword + vector retrieval)"]
        LLM["LLM Judge<br/>(OpenAI / Anthropic)"]
        Cache["ScoreCache<br/>(LRU + TTL)"]
    end
    subgraph StreamingKernel
        Window["Sliding Window<br/>(avg, trend, hard limit)"]
        Halt["Halt Decision<br/>(hard / soft mode)"]
    end
    subgraph InjectionDetector
        Stage1["Stage 1: InputSanitizer<br/>(regex patterns)"]
        Stage2["Stage 2: Bidirectional NLI<br/>(intent divergence)"]
        Verdict["Per-Claim Verdicts<br/>(grounded / drifted / injected)"]
    end
    subgraph VerifiedScorer
        SentMatch["Sentence Matching<br/>(NLI pair-wise)"]
        Signals["5 Signals<br/>(NLI, Entity, Number, Negation, Traceability)"]
        Verdicts["Per-Claim Verdicts<br/>(supported / contradicted / fabricated)"]
    end
    subgraph VectorBackend
        IMB["InMemoryBackend"]
        Chroma["ChromaBackend"]
        ST["SentenceTransformerBackend"]
        Hybrid["HybridBackend<br/>(BM25 + Dense + RRF)"]
        ColBERT["ColBERTBackend<br/>(late interaction)"]
        Pinecone["PineconeBackend"]
        Reranker["RerankedBackend<br/>(cross-encoder wrapper)"]
    end
    Prompt["Prompt + Response"] --> CoherenceScorer
    Prompt -->|"/v1/injection/detect"| InjectionDetector
    InjectionDetector --> Stage1 --> Stage2 --> Verdict
    Prompt -->|"/v1/verify"| VerifiedScorer
    VerifiedScorer --> SentMatch --> Signals --> Verdicts
    CoherenceScorer --> NLI
    CoherenceScorer --> GTS
    CoherenceScorer --> LLM
    CoherenceScorer --> Cache
    GTS --> VectorBackend
    CoherenceScorer -->|score| Halt
    TokenStream["Token Stream"] --> Window
    Window -->|coherence_callback| CoherenceScorer
    Window --> Halt
    Halt -->|approved / halted| Output["Output"]
```
## Request Lifecycle
```mermaid
sequenceDiagram
    participant App as Your Application
    participant Guard as guard() / SDK Interceptor
    participant Scorer as CoherenceScorer
    participant Cache as ScoreCache
    participant NLI as NLI Model
    participant KB as GroundTruthStore
    participant LLM as LLM Judge (optional)
    App->>Guard: client.chat.completions.create(...)
    Guard->>Guard: Intercept response
    Guard->>Scorer: review(prompt, response)
    Scorer->>Cache: lookup(prompt, response)
    alt Cache hit
        Cache-->>Scorer: cached CoherenceScore
    else Cache miss
        par Logical + Factual in parallel
            Scorer->>NLI: score(prompt, response) → H_logical
            Scorer->>KB: retrieve_context(prompt)
            KB-->>Scorer: facts
            Scorer->>NLI: score(facts, response) → H_factual
        end
        Scorer->>Scorer: coherence = 1 - (0.6·H_L + 0.4·H_F)
        opt Hybrid mode + low NLI confidence
            Scorer->>LLM: "Does response follow from context?"
            LLM-->>Scorer: YES/NO → blend
        end
        Scorer->>Cache: store result
    end
    Scorer-->>Guard: (approved, CoherenceScore)
    alt approved
        Guard-->>App: original response
    else rejected
        Guard-->>App: HallucinationError / log / metadata
    end
```
## Data Flow
- Prompt arrives at `CoherenceScorer.review(prompt, response)`
- Cache check: if `(prompt, response)` was recently scored, return the cached result
- Logical divergence: NLI scores `(prompt, response)` for contradiction probability
- Factual divergence: `GroundTruthStore.retrieve_context(prompt)` fetches KB facts, then NLI scores `(context, response)`
- Composite score: `coherence = 1 - (W_LOGIC * H_logical + W_FACT * H_factual)`
- LLM judge (optional): if NLI confidence is low (or in `hybrid` mode), escalate to LLM-as-judge and blend the scores
- Gate: `approved = coherence >= threshold`
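The composite-score and gate steps above reduce to a few lines of arithmetic. The weights below are the documented defaults; the `threshold` default is illustrative, not the library's.

```python
W_LOGIC, W_FACT = 0.6, 0.4  # documented default weights

def coherence(h_logical: float, h_factual: float) -> float:
    """Composite coherence from the two divergence signals (each in [0, 1])."""
    return 1 - (W_LOGIC * h_logical + W_FACT * h_factual)

def gate(h_logical: float, h_factual: float, threshold: float = 0.7) -> bool:
    """approved = coherence >= threshold (threshold value is illustrative)."""
    return coherence(h_logical, h_factual) >= threshold
```

With zero divergence on both signals the score is a perfect 1.0; equal divergence of 0.5 on both yields exactly 0.5 regardless of the weights, since they sum to 1.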
## Dual-Entropy Formula
The coherence score combines two independent divergence signals via `coherence = 1 - (W_LOGIC * H_logical + W_FACT * H_factual)`:
| Constant | Default | Description |
|---|---|---|
| `W_LOGIC` | 0.6 | Weight for logical divergence (NLI contradiction) |
| `W_FACT` | 0.4 | Weight for factual divergence (RAG retrieval) |
LLM judge blending (when activated):
| Constant | Value | Description |
|---|---|---|
| `LLM_JUDGE_NLI_WEIGHT` | 0.7 | NLI score weight in blend |
| `LLM_JUDGE_LLM_WEIGHT` | 0.3 | LLM judge weight in blend |
| `LLM_JUDGE_AGREE_DIVERGENCE` | 0.2 | Divergence when LLM says YES |
| `LLM_JUDGE_DISAGREE_DIVERGENCE` | 0.8 | Divergence when LLM says NO |
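The blend these constants imply can be sketched as follows. Constant names come from the table; the exact blending code in the library may differ from this reading.

```python
LLM_JUDGE_NLI_WEIGHT = 0.7        # NLI score weight in blend
LLM_JUDGE_LLM_WEIGHT = 0.3        # LLM judge weight in blend
LLM_JUDGE_AGREE_DIVERGENCE = 0.2  # divergence assigned when the judge says YES
LLM_JUDGE_DISAGREE_DIVERGENCE = 0.8  # divergence assigned when it says NO

def blended_divergence(nli_divergence: float, judge_says_yes: bool) -> float:
    """Map the judge's YES/NO verdict to a divergence, then take a
    weighted average with the NLI divergence."""
    llm_div = (LLM_JUDGE_AGREE_DIVERGENCE if judge_says_yes
               else LLM_JUDGE_DISAGREE_DIVERGENCE)
    return (LLM_JUDGE_NLI_WEIGHT * nli_divergence
            + LLM_JUDGE_LLM_WEIGHT * llm_div)
```

For an uncertain NLI divergence of 0.5, a YES verdict pulls the blend down to 0.41 and a NO verdict pushes it up to 0.59, so the judge nudges borderline cases rather than overriding the NLI signal.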
## Streaming Oversight
`StreamingKernel` processes tokens one by one with three halt mechanisms:

- Hard limit: immediate halt if any token's coherence < `hard_limit`
- Window average: halt if the sliding-window mean < `window_threshold`
- Downward trend: halt if coherence drops > `trend_threshold` over `trend_window` tokens

Soft-halt mode (`halt_mode="soft"`) finishes the current sentence before halting (50-token safety cap).
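The three mechanisms combine into a single per-token check, sketched below. Parameter names follow the bullets above; the default values and the plain first-minus-last trend test are illustrative assumptions, not the kernel's actual internals.

```python
def should_halt(scores: list[float],
                hard_limit: float = 0.3,
                window_threshold: float = 0.6,
                trend_threshold: float = 0.25,
                trend_window: int = 10) -> bool:
    """Check the three halt conditions against the current window of
    per-token coherence scores (newest last). Defaults are illustrative."""
    if not scores:
        return False
    if scores[-1] < hard_limit:
        return True  # hard limit: halt immediately on this token
    if sum(scores) / len(scores) < window_threshold:
        return True  # sliding-window mean fell below threshold
    if len(scores) >= trend_window:
        recent = scores[-trend_window:]
        if recent[0] - recent[-1] > trend_threshold:
            return True  # sustained downward trend across the window
    return False
```

In hard mode the kernel would stop as soon as this returns `True`; in soft mode it would keep emitting tokens until the sentence ends (or the 50-token cap is hit).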
## Vector Backend ABC
All vector backends implement three methods:
```python
from abc import ABC, abstractmethod

class VectorBackend(ABC):
    @abstractmethod
    def add(self, doc_id: str, text: str, metadata: dict | None = None) -> None: ...

    @abstractmethod
    def query(self, text: str, n_results: int = 3) -> list[dict]: ...

    @abstractmethod
    def count(self) -> int: ...
```
`VectorGroundTruthStore` wraps any backend and falls back to keyword matching when vector search returns no results.