VectorGroundTruthStore¶
Semantic vector store for RAG-based factual grounding. Ingest documents, then pass the store to CoherenceScorer for fact-checked scoring. Supports pluggable backends via a registry pattern.
Usage¶
```python
from director_ai.core.retrieval.vector_store import VectorGroundTruthStore

store = VectorGroundTruthStore()
store.ingest([
    "Refunds are available within 30 days of purchase.",
    "Standard shipping takes 5-7 business days.",
    "Pro plan costs $49/month.",
])

# Use with scorer
from director_ai import CoherenceScorer

scorer = CoherenceScorer(
    threshold=0.6,
    ground_truth_store=store,
    use_nli=True,
)
```
VectorGroundTruthStore Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `backend` | `VectorBackend \| None` | `None` | Backend instance (default: `InMemoryBackend`) |
| `tenant_id` | `str` | `""` | Default tenant ID for multi-tenant stores |
Methods¶
ingest()¶
Add documents to the store. Each document is embedded and indexed.
retrieve_context()¶
Retrieve a concatenated context string for a query (matching the parent GroundTruthStore interface). Use retrieve_context_with_chunks() for structured EvidenceChunk results.
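The relationship between the two retrieval methods can be sketched as follows. This is a standalone illustration, not the library's implementation: the `EvidenceChunk` field names and the newline separator are assumptions.

```python
from dataclasses import dataclass


# Illustrative stand-in for director_ai's EvidenceChunk; the field
# names here are assumptions, not the library's actual schema.
@dataclass
class EvidenceChunk:
    text: str
    distance: float


def to_context_string(chunks: list[EvidenceChunk]) -> str:
    """Collapse structured chunks into the flat string form that
    retrieve_context() returns, discarding per-chunk distances.
    The newline separator is an assumption for illustration."""
    return "\n".join(chunk.text for chunk in chunks)


chunks = [
    EvidenceChunk("Pro plan costs $49/month.", 0.21),
    EvidenceChunk("Refunds are available within 30 days.", 0.87),
]
context = to_context_string(chunks)
```

The structured form keeps retrieval metadata (here, distances) available for downstream filtering; the string form is what the scorer's parent interface expects.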
VectorBackend¶
Abstract protocol for vector storage backends. Implement add() and query() to create a custom backend.
```python
from director_ai.core.retrieval.vector_store import VectorBackend

class MyBackend(VectorBackend):
    def add(self, texts: list[str], ids: list[str] | None = None) -> None:
        ...

    def query(self, text: str, top_k: int = 3) -> list[tuple[str, float]]:
        # Returns list of (text, distance) pairs
        ...
```
Built-in Backends¶
| Backend | Install | Description |
|---|---|---|
| `InMemoryBackend` | included | TF-IDF cosine similarity. No deps, good for testing. |
| `SentenceTransformerBackend` | `pip install director-ai[embeddings]` | Dense embeddings via sentence-transformers. Production-quality. |
| `ChromaBackend` | `pip install director-ai[vector]` | ChromaDB persistent store. Scales to millions of documents. |
ChromaBackend¶
```python
from director_ai.core.retrieval.vector_store import ChromaBackend

backend = ChromaBackend(
    collection_name="legal_contracts",
    persist_directory="/data/chroma",
    embedding_model="BAAI/bge-large-en-v1.5",
)
store = VectorGroundTruthStore(backend=backend)
```
SentenceTransformerBackend¶
```python
from director_ai.core.retrieval.vector_store import SentenceTransformerBackend

backend = SentenceTransformerBackend(
    model_name="BAAI/bge-large-en-v1.5",
)
store = VectorGroundTruthStore(backend=backend)
```
Backend Registry¶
Register custom backends for use with DirectorConfig.vector_backend:
```python
from director_ai.core.retrieval.vector_store import register_vector_backend, get_vector_backend

register_vector_backend("qdrant", MyQdrantBackend)

BackendClass = get_vector_backend("qdrant")  # returns the class, not an instance
backend = BackendClass(**kwargs)
```
| Function | Purpose |
|---|---|
| `register_vector_backend(name, cls)` | Register a backend class |
| `get_vector_backend(name)` | Look up a registered backend class |
| `list_vector_backends()` | List registered backend names |
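The registry pattern behind these three functions can be sketched in pure Python. This is an illustration of the pattern, not director_ai's implementation, and `MyQdrantBackend` (including its `url` parameter) is a hypothetical custom backend:

```python
# Minimal sketch of a class registry; the real register_vector_backend /
# get_vector_backend / list_vector_backends live in director_ai.
_REGISTRY: dict[str, type] = {}


def register_vector_backend(name: str, cls: type) -> None:
    _REGISTRY[name] = cls


def get_vector_backend(name: str) -> type:
    try:
        return _REGISTRY[name]  # the class itself, not an instance
    except KeyError:
        raise KeyError(f"unknown vector backend: {name!r}") from None


def list_vector_backends() -> list[str]:
    return sorted(_REGISTRY)


class MyQdrantBackend:  # hypothetical custom backend
    def __init__(self, url: str = "http://localhost:6333") -> None:
        self.url = url


register_vector_backend("qdrant", MyQdrantBackend)
backend = get_vector_backend("qdrant")(url="http://example:6333")
```

Registering the class rather than an instance lets callers supply constructor arguments (e.g. from DirectorConfig) at instantiation time.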
Reranking¶
Enable cross-encoder reranking for improved retrieval precision:
```python
scorer = CoherenceScorer(
    ground_truth_store=store,
    reranker_enabled=True,
    reranker_model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    reranker_top_k_multiplier=3,  # Retrieve 3x, rerank to top_k
)
```
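The over-retrieve-then-rerank pattern implied by `reranker_top_k_multiplier` can be sketched as follows. The pairwise scorer here is a toy word-overlap function standing in for a real cross-encoder; the function names are illustrative, not the library's API:

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> list[str]:
    """Re-score candidates with a more expensive pairwise scorer and
    keep the best top_k. A real cross-encoder would replace score_fn."""
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:top_k]


def overlap_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder relevance score.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)


# With multiplier=3 and top_k=2, the first stage would cheaply retrieve
# 6 candidates; the reranker then narrows them to the final 2.
candidates = [
    "Pro plan costs $49/month.",
    "Refunds are available within 30 days.",
    "Standard shipping takes 5-7 business days.",
    "The pro plan includes priority support.",
    "Contact support via email.",
    "Enterprise pricing is custom.",
]
top = rerank("pro plan pricing", candidates, overlap_score, top_k=2)
```

Retrieving more candidates than needed gives the (slower, more accurate) cross-encoder a larger pool to choose from, which is why the multiplier trades latency for precision.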
Full API¶
director_ai.core.retrieval.vector_store.VectorGroundTruthStore¶
Bases: GroundTruthStore
Ground truth store with vector-based semantic retrieval.
Extends the keyword-based GroundTruthStore with embedding-based similarity search. Falls back to keyword matching when the vector backend returns no results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `backend` | `VectorBackend` | Vector DB backend (default: `InMemoryBackend`). | `None` |
add_fact¶
Alias for add(); also populates the parent keyword store.
ingest¶
Bulk-add plain text documents into the vector backend.
retrieve_context¶
Retrieve context as a string (matching the parent interface). Falls back to the keyword-based parent if vector search returns nothing.
retrieve_context_with_chunks¶
```python
retrieve_context_with_chunks(query: str, top_k: int = 3, tenant_id: str = '') -> list[EvidenceChunk]
```
Retrieve context as EvidenceChunk objects.
director_ai.core.retrieval.vector_store.VectorBackend¶
director_ai.core.retrieval.vector_store.InMemoryBackend¶
Bases: VectorBackend
Simple in-memory cosine-similarity backend (no external deps).
Uses TF-IDF-like word overlap for embedding approximation. Suitable for testing and small fact stores.
director_ai.core.retrieval.vector_store.ChromaBackend¶
```python
ChromaBackend(collection_name: str = 'director_ai_facts', persist_directory: str | None = None, embedding_model: str | None = None)
```
Bases: VectorBackend
ChromaDB backend for production vector search.
Requires `pip install chromadb sentence-transformers`.
director_ai.core.retrieval.vector_store.SentenceTransformerBackend¶
Bases: VectorBackend
Embedding-based backend using sentence-transformers directly.
Recommended model: BAAI/bge-large-en-v1.5 (best quality/speed tradeoff). Alternative: Snowflake/snowflake-arctic-embed-l for multilingual.
Requires `pip install sentence-transformers`.