VectorGroundTruthStore

Semantic vector store for RAG-based factual grounding. Ingest documents, then pass the store to CoherenceScorer for fact-checked scoring. Supports pluggable backends via a registry pattern.

Usage

from director_ai.core.retrieval.vector_store import VectorGroundTruthStore

store = VectorGroundTruthStore()
store.ingest([
    "Refunds are available within 30 days of purchase.",
    "Standard shipping takes 5-7 business days.",
    "Pro plan costs $49/month.",
])

# Use with scorer
from director_ai import CoherenceScorer

scorer = CoherenceScorer(
    threshold=0.6,
    ground_truth_store=store,
    use_nli=True,
)

VectorGroundTruthStore Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| backend | VectorBackend \| None | None | Backend instance (default: InMemoryBackend) |
| tenant_id | str | "" | Default tenant ID for multi-tenant stores |

Methods

ingest()

store.ingest(texts: list[str], tenant_id: str = "") -> int

Add documents to the store. Each document is embedded and indexed; returns the number of documents ingested.

retrieve_context()

context = store.retrieve_context(query: str, top_k: int = 3, tenant_id: str = "") -> str | None

Retrieve concatenated context string for a query (matching parent GroundTruthStore interface). Use retrieve_context_with_chunks() for structured EvidenceChunk results.


VectorBackend

Abstract protocol for vector storage backends. Implement add() and query() to create a custom backend.

from director_ai.core.retrieval.vector_store import VectorBackend

class MyBackend(VectorBackend):
    def add(self, texts: list[str], ids: list[str] | None = None) -> None:
        ...

    def query(self, text: str, top_k: int = 3) -> list[tuple[str, float]]:
        # Returns list of (text, distance) pairs
        ...

Built-in Backends

| Backend | Install | Description |
|---|---|---|
| InMemoryBackend | included | TF-IDF cosine similarity. No deps, good for testing. |
| SentenceTransformerBackend | pip install director-ai[embeddings] | Dense embeddings via sentence-transformers. Production-quality. |
| ChromaBackend | pip install director-ai[vector] | ChromaDB persistent store. Scales to millions of documents. |

ChromaBackend

from director_ai.core.retrieval.vector_store import ChromaBackend

backend = ChromaBackend(
    collection_name="legal_contracts",
    persist_directory="/data/chroma",
    embedding_model="BAAI/bge-large-en-v1.5",
)
store = VectorGroundTruthStore(backend=backend)

SentenceTransformerBackend

from director_ai.core.retrieval.vector_store import SentenceTransformerBackend

backend = SentenceTransformerBackend(
    model_name="BAAI/bge-large-en-v1.5",
)
store = VectorGroundTruthStore(backend=backend)

Backend Registry

Register custom backends for use with DirectorConfig.vector_backend:

from director_ai.core.retrieval.vector_store import register_vector_backend, get_vector_backend

register_vector_backend("qdrant", MyQdrantBackend)
BackendClass = get_vector_backend("qdrant")  # returns the class, not an instance
backend = BackendClass(**kwargs)

| Function | Purpose |
|---|---|
| register_vector_backend(name, cls) | Register a backend class |
| get_vector_backend(name) | Look up a registered backend class |
| list_vector_backends() | List registered backend names |
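
Conceptually, the registry behind these helpers is a name-to-class mapping; a minimal sketch of the pattern (not the library's actual code):

```python
_BACKENDS: dict[str, type] = {}

def register_vector_backend(name: str, cls: type) -> None:
    # Map a short name to a backend class (not an instance)
    _BACKENDS[name] = cls

def get_vector_backend(name: str) -> type:
    try:
        return _BACKENDS[name]
    except KeyError:
        raise KeyError(f"unknown vector backend: {name!r}") from None

def list_vector_backends() -> list[str]:
    return sorted(_BACKENDS)

class MyQdrantBackend:  # hypothetical placeholder class for the example
    pass

register_vector_backend("qdrant", MyQdrantBackend)
print(get_vector_backend("qdrant") is MyQdrantBackend)  # → True: lookup returns the class
print(list_vector_backends())  # → ['qdrant']
```

Returning the class rather than an instance keeps construction (and its kwargs) in the caller's hands, which is why DirectorConfig.vector_backend can name a backend without fixing its configuration.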

Reranking

Enable cross-encoder reranking for improved retrieval precision:

scorer = CoherenceScorer(
    ground_truth_store=store,
    reranker_enabled=True,
    reranker_model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    reranker_top_k_multiplier=3,  # Retrieve 3x, rerank to top_k
)
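
The two-stage flow the multiplier controls can be sketched as follows (both scoring functions here are toy stand-ins; the real reranker scores (query, document) pairs with the configured cross-encoder model):

```python
def retrieve_then_rerank(query: str, docs: list[str],
                         top_k: int = 3, multiplier: int = 3) -> list[str]:
    q = set(query.lower().split())

    def cheap_score(d: str) -> int:
        # Stage 1 stand-in for vector retrieval: raw word overlap
        return len(q & set(d.lower().split()))

    def precise_score(d: str) -> float:
        # Stage 2 stand-in for the cross-encoder: overlap normalised by length
        return cheap_score(d) / (1 + len(d.split()))

    # Over-retrieve top_k * multiplier candidates cheaply, then
    # rerank only that small set with the expensive scorer.
    candidates = sorted(docs, key=cheap_score, reverse=True)[: top_k * multiplier]
    return sorted(candidates, key=precise_score, reverse=True)[:top_k]
```

The multiplier trades recall for latency: a larger value gives the cross-encoder more candidates to rescue from imperfect vector retrieval, at the cost of more pair scorings.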

Full API

director_ai.core.retrieval.vector_store.VectorGroundTruthStore

VectorGroundTruthStore(backend: VectorBackend | None = None, tenant_id: str = '')

Bases: GroundTruthStore

Ground truth store with vector-based semantic retrieval.

Extends the keyword-based GroundTruthStore with embedding-based similarity search. Falls back to keyword matching when the vector backend returns no results.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| backend | VectorBackend \| None | Vector DB backend (default: InMemoryBackend). | None |
| tenant_id | str | Default tenant ID for multi-tenant stores. | "" |
add_fact

add_fact(key: str, value: str, tenant_id: str = '') -> None

Alias for add() — also populates parent keyword store.

ingest

ingest(texts: list[str], tenant_id: str = '') -> int

Bulk-add plain text documents into the vector backend.

retrieve_context

retrieve_context(query: str, top_k: int = 3, tenant_id: str = '') -> str | None

Retrieve context as a string (matching parent interface).

Falls back to keyword-based parent if vector search returns nothing.

retrieve_context_with_chunks

retrieve_context_with_chunks(query: str, top_k: int = 3, tenant_id: str = '') -> list[EvidenceChunk]

Retrieve context as EvidenceChunk objects.

director_ai.core.retrieval.vector_store.VectorBackend

Bases: ABC

Protocol for vector database backends.

aadd async

aadd(doc_id: str, text: str, metadata: dict[str, Any] | None = None) -> None

Async add — delegates to sync add via executor by default.

aquery async

aquery(text: str, n_results: int = 3, tenant_id: str = '') -> list[dict[str, Any]]

Async query — delegates to sync query via executor by default.

director_ai.core.retrieval.vector_store.InMemoryBackend

InMemoryBackend()

Bases: VectorBackend

Simple in-memory cosine-similarity backend (no external deps).

Uses TF-IDF-like word overlap for embedding approximation. Suitable for testing and small fact stores.
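
The idea can be sketched as cosine similarity over term-frequency vectors (illustrative only; the backend's exact tokenisation and weighting may differ):

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    # Term-frequency vectors over whitespace tokens
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(round(cosine_sim(
    "refunds within 30 days",
    "refunds are available within 30 days of purchase",
), 2))  # → 0.71
```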

director_ai.core.retrieval.vector_store.ChromaBackend

ChromaBackend(collection_name: str = 'director_ai_facts', persist_directory: str | None = None, embedding_model: str | None = None)

Bases: VectorBackend

ChromaDB backend for production vector search.

Requires pip install chromadb sentence-transformers.

director_ai.core.retrieval.vector_store.SentenceTransformerBackend

SentenceTransformerBackend(model_name: str = 'BAAI/bge-large-en-v1.5')

Bases: VectorBackend

Embedding-based backend using sentence-transformers directly.

Recommended model: BAAI/bge-large-en-v1.5 (best quality/speed tradeoff). Alternative: Snowflake/snowflake-arctic-embed-l for multilingual.

Requires pip install sentence-transformers.