VectorGroundTruthStore

Semantic vector store for RAG-based factual grounding. Ingest documents, then pass the store to CoherenceScorer for fact-checked scoring. Supports pluggable backends via a registry pattern.

Usage

from director_ai.core.retrieval.vector_store import VectorGroundTruthStore

store = VectorGroundTruthStore()
store.ingest([
    "Refunds are available within 30 days of purchase.",
    "Standard shipping takes 5-7 business days.",
    "Pro plan costs $49/month.",
])

# Use with scorer
from director_ai import CoherenceScorer

scorer = CoherenceScorer(
    threshold=0.6,
    ground_truth_store=store,
    use_nli=True,
)

VectorGroundTruthStore Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| backend | VectorBackend \| None | None | Backend instance (default: InMemoryBackend) |
| tenant_id | str | "" | Default tenant ID for multi-tenant stores |

Methods

ingest()

store.ingest(texts: list[str], tenant_id: str = "") -> int

Add documents to the store. Each document is embedded and indexed; returns the number of documents ingested.

retrieve_context()

context = store.retrieve_context(query: str, top_k: int = 3, tenant_id: str = "") -> str | None

Retrieve concatenated context string for a query (matching parent GroundTruthStore interface). Use retrieve_context_with_chunks() for structured EvidenceChunk results.


VectorBackend

Abstract protocol for vector storage backends. Implement add() and query() to create a custom backend.

from director_ai.core.retrieval.vector_store import VectorBackend

class MyBackend(VectorBackend):
    def add(self, texts: list[str], ids: list[str] | None = None) -> None:
        ...

    def query(self, text: str, top_k: int = 3) -> list[tuple[str, float]]:
        # Returns list of (text, distance) pairs
        ...

Built-in Backends

| Backend | Install | Description |
|---|---|---|
| InMemoryBackend | included | TF-IDF cosine similarity. No deps, good for testing. |
| SentenceTransformerBackend | pip install director-ai[embeddings] | Dense embeddings via sentence-transformers. Production-quality. |
| ChromaBackend | pip install director-ai[vector] | ChromaDB persistent store. Scales to millions of documents. |

ChromaBackend

from director_ai.core.retrieval.vector_store import ChromaBackend

backend = ChromaBackend(
    collection_name="legal_contracts",
    persist_directory="/data/chroma",
    embedding_model="BAAI/bge-large-en-v1.5",
)
store = VectorGroundTruthStore(backend=backend)

SentenceTransformerBackend

from director_ai.core.retrieval.vector_store import SentenceTransformerBackend

backend = SentenceTransformerBackend(
    model_name="BAAI/bge-large-en-v1.5",
)
store = VectorGroundTruthStore(backend=backend)

Backend Registry

Register custom backends for use with DirectorConfig.vector_backend:

from director_ai.core.retrieval.vector_store import register_vector_backend, get_vector_backend

register_vector_backend("qdrant", MyQdrantBackend)
BackendClass = get_vector_backend("qdrant")  # returns the class, not an instance
backend = BackendClass(**kwargs)

| Function | Purpose |
|---|---|
| register_vector_backend(name, cls) | Register a backend class |
| get_vector_backend(name) | Look up a registered backend class |
| list_vector_backends() | List registered backend names |
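
Conceptually, the registry behind these helpers is a name-to-class mapping; a minimal sketch of the pattern (not the library's actual code):

```python
_BACKENDS: dict[str, type] = {}

def register_vector_backend(name: str, cls: type) -> None:
    # Map a short name to a backend class (not an instance)
    _BACKENDS[name] = cls

def get_vector_backend(name: str) -> type:
    try:
        return _BACKENDS[name]
    except KeyError:
        raise KeyError(f"unknown vector backend: {name!r}") from None

def list_vector_backends() -> list[str]:
    return sorted(_BACKENDS)

class MyQdrantBackend:  # hypothetical placeholder class for the example
    pass

register_vector_backend("qdrant", MyQdrantBackend)
print(get_vector_backend("qdrant") is MyQdrantBackend)  # → True: lookup returns the class
print(list_vector_backends())  # → ['qdrant']
```

Returning the class rather than an instance keeps construction (and its kwargs) in the caller's hands, which is why DirectorConfig.vector_backend can name a backend without fixing its configuration.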

Reranking

Enable cross-encoder reranking for improved retrieval precision:

scorer = CoherenceScorer(
    ground_truth_store=store,
    reranker_enabled=True,
    reranker_model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    reranker_top_k_multiplier=3,  # Retrieve 3x, rerank to top_k
)
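
The two-stage flow the multiplier controls can be sketched as follows (both scoring functions here are toy stand-ins; the real reranker scores (query, document) pairs with the configured cross-encoder model):

```python
def retrieve_then_rerank(query: str, docs: list[str],
                         top_k: int = 3, multiplier: int = 3) -> list[str]:
    q = set(query.lower().split())

    def cheap_score(d: str) -> int:
        # Stage 1 stand-in for vector retrieval: raw word overlap
        return len(q & set(d.lower().split()))

    def precise_score(d: str) -> float:
        # Stage 2 stand-in for the cross-encoder: overlap normalised by length
        return cheap_score(d) / (1 + len(d.split()))

    # Over-retrieve top_k * multiplier candidates cheaply, then
    # rerank only that small set with the expensive scorer.
    candidates = sorted(docs, key=cheap_score, reverse=True)[: top_k * multiplier]
    return sorted(candidates, key=precise_score, reverse=True)[:top_k]
```

The multiplier trades recall for latency: a larger value gives the cross-encoder more candidates to rescue from imperfect vector retrieval, at the cost of more pair scorings.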

Full API

director_ai.core.retrieval.vector_store.VectorGroundTruthStore

VectorGroundTruthStore(backend: VectorBackend | None = None, tenant_id: str = '')

Bases: GroundTruthStore

Ground truth store with vector-based semantic retrieval.

Extends the keyword-based GroundTruthStore with embedding-based similarity search. Falls back to keyword matching when the vector backend returns no results.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| backend | VectorBackend \| None | Vector DB backend (default: InMemoryBackend). | None |
| tenant_id | str | Default tenant ID for multi-tenant stores. | "" |
add_fact

add_fact(key: str, value: str, tenant_id: str = '') -> None

Alias for add() — also populates parent keyword store.

ingest

ingest(texts: list[str], tenant_id: str = '') -> int

Bulk-add plain text documents into the vector backend.

retrieve_context

retrieve_context(query: str, top_k: int = 3, tenant_id: str = '') -> str | None

Retrieve context as a string (matching parent interface).

Falls back to keyword-based parent if vector search returns nothing.

retrieve_context_with_chunks

retrieve_context_with_chunks(query: str, top_k: int = 3, tenant_id: str = '') -> list[EvidenceChunk]

Retrieve context as EvidenceChunk objects.

director_ai.core.retrieval.vector_store.VectorBackend

Bases: ABC

Protocol for vector database backends.

aadd async

aadd(doc_id: str, text: str, metadata: dict[str, Any] | None = None) -> None

Async add — delegates to sync add via executor by default.

aquery async

aquery(text: str, n_results: int = 3, tenant_id: str = '') -> list[dict[str, Any]]

Async query — delegates to sync query via executor by default.

director_ai.core.retrieval.vector_store.InMemoryBackend

InMemoryBackend()

Bases: VectorBackend

Simple in-memory cosine-similarity backend (no external deps).

Uses TF-IDF-like word overlap for embedding approximation. Suitable for testing and small fact stores.
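
The idea can be sketched as cosine similarity over term-frequency vectors (illustrative only; the backend's exact tokenisation and weighting may differ):

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    # Term-frequency vectors over whitespace tokens
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(round(cosine_sim(
    "refunds within 30 days",
    "refunds are available within 30 days of purchase",
), 2))  # → 0.71
```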

director_ai.core.retrieval.vector_store.ChromaBackend

ChromaBackend(collection_name: str = 'director_ai_facts', persist_directory: str | None = None, embedding_model: str | None = None)

Bases: VectorBackend

ChromaDB backend for production vector search.

Requires pip install chromadb sentence-transformers.

director_ai.core.retrieval.vector_store.SentenceTransformerBackend

SentenceTransformerBackend(model_name: str = 'BAAI/bge-large-en-v1.5')

Bases: VectorBackend

Embedding-based backend using sentence-transformers directly.

Recommended model: BAAI/bge-large-en-v1.5 (best quality/speed tradeoff). Alternative: Snowflake/snowflake-arctic-embed-l for multilingual.

Requires pip install sentence-transformers.