Score Cache

Thread-safe LRU + TTL cache for coherence scores. Avoids redundant NLI inference on repeated prompt/response pairs. Reduces GPU cost by 60-80% in streaming workloads.
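The core mechanism can be sketched in a few lines: an `OrderedDict` gives LRU ordering, a per-entry expiry timestamp gives TTL, and a lock makes it thread-safe. This is an illustrative sketch, not the `director_ai` implementation; the class name and internals here are assumptions.

```python
import threading
import time
from collections import OrderedDict


class LruTtlCache:
    """Minimal thread-safe LRU + TTL cache (illustrative sketch)."""

    def __init__(self, max_size=1024, ttl_seconds=300.0):
        self._data = OrderedDict()  # key -> (value, expires_at)
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        with self._lock:
            entry = self._data.get(key)
            if entry is None or entry[1] < time.monotonic():
                # Miss, or TTL-expired: drop the stale entry lazily.
                self._data.pop(key, None)
                self.misses += 1
                return None
            self._data.move_to_end(key)  # refresh LRU position
            self.hits += 1
            return entry[0]

    def put(self, key, value):
        with self._lock:
            self._data[key] = (value, time.monotonic() + self._ttl)
            self._data.move_to_end(key)
            while len(self._data) > self._max_size:
                self._data.popitem(last=False)  # evict least-recently-used
```

Because expired entries are only dropped when touched, lookups stay O(1) and no background sweeper thread is needed.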

Usage

Pass cache_size to CoherenceScorer to enable transparent caching:

from director_ai import CoherenceScorer

scorer = CoherenceScorer(
    cache_size=2048,  # max entries
    cache_ttl=300.0,  # 5-minute TTL
    use_nli=True,
)

# First call: NLI inference (~15ms)
approved, score = scorer.review("What is 2+2?", "4.")

# Second call with same inputs: cache hit (<0.1ms)
approved, score = scorer.review("What is 2+2?", "4.")

# Monitor cache performance
print(f"Hit rate: {scorer.cache.hit_rate:.1%}")
print(f"Size: {scorer.cache.size}")

ScoreCache

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_size | int | 1024 | Maximum cached entries |
| ttl_seconds | float | 300.0 | Time-to-live per entry |

Properties

| Property | Type | Description |
| --- | --- | --- |
| hit_rate | float | Ratio of hits to total lookups |
| size | int | Current number of cached entries |
| hits | int | Total cache hits |
| misses | int | Total cache misses |
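For reference, `hit_rate` as "ratio of hits to total lookups" reduces to a one-liner; the zero-lookup convention of returning 0.0 is an assumption here, not confirmed by the source:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Hits divided by total lookups; 0.0 before any lookup (assumed convention)."""
    total = hits + misses
    return hits / total if total else 0.0
```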

Methods

  • get(query, prefix, tenant_id="", scope="") -> _CacheEntry | None — retrieve cached entry (score, h_logical, h_factual)
  • put(query, prefix, score, h_logical, h_factual, tenant_id="", scope="") — store a score
  • invalidate() — bump generation counter, lazily expiring all current entries
  • clear() — flush all entries and reset hit/miss counters
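The generation-counter trick behind invalidate() makes bulk invalidation O(1): each entry records the generation current at put time, and get() treats entries from an older generation as misses, discarding them on access. A minimal sketch (hypothetical names, not the library's internals):

```python
import threading


class GenerationCache:
    """Sketch of O(1) bulk invalidation via a generation counter."""

    def __init__(self):
        self._data = {}  # key -> (value, generation)
        self._generation = 0
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = (value, self._generation)

    def get(self, key):
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                return None
            value, gen = entry
            if gen != self._generation:
                # Entry predates the last invalidate(): expire lazily.
                del self._data[key]
                return None
            return value

    def invalidate(self):
        # O(1): no entries are touched; each expires on its next lookup.
        with self._lock:
            self._generation += 1
```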

Cache Key

The cache key is derived from the (query, prefix) text content passed to get and put, together with the optional tenant_id and scope arguments. TTL-expired and LRU-evicted entries are cleaned lazily on access.
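One way to derive such a key is to hash the text fields into a fixed-size digest. The scheme below is a sketch under assumptions (the actual hashing used by director_ai is not documented here); length-prefixing each field prevents distinct inputs like ("ab", "c") and ("a", "bc") from colliding.

```python
import hashlib


def cache_key(query: str, prefix: str, tenant_id: str = "", scope: str = "") -> str:
    """Derive a fixed-size cache key from the text fields (illustrative)."""
    h = hashlib.sha256()
    for field in (query, prefix, tenant_id, scope):
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # length prefix disambiguates field boundaries
        h.update(data)
    return h.hexdigest()
```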

Full API

director_ai.core.cache.ScoreCache

ScoreCache(max_size: int = 1024, ttl_seconds: float = 300.0)

Thread-safe LRU cache for coherence scores.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_size | int | Maximum entries | 1024 |
| ttl_seconds | float | Time-to-live per entry, in seconds | 300.0 |

invalidate

invalidate() -> None

Bump generation counter, lazily expiring all current entries.