Score Cache¶
Thread-safe LRU + TTL cache for coherence scores. It avoids redundant NLI inference on repeated prompt/response pairs, reducing GPU cost by 60-80% in streaming workloads.
Usage¶
Pass cache_size to CoherenceScorer to enable transparent caching:
```python
from director_ai import CoherenceScorer

scorer = CoherenceScorer(
    cache_size=2048,   # max entries
    cache_ttl=300.0,   # 5-minute TTL
    use_nli=True,
)

# First call: NLI inference (~15ms)
approved, score = scorer.review("What is 2+2?", "4.")

# Second call with the same inputs: cache hit (<0.1ms)
approved, score = scorer.review("What is 2+2?", "4.")

# Monitor cache performance
print(f"Hit rate: {scorer.cache.hit_rate:.1%}")
print(f"Size: {scorer.cache.size}")
```
ScoreCache¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_size` | `int` | `1024` | Maximum cached entries |
| `ttl_seconds` | `float` | `300.0` | Time-to-live per entry |
Properties¶
| Property | Type | Description |
|---|---|---|
| `hit_rate` | `float` | Ratio of hits to total lookups |
| `size` | `int` | Current number of cached entries |
| `hits` | `int` | Total cache hits |
| `misses` | `int` | Total cache misses |
Methods¶
- `get(query, prefix, tenant_id="", scope="") -> _CacheEntry | None` — retrieve a cached entry (score, h_logical, h_factual)
- `put(query, prefix, score, h_logical, h_factual, tenant_id="", scope="")` — store a score
- `invalidate()` — bump the generation counter, lazily expiring all current entries
- `clear()` — flush all entries and reset hit/miss counters
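The semantics above (LRU eviction, per-entry TTL, lazy expiry, generation-based invalidation) can be sketched in a few lines. This is an illustrative model of the behavior, not the library's implementation; the `tenant_id`/`scope` key fields and the per-entry heuristic scores are simplified away:

```python
import time
from collections import OrderedDict


class TinyScoreCache:
    """Illustrative LRU + TTL cache; mirrors ScoreCache's semantics, not its code."""

    def __init__(self, max_size=1024, ttl_seconds=300.0):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._data = OrderedDict()  # (query, prefix) -> (generation, timestamp, score)
        self._generation = 0
        self.hits = 0
        self.misses = 0

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def size(self):
        return len(self._data)

    def get(self, query, prefix):
        entry = self._data.get((query, prefix))
        if entry is not None:
            gen, ts, score = entry
            # Lazy expiry: a stale generation or an overrun TTL counts as a miss
            if gen == self._generation and time.monotonic() - ts < self.ttl:
                self._data.move_to_end((query, prefix))  # mark most recently used
                self.hits += 1
                return score
            del self._data[(query, prefix)]
        self.misses += 1
        return None

    def put(self, query, prefix, score):
        self._data[(query, prefix)] = (self._generation, time.monotonic(), score)
        self._data.move_to_end((query, prefix))
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry

    def invalidate(self):
        self._generation += 1  # existing entries now fail the generation check

    def clear(self):
        self._data.clear()
        self.hits = self.misses = 0
```

Note how `invalidate()` never walks the table: entries stamped with an older generation are simply discarded the next time `get` touches them, which keeps invalidation O(1).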
Cache Key¶
The cache key is derived from (query, response) text content. TTL-expired and LRU-evicted entries are cleaned lazily on access.
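One plausible way to derive such a key (an illustrative sketch, not necessarily the library's actual scheme) is to hash the two strings with a length prefix, so that boundary-shifted pairs like `("ab", "c")` and `("a", "bc")` cannot collide:

```python
import hashlib


def cache_key(query: str, response: str) -> str:
    """Hash (query, response) text content into a fixed-size key.

    Length-prefixing each field before hashing prevents boundary
    collisions such as ("ab", "c") vs ("a", "bc").
    """
    h = hashlib.sha256()
    for part in (query, response):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

A fixed-size digest also keeps per-entry key storage bounded regardless of how long the query and response texts are.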
Full API¶
director_ai.core.cache.ScoreCache¶
Thread-safe LRU cache for coherence scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `max_size` | `int` | Maximum entries | `1024` |
| `ttl_seconds` | `float` | Time-to-live per entry | `300.0` |