Benchmarks¶
Benchmarks are runnable, committed scripts under benchmarks/, with their
results checked in under benchmarks/results/. No number here is estimated by
hand. Run them with make bench.
Relay token benchmark¶
relay_token_benchmark.py measures how much the lite relay encoding shrinks a
channel feed for a token-budgeted observer. It replays a fixed trace of broadcast
envelopes and reports three serialisations so the saving is decomposed honestly,
not quoted as one inflated figure: the full envelope on the wire, the same core
fields minified, and the lite encoding.
On the committed 12-message trace, the lite log is 1662 of 2826 bytes (59%)
and 568 of 919 tokens (62%) of the raw wire form. Holding the field set
fixed, short keys plus minification account for the lite-vs-core ratio (87%); the
rest of the reduction is the lite format dropping auxiliary fields an observer
does not need. Byte counts are exact; token counts use tiktoken when installed.
Routing benchmark¶
routing_benchmark.py measures how the task-class router classifies a fixed
prompt set and verifies that a tiered client dispatches each prompt to the
backend for its class. On the committed 15-prompt set the split is 4 rule / 4
slm / 7 heavy, and dispatch is verified.
Per-tier model latency needs a live model server, so it is out of the offline scope and is documented as such rather than fabricated.