Skip to content

Benchmarks

Benchmarks are runnable, committed scripts under benchmarks/, with their results checked in under benchmarks/results/. No number here is estimated by hand. Run them with make bench.

Relay token benchmark

relay_token_benchmark.py measures how much the lite relay encoding shrinks a channel feed for a token-budgeted observer. It replays a fixed trace of broadcast envelopes and reports three serialisations so the saving is decomposed honestly, not quoted as one inflated figure: the full envelope on the wire, the same core fields minified, and the lite encoding.

On the committed 12-message trace, the lite log is 1662 of 2826 bytes (59%) and 568 of 919 tokens (62%) of the raw wire form. Holding the field set fixed, short keys plus minification account for the lite-vs-core ratio (87%); the rest of the reduction is the lite format dropping auxiliary fields an observer does not need. Byte counts are exact; token counts use tiktoken when installed.

Routing benchmark

routing_benchmark.py measures how the task-class router classifies a fixed prompt set and verifies that a tiered client dispatches each prompt to the backend for its class. On the committed 15-prompt set the split is 4 rule / 4 slm / 7 heavy, and dispatch is verified.

Per-tier model latency needs a live model server, so it is out of the offline scope and is documented as such rather than fabricated.