SC-NeuroCore Benchmarks¶
Performance measurements for sc-neurocore v3.13.3. All Python numbers are CPU-only (NumPy backend). Rust numbers use Criterion with AVX-512 SIMD.
1. Environment¶
| Field | Value |
|---|---|
| Date | 2026-03-15 |
| Git tag | v3.13.3 |
| OS | Windows 11 Pro 10.0.26200 |
| CPU | Intel Core i5-11600K (6C/12T, 3.9 GHz base, AVX-512, DL Boost) |
| RAM | 32 GB DDR4-3200 |
| Python | 3.12.5 |
| NumPy | 1.26.4 |
| Rust | 1.86.0 (stable) |
| SIMD tier | avx512-vpopcntdq |
| GPU (NVIDIA) | GeForce GTX 1060 6GB — Pascal sm_61, PyTorch 2.6.0+cu124 |
| GPU (AMD) | Radeon RX 6600 XT — no ROCm on Windows, torch-directml incompatible |
| CuPy | unavailable — CUDA Toolkit 13.1 dropped Pascal (sm_61) support |
2. Scalar Primitives (Python)¶
| Operation | Iterations | Latency (µs) | Throughput |
|---|---|---|---|
| LFSR step (16-bit) | 1,000,000 | 0.8 | 1.33 Mstep/s |
| Bitstream encoder step | 1,000,000 | 0.9 | 1.10 Mstep/s |
| LIF neuron step (Q8.8) | 1,000,000 | 0.9 | 1.07 Mstep/s |
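The scalar primitives above are simple enough to sketch in pure Python. A minimal illustration, assuming a Fibonacci LFSR with illustrative taps and illustrative Q8.8 LIF constants (not necessarily the ones sc-neurocore uses):

```python
def lfsr16_step(state: int) -> int:
    """One step of a 16-bit Fibonacci LFSR. The tap positions here are
    an illustrative choice, not sc-neurocore's actual polynomial."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lif_step_q88(v: int, i_in: int, leak_num: int = 255,
                 v_th: int = 256 * 20) -> tuple[int, bool]:
    """One Q8.8 fixed-point LIF step: leak, integrate, threshold, reset.
    All values are integers carrying 8 fractional bits (scale = 256)."""
    v = (v * leak_num) >> 8    # multiplicative leak, approximating exp decay
    v += i_in                  # integrate input current (Q8.8)
    if v >= v_th:
        return 0, True         # spike and reset
    return v, False
```

The benchmark loops above amount to calling these step functions a million times each and dividing wall time by the iteration count.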
3. Packed Bitstream Operations (Python/NumPy)¶
| Operation | Size | Iterations | Latency (µs) | Throughput |
|---|---|---|---|---|
| pack_bitstream 1-D | 1,024 | 10,000 | 8.7 | 0.12 Gbit/s |
| pack_bitstream 1-D | 65,536 | 2,000 | 123.1 | 0.53 Gbit/s |
| pack_bitstream 2-D | 64×1,024 | 2,000 | 121.6 | 0.54 Gbit/s |
| vec_and | 1,024 words | 50,000 | 1.6 | 41.0 Gbit/s |
| vec_popcount SWAR | 1,024 words | 50,000 | 30.2 | 2.17 Gbit/s |
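The packed operations above store bitstreams as u64 words so that AND and popcount run word-at-a-time. A standalone NumPy sketch of the same idea (the real `pack_bitstream` / `vec_and` / `vec_popcount` APIs may differ in layout and signature):

```python
import numpy as np

def pack_bitstream(bits: np.ndarray) -> np.ndarray:
    """Pack a 0/1 array into 64-bit words (LSB-first within words)."""
    packed8 = np.packbits(bits.astype(np.uint8), bitorder="little")
    packed8 = np.pad(packed8, (0, (-len(packed8)) % 8))  # align to u64
    return packed8.view(np.uint64)

# 8-bit popcount lookup table (NumPy 1.26 has no np.bitwise_count)
POP8 = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def vec_and_popcount(a: np.ndarray, b: np.ndarray) -> int:
    """AND two packed streams, then count set bits — the SC 'MAC'."""
    anded = (a & b).view(np.uint8)
    return int(POP8[anded].sum())
```

Packing pays off because one 64-bit AND processes 64 stochastic samples at once; the table shows `vec_and` sustaining ~41 Gbit/s while the packing step itself is the bottleneck at ~0.5 Gbit/s.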
4. Dense Layer Forward Pass (Python)¶
| Configuration | Iterations | Latency (µs) | Throughput |
|---|---|---|---|
| 16×8, L=256 | 500 | 352.7 | 0.09 GOP/s (SC) |
| 64×32, L=1,024 | 100 | 2,405.8 | 0.87 GOP/s (SC) |
5. Full Pipeline (Python)¶
encode → AND synapse → popcount → LIF neuron
| Configuration | Iterations | Latency (µs) | Throughput |
|---|---|---|---|
| 4 synapses, 256 steps | 200 | 1,830 | 139.9 Kstep/s |
| 16 synapses, 256 steps | 50 | 8,679 | 29.5 Kstep/s |
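The encode → AND synapse → popcount → LIF chain measured above can be sketched end-to-end in a few lines of NumPy. All parameters (L, leak, threshold) are illustrative, not the benchmark's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(42)

def sc_pipeline_step(x: np.ndarray, w: np.ndarray, v: float,
                     L: int = 256, tau: float = 0.9, v_th: float = 20.0):
    """One full-pipeline step: Bernoulli-encode inputs and weights into
    L-bit streams, AND them (stochastic multiply), popcount to a synaptic
    current, integrate into a leaky membrane, threshold."""
    xs = rng.random((len(x), L)) < x[:, None]   # encode input probabilities
    ws = rng.random((len(w), L)) < w[:, None]   # encode weights
    current = (xs & ws).sum() / L               # popcount / L, summed over synapses
    v = tau * v + current                       # leaky integration
    spike = v >= v_th
    if spike:
        v = 0.0
    return v, bool(spike)
```

Each simulated step regenerates and ANDs `n_synapses × L` random bits, which is why throughput drops roughly 5× when going from 4 to 16 synapses in the table above.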
6. GPU Backend¶
6a. Local CPU fallback (NumPy, no CuPy)¶
| Operation | Iterations | Latency (µs) | Throughput |
|---|---|---|---|
| gpu_pack_bitstream (65,536) | 2,000 | 375.9 | 0.17 Gbit/s |
| gpu_vec_mac (64×32×16w) | 1,000 | 736.4 | 2.85 GOP/s |
6b. Cloud GPU — NVIDIA RTX A6000 (48 GB, CUDA 12.6)¶
Environment: JarvisLabs A6000, Xeon Silver 4216 (64 vCPU), PyTorch 2.6.0+cu124. 1000 ms simulation, 3 runs, AI regime (conn_prob=0.1).
| Neurons | Synapses | Wall (s) | Rate (Hz) | Syn events/s | Peak RSS |
|---|---|---|---|---|---|
| 1,000 | 100K | 1.55 | 99.0 | 3.2 M | 12 MB |
| 2,000 | 400K | 1.80 | 85.5 | 9.5 M | 24 MB |
| 5,000 | 2.5M | 2.74 | 63.6 | 29.0 M | 104 MB |
| 20,000 | 40M | 8.80 | 26.1 | 59.2 M | 775 MB |
| 50,000 | 250M | 35.4 | 14.7 | 51.9 M | 4,793 MB |
Source: benchmarks/results/jarvislabs_a6000/gpu_large_scale.json,
benchmarks/results/jarvislabs_a6000/scaling_4regime.json.
7. Rust Engine — Criterion Results¶
All benchmarks run with cargo bench --manifest-path engine/Cargo.toml on
AVX-512 hardware. Times are Criterion medians.
Bitstream Packing (1M bits = 1,048,576 bits)¶
| Variant | Time | Throughput | vs. Python |
|---|---|---|---|
| pack (scalar) | 897 µs | 1.17 Gbit/s | 2.2× |
| pack_fast (u64 chunks) | 286 µs | 3.67 Gbit/s | 7× |
| pack_dispatch (AVX-512) | 25.4 µs | 41.3 Gbit/s | 79× |
Popcount (16,384 u64 words = 1M bits)¶
| Variant | Time | Throughput | vs. Python |
|---|---|---|---|
| popcount_portable | 12.1 µs | 86.6 Mword/s | 2.5× |
| popcount_simd (AVX-512) | 2.86 µs | 366 Mword/s | 10.6× |
Fused AND+Popcount (16 words)¶
| Variant | Time |
|---|---|
| Scalar (iter + count_ones) | 19.1 ns |
| SIMD dispatch (AVX-512) | 9.58 ns |
Encoder / Neuron¶
| Operation | Time | Throughput |
|---|---|---|
| LFSR encoder (64K steps) | 131 µs | 500 Mstep/s |
| LIF neuron (10K steps) | 47.9 µs | 209 Mstep/s |
| LIF neuron (100K steps) | 446 µs | 224 Mstep/s |
Bernoulli Encoding (1,024-bit packed streams)¶
| Variant | Time | Notes |
|---|---|---|
| bernoulli_stream (unrolled) | 3.99 µs | generate bits then pack |
| bernoulli_stream + pack | 4.79 µs | two-pass |
| bernoulli_packed (ChaCha8) | 4.14 µs | direct packed generation |
| bernoulli_packed_fast (ChaCha8) | 1.72 µs | optimized threshold loop |
| bernoulli_packed_simd (ChaCha8) | 779 ns | SIMD comparison |
| bernoulli_packed_simd (Xoshiro) | 398 ns | fastest: SIMD + fast PRNG |
| encode_and_popcount (Xoshiro) | 285 ns | fused encode+AND+popcount |
Dense Layer (64 inputs → 32 neurons, L=1024)¶
| Variant | Time | vs. Python |
|---|---|---|
| forward (baseline) | 1.22 ms | 2.0× |
| forward_fast (packed) | 337 µs | 7.1× |
| forward_fused (encode+AND+pop) | 1.67 ms | 1.4× |
| forward_prepacked (pre-encoded) | 54.9 µs | 43.8× |
| forward_batch (100 samples) | 13.7 ms | 17.6× per sample |
PRNG Fill (1,024 bytes)¶
| Generator | Time | Throughput |
|---|---|---|
| ChaCha8 | 320 ns | 3.13 GB/s |
| Xoshiro256++ | 191 ns | 5.24 GB/s |
Domain-Specific¶
| Operation | Time |
|---|---|
| Kuramoto solver (100 osc, 1000 steps) | 199 ms |
| Stochastic attention (10×16 → 20×32) | 138 µs |
| Graph layer (20 nodes, 8 features) | 253 µs |
8. v2 (Python) vs v3 (Rust) Speedup¶
SIMD tier: avx512-vpopcntdq. The v3 engine wraps Rust via PyO3.
| Operation | v2 (ms) | v3 (ms) | Speedup |
|---|---|---|---|
| pack_bitstream (1M bits) | 8.26 | 52.66 | 0.2× |
| popcount (1M bits) | 0.12 | 0.44 | 0.3× |
| LIF neuron (10K steps) | 8.58 | 4.48 | 1.9× |
| Dense forward (16→8, L=1024) | 0.49 | 0.18 | 2.7× |
| Dense forward (64→32, L=1024) | 4.48 | 4.05 | 1.1× |
| Dense forward (128→64, L=1024) | 8.02 | 1.10 | 7.3× |
| Attention (10×16 → 20×32) | 0.03 | 0.28 | 0.1× |
| Attention (50×32 → 100×64) | 0.10 | 2.19 | 0.0× |
Geometric mean speedup: 0.5×. Across all operations, the Rust FFI path is slower than pure Python on average: PyO3 call overhead (argument marshalling, GIL release/acquire) adds ~50–200 µs per invocation, which dominates when the payload is small.
The Rust engine amortizes FFI cost above ~64K bits per call. On payloads over 1M bits the SIMD kernel wins decisively (Dense 128→64 at 7.3×). For small networks (<64 neurons), pure Python remains faster; the Rust engine targets large-payload inference (>=128 neurons, L>=1024).
The pure-Rust Criterion numbers in Section 7 show true engine throughput without FFI overhead.
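The fixed-cost-plus-linear-cost model behind this analysis can be checked empirically: time a call at two payload sizes, fit t(n) = a + b·n, and read the per-call overhead off the intercept. A sketch of that harness, using a NumPy kernel as a stand-in for the PyO3-wrapped call (the function under test is a placeholder, not the sc-neurocore API):

```python
import time
import numpy as np

def per_call_overhead(fn, sizes, reps=50):
    """Estimate the fixed per-call cost of fn by fitting t(n) = a + b*n
    over two payload sizes; the intercept a approximates dispatch/FFI
    overhead, the slope b the per-element kernel cost."""
    times = []
    for n in sizes:
        buf = np.zeros(n, dtype=np.uint64)
        t0 = time.perf_counter()
        for _ in range(reps):
            fn(buf)
        times.append((time.perf_counter() - t0) / reps)
    n0, n1 = sizes
    slope = (times[1] - times[0]) / (n1 - n0)
    return times[0] - slope * n0   # intercept = fixed per-call cost

# Stand-in kernel; swap in the PyO3-wrapped function to measure real FFI cost.
overhead = per_call_overhead(lambda b: int(b.sum()), [1_000, 1_000_000])
```

With a payload of N bits, the FFI path wins once b·N exceeds a, which is consistent with the ~64K-bit break-even observed above.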
9. NeuroBench-Aligned Metrics¶
Aligned with the NeuroBench methodology (Yik et al., 2023; arXiv:2304.04640).
| Model | Neurons | SynOps | Act. Sparsity | Latency (µs) | Throughput (MOP/s) | Memory (B) |
|---|---|---|---|---|---|---|
| SCDenseLayer(8×4, L=256) | 4 | 409,600 | 0.00 | 1,293 | 6.3 | 256 |
| SCDenseLayer(16×8, L=512) | 8 | 1,966,080 | 0.00 | 2,446 | 26.8 | 1,024 |
| VectorizedSCLayer(16×8, L=512) | 8 | 3,276,800 | 0.00 | 348 | 188.1 | 1,024 |
| VectorizedSCLayer(64×32, L=1024) | 32 | 41,943,040 | 0.00 | 2,476 | 847.0 | 16,384 |
Activation sparsity is 0.00 because SC outputs are graded probabilities, not binary spikes — every neuron produces a non-zero output on every step.
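The sparsity metric itself is just the fraction of exactly-zero activations. A minimal sketch of the NeuroBench-style computation (the harness's actual implementation may differ):

```python
import numpy as np

def activation_sparsity(outputs: np.ndarray) -> float:
    """Fraction of exactly-zero activations. SC layers emit graded
    firing probabilities, so this is ~0 for them; a sparse binary
    spike raster would score close to 1."""
    return float((outputs == 0).mean())
```

This makes the 0.00 column above unsurprising: every SC output is a non-zero probability, so the metric rewards event-driven SNNs but not rate-coded SC.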
10. SNN Comparison: Brunel Balanced Network¶
4-Variant Translator Benchmark¶
1000 neurons (800E/200I), conn_prob=0.1, adapted params (weight_exc=5.0 mV,
external_rate=200 Hz), 1000 ms simulation. Delta-PSC semantics: synaptic
events applied as instantaneous voltage jumps (v += w), matching Brian2's
on_pre="v_post += w".
| Variant | Spikes | Rate (Hz) | Brian2 Ratio | Wall (s) |
|---|---|---|---|---|
| Brian2 reference | 1,057,908 | 1057.9 | 1.00 | 1.11 |
| V1 StochasticLIF | 1,725,955 | 1726.0 | 1.63 | 30.23 |
| V2 RateMatched | N/A | 0.049 (prob) | — | 51.81 |
| V3 FixedPoint Q8.8 | 1,722,195 | 1722.2 | 1.63 | 15.41 |
| V4 Hybrid SC+LIF | 1,888,351 | 1888.4 | 1.78 | 46.23 |
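The delta-PSC update described above can be sketched as a vectorized network step. Parameters (leak factor, thresholds) are illustrative; the key point is the `v += w` jump and the non-zero reset:

```python
import numpy as np

def delta_psc_step(v, spiked_pre, W, leak=0.99, v_th=20.0, v_reset=10.0):
    """One network step with delta-PSC semantics: each presynaptic spike
    adds its weight directly to the postsynaptic membrane (v += w),
    matching Brian2's on_pre="v_post += w" — not a current diluted
    through R * I * dt. Reset goes to v_reset, not to 0."""
    v = v * leak                      # membrane leak
    v = v + W.T @ spiked_pre          # instantaneous voltage jumps
    spiked = v >= v_th
    v = np.where(spiked, v_reset, v)  # reset
    return v, spiked
```

The three historical bugs listed below are all visible in this sketch: omit `v_reset`, scale the jump by `R * I * dt`, or feed Poisson drive as a steady current instead of voltage kicks, and the network stops firing.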
Variant descriptions¶
- V1 StochasticLIF: Bug-fixed delta-PSC wiring. The previous benchmark passed input through `R * I * dt` (diluted by dt=0.1) and omitted `v_reset`. Fixed: synaptic events as `neuron.v += weight`, Poisson drive as voltage kicks, `v_reset=10.0` passed correctly.
- V2 RateMatched: VectorizedSCLayer in the probability domain. Weights mapped to `p = w / v_threshold`. 100-neuron subset, bitstream_length=1024. Not spike-comparable; mean output probability = 0.0488.
- V3 FixedPoint Q8.8: Hardware-faithful FixedPointLIFNeuron. Params mapped to Q8.8 integers (scale=256). Rate 1.63× Brian2 (higher due to different noise model).
- V4 Hybrid SC+LIF: BitstreamSynapse AND gates → popcount → voltage → StochasticLIFNeuron. Higher rate due to stochastic amplification in the bitstream encoding.
Historical note (v3.9.0, resolved)¶
Prior to v3.10.0, three wiring bugs prevented the Brunel network
from firing. All three were fixed in v3.10.0:
1. v_reset never passed (defaulted to 0.0 instead of 10.0)
2. Delta-PSC diluted through R * I * dt instead of direct v += w
3. Poisson drive fed as steady current instead of voltage kicks
10b. 20-Variant Brunel Translator Suite¶
Adapted Brunel parameters (weight_exc=5.0, ext_rate=200 Hz), 1000 neurons, 1000 ms simulation. Brian2 2.10.1 reference: 748,777 spikes, 748.8 Hz.
| # | Variant | Spikes | Rate (Hz) | Brian2 Ratio | Wall (s) | Note |
|---|---|---|---|---|---|---|
| — | Brian2 reference | 748,777 | 748.8 | 1.00 | 1.60 | |
| V1 | StochasticLIF | 1,725,955 | 1726.0 | 2.31 | 49.33 | delta-PSC baseline |
| V2 | RateMatched | — | 0.0488 (prob) | — | 80.98 | probability domain |
| V3 | FixedPoint Q8.8 | 1,722,195 | 1722.2 | 2.30 | 20.79 | hardware-faithful |
| V4 | Hybrid SC+LIF | 1,571,994 | 1572.0 | 2.10 | 42.46 | bitstream synapse |
| V5 | Izhikevich | 15,331 | 15.3 | 0.02 | 11.49 | burst dynamics |
| V6 | Homeostatic LIF | 1,727,113 | 1727.1 | 2.31 | 39.36 | adaptive threshold |
| V7 | Noisy LIF | 1,714,361 | 1714.4 | 2.29 | 44.62 | noise_std=1.0 |
| V8 | Refractory LIF | 114,317 | 114.3 | 0.15 | 26.56 | 5-step refractory |
| V9 | Post-kick LIF | 1,671,636 | 1671.6 | 2.23 | 36.16 | Brian2 timing |
| V10 | Exact-leak LIF | 1,713,399 | 1713.4 | 2.29 | 32.78 | exp(-dt/tau) |
| V11 | Q16.12 FixedPoint | 464,644 | 464.6 | 0.62 | 18.19 | 32-bit, 12 frac |
| V12 | STDP LIF | 1,689,552 | 1689.6 | 2.26 | 758.21 | 2000 STDP synapses |
| V13 | DotProduct LIF | 497,647 | 9952.9 | — | 261.40 | n=50, bl=256 |
| V14 | Sobol bitstream | 780,390 | 780.4 | 1.04 | 220.46 | low-discrepancy |
| V15 | JAX vectorized | — | — | — | — | skipped: JAX not installed |
| V16 | Recurrent reservoir | — | 0.9997 (prob) | — | 16.21 | probability domain |
| V17 | Memristive defects | — | 48.7 | — | 51.62 | stuck=1%, var=5% |
| V18 | Numba JIT | 1,685,521 | 1685.5 | 2.25 | 5.20 | 9.5× vs V1 |
| V19 | PyTorch CUDA | 1,725,955 | 1726.0 | 2.31 | 5.70 | GTX 1060 6GB |
| V20 | Vectorized NumPy | 1,725,955 | 1726.0 | 2.31 | 10.27 | batch update |
| V21 | Sparse Numba (CSR) | 1,685,521 | 1685.5 | 2.25 | 0.49 | 10% connectivity |
Acceleration comparison (1000 neurons, 1000 ms)¶
| Backend | Wall (s) | Speedup vs V1 |
|---|---|---|
| V1 per-neuron Python | 49.33 | 1.0× |
| V20 vectorized NumPy | 10.27 | 4.8× |
| V18 Numba JIT | 5.20 | 9.5× |
| V19 PyTorch CUDA (GTX 1060) | 5.70 | 8.7× |
| V21 Sparse Numba (CSR) | 0.49 | 100.7× |
| Brian2 (Cython) | 1.60 | 30.8× |
10K neuron scaling¶
| Backend | Wall (s) | Memory |
|---|---|---|
| V18 Numba JIT (dense) | 15.3 | 800 MB (N²×8) |
| V21 Sparse Numba (CSR) | 22.6 | 80 MB (10% nnz) |
| Brian2 (C++ codegen) | 9.6 | sparse (internal) |
At 10K, Brian2's compiled C++ sparse codegen wins. V21 CSR reduces memory 10× but scattered index access prevents SIMD vectorization. The Rust SIMD CSR engine (planned) targets this gap.
Variant notes¶
- V5 Izhikevich: Low spike rate expected — Izhikevich dynamics (quadratic nonlinearity, v range -65 to +30) respond differently to delta-PSC drive. Tonic baseline current of 5.0 added for sub-threshold depolarization.
- V8 Refractory: 5-step (0.5 ms) dead time reduces max firing rate to ~2000 Hz, cutting observed rate by 15×.
- V11 Q16.12: Higher precision fixed-point produces fewer spikes than Q8.8 due to more accurate leak computation (less rounding-induced depolarization).
- V12 STDP: Online weight learning with 2000 STDP synapses. 15× slower due to per-synapse process_step() calls.
- V14 Sobol: Low-discrepancy bitstream achieves 1.04× Brian2 ratio — closest match to reference among all spiking variants.
- V18/V19/V20: Acceleration variants show 5–10× speedup over per-neuron Python loop. Numba JIT and PyTorch CUDA achieve similar wall times on this workload (1000 neurons); GPU advantage grows with N.
- V21 Sparse Numba: scipy.sparse CSR connectivity. At 1K (10% connectivity): 100× faster than V1, 3× faster than V18 dense. At 10K: 1.5× slower than V18 due to scattered CSR index access, but uses 10× less memory (80 MB vs 800 MB).
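V21's win at 1K comes from only touching the ~10% of synapses that exist. A pure-NumPy sketch of the CSR spike-propagation kernel (V21 itself uses scipy.sparse plus Numba; this shows just the access pattern):

```python
import numpy as np

def csr_propagate(indptr, indices, weights, spiked, n_post):
    """Scatter presynaptic spikes through CSR-stored connectivity:
    for each spiking source, add its outgoing weights to its targets.
    The gathered index access (indices[lo:hi]) is exactly the scattered
    memory pattern that blocks SIMD vectorization at 10K neurons."""
    dv = np.zeros(n_post)
    for pre in np.flatnonzero(spiked):
        lo, hi = indptr[pre], indptr[pre + 1]
        np.add.at(dv, indices[lo:hi], weights[lo:hi])  # handles repeats
    return dv
```

Memory drops from N²×8 bytes (dense float64) to roughly nnz×(8+4) bytes, matching the 800 MB → 80 MB reduction in the 10K table.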
11. Advanced Module Performance¶
| Module | Configuration | Latency (100 runs) | Per-run | Key metric |
|---|---|---|---|---|
| Quantum-Classical Hybrid | 64 qubits, L=1024 | 76.8 ms | 0.77 ms | cos²(θ/2) error < 0.03 |
| Event-Based GNN | 100 nodes, 5% density | 6.6 ms | 0.07 ms | 17× sparse reduction |
| Stochastic Transformer | d=64, 4 heads, L=512 | 1,691 ms | 16.9 ms | 196× energy vs FP32 MAC |
| BCI Decoder | 64 ch, 1s signal | 19.5 ms | 0.20 ms | Native bitstream encoding |
| DVS Input Layer | 128×128, 1000 events | 1,249 ms | 12.5 ms | 492× data reduction |
| Chaotic RNG | 100K samples | 13.5 ms | — | 7.42 Msample/s |
| Predictive World Model | 32-dim state, 50-step | 34.5 ms | 0.34 ms | 1000× sample efficiency |
12. FPGA Resource Utilization¶
Synthesis tooling (tools/yosys_synth.py) targets Xilinx 7-series via Yosys
synth_xilinx. Yosys is not installed on this machine; run when available:
python tools/yosys_synth.py --json benchmarks/results/yosys_synth.json --markdown
Target modules: sc_bitstream_encoder, sc_lif_neuron, sc_bitstream_synapse,
sc_dotproduct_to_current, sc_firing_rate_bank, sc_dense_layer_core,
sc_neurocore_top.
Estimated: sc_bitstream_encoder < 100 LUTs (pending Yosys validation).
13. Bitstream Length Scaling (32x16 Dense)¶
Fixed network: 32 inputs, 16 neurons. Mean of 5 runs per length. Expected: roughly linear scaling (doubling L doubles time).
| L | Mean Time (ms) | Throughput (Mbit/s) |
|---|---|---|
| 128 | 0.43 | 151 |
| 256 | 0.66 | 197 |
| 512 | 1.05 | 250 |
| 1024 | 1.14 | 459 |
| 2048 | 1.36 | 773 |
| 4096 | 4.22 | 497 |
Scaling is sub-linear up to L=2048 due to NumPy vectorization amortizing fixed overhead. At L=4096, packed array allocation begins to dominate.
14. Memory Footprint (L=1024)¶
Peak allocation measured via tracemalloc (includes layer construction
and one forward pass). Weight matrix size is the float64 weight array only.
| Config | Weight Matrix (MB) | Peak Alloc (MB) | Forward Time (ms) |
|---|---|---|---|
| 32x16 (tiny) | 0.004 | 4.63 | 3.1 |
| 64x32 (small) | 0.016 | 18.33 | 7.5 |
| 128x64 (medium) | 0.062 | 73.13 | 13.2 |
| 256x128 (large) | 0.250 | 292.31 | 26.2 |
Peak allocation scales as O(N_neurons * N_inputs * L / 8) bytes for the packed bitstream arrays, which dominate the weight matrix by ~1000x.
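The peak-allocation column can be reproduced with a small tracemalloc wrapper. A sketch under the stated methodology (construction plus one forward pass inside `fn`; the actual harness details are assumed):

```python
import tracemalloc
import numpy as np

def measure_peak_mb(fn) -> float:
    """Run fn under tracemalloc and return peak traced heap usage in MB.
    NumPy reports array buffers to tracemalloc, so packed bitstream
    temporaries show up in the peak."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1e6

# Illustrative payload: a 32x16 layer's packed bitstreams at L=1024
# (32 * 16 * 1024 / 8 bytes), standing in for a real forward pass.
peak = measure_peak_mb(lambda: np.zeros((32, 16, 1024 // 8), dtype=np.uint8).sum())
```

Because the packed arrays carry the N_inputs × N_neurons × L bit volume while the weights are only N_inputs × N_neurons floats, the ~1000× gap between the two columns above falls out directly.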
15. Reproducing¶
# Python benchmark suite (quick ~15s, full ~120s)
python benchmarks/benchmark_suite.py --full --markdown
# Rust Criterion benchmarks (~5 min)
cargo bench --manifest-path engine/Cargo.toml
# v2 vs v3 comparison (requires Rust wheel)
PYTHONPATH=src python benchmarks/bench_v2_vs_v3.py
# NeuroBench-aligned metrics
python benchmarks/neurobench_harness.py --json benchmarks/results/neurobench.json --markdown
# SNN comparison — 20 variants (requires brian2: pip install brian2)
python benchmarks/snn_comparison.py --all --adapted --sim-ms 1000 \
--json benchmarks/results/snn_translator_20v.json --markdown
# Advanced modules
python benchmarks/benchmark_advanced_modules.py
# FPGA synthesis (requires yosys in PATH)
python tools/yosys_synth.py --json benchmarks/results/yosys_synth.json --markdown
15b. snnTorch Head-to-Head Comparison¶
Artifact: benchmarks/results/snntorch_vs_sc_microbench.json
Three-way comparison: SC-NeuroCore (NumPy), SC-NeuroCore (Rust SIMD), snnTorch 0.9.4.
| Test | SC NumPy (us/step) | SC Rust SIMD (us/step) | snnTorch (us/step) |
|---|---|---|---|
| Single neuron (1000 steps) | 3.7 | — | 876 |
| Dense 100->50 (500 steps) | 2,280 | 1,059 | 1,103 |
| Scale 500->500 (100 steps) | 158,741 | 17,473 | 35,998 |
| Scale 1000->1000 (50 steps) | 602,730 | 28,882 | 9,421 |
Paradigm difference: SC-NeuroCore performs bit-true stochastic computation (uint64 popcount on packed bitstreams, L=256-512 bits per value). snnTorch does float32 matrix multiply. SC-NeuroCore is hardware-faithful (maps directly to Verilog RTL); snnTorch is GPU-optimized but not synthesizable.
- At small scale (1-100 neurons), SC-NeuroCore's lightweight Python step is 237x faster than snnTorch, whose per-step PyTorch dispatch overhead dominates.
- At medium scale (500 neurons), Rust SIMD engine is 2x faster than snnTorch.
- At large scale (1000+), snnTorch's O(n^2) float matmul beats bitstream packing at O(n^2 * L).
- Rust engine provides 9-21x speedup over Python SC at all scales.
python benchmarks/snntorch_vs_sc_microbench.py --runs 5 --scales 100 500 1000
16. Spike Codec Library (2026-03-25)¶
Compression ratios for the spike codec library. All codecs lossless. Measured on (2000 x 64) rasters at various firing rates.
ISI Codec vs General-Purpose Compressors¶
Auto entropy selection (varint for sparse, Huffman for dense):
| Firing Rate | ISI (auto) | zlib-9 | lzma | ISI Advantage |
|---|---|---|---|---|
| 0.1% | 401x | 359x | 194x | +12% over zlib |
| 1% | 78x | 65x | 48x | +20% over zlib |
| 5% | 24x | 19x | 20x | +28% over zlib |
| 10% | 16x | 12x | 13x | +30% over zlib |
| 30% | 8.8x | 7.0x | 7.8x | +24% over zlib |
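The sparse-regime advantage comes from storing inter-spike intervals as variable-length integers: most deltas fit in one byte. An illustrative LEB128-style sketch (the library's actual wire format and entropy-selection logic are not shown here):

```python
import numpy as np

def encode_isi_varint(spike_times: np.ndarray) -> bytes:
    """Encode sorted spike times as varints of the inter-spike
    intervals: 7 payload bits per byte, high bit = continuation."""
    out = bytearray()
    prev = 0
    for t in spike_times:
        delta = int(t) - prev
        prev = int(t)
        while True:
            byte = delta & 0x7F
            delta >>= 7
            if delta:
                out.append(byte | 0x80)   # more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)

def decode_isi_varint(data: bytes) -> list[int]:
    """Invert encode_isi_varint: accumulate deltas back to times."""
    times, t, shift, delta = [], 0, 0, 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            t += delta
            times.append(t)
            delta, shift = 0, 0
    return times
```

At 0.1% firing, typical ISIs of ~1000 steps still need only two bytes per spike, versus the raster's thousand bits, which is where the 401× ratio comes from.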
Context Predictor on Structured Data¶
Periodic bursting (32ch, 5-spike bursts every 50 steps):
| Predictor | Ratio | Accuracy |
|---|---|---|
| ISI (no prediction) | 8.6x | — |
| EMA | 8.5x | 90.0% |
| Context (Markov) | 25.5x | 97.8% |
Realistic SpikeInterface Benchmarks¶
SpikeInterface ground-truth recordings with physiological ISI distributions:
| Scenario | Channels | Units | Firing Rate | Best Ratio |
|---|---|---|---|---|
| Neuropixels-like | 96 | 10 | 1-5 Hz | 457x |
| BCI-scale | 256 | 50 | 0.5-3 Hz | 756x |
| High-density | 384 | 100 | 1-10 Hz | 317x |
All three exceed the 200x Neuralink target.
Yosys Synthesis (gate counts)¶
Generic gate-level synthesis via Yosys 0.63:
| Verilog Module | Cells | Function |
|---|---|---|
| sc_bitstream_encoder.v | 115 | LFSR predictor (bit-true with Python/Rust) |
| sc_cordiv.v | 2 | Stochastic division |
| sc_dotproduct_to_current.v | 448 | AND accumulation + popcount |
| sc_aer_encoder.v | 1,423 | Priority encoder for AER |
| sc_event_neuron.v | 2,135 | Event-driven LIF |
| sc_lif_neuron.v | 3,134 | Q8.8 fixed-point LIF |
1024-channel codec estimate: ~406K gates, ~0.02 mm^2 at 7nm.
WaveformCodec: Raw Electrode Compression¶
End-to-end pipeline: raw 10-bit ADC -> spike detect -> template match -> compress. Measured on synthetic 1024-channel, 1 second at 20 kHz:
| Metric | Value |
|---|---|
| Raw data | 40,960,000 bytes (328 Mbit/s) |
| Compressed (q=4) | 1,703,435 bytes (13.6 Mbit/s) |
| Compression ratio | 24x |
| Spikes detected | 3,087 |
| Templates learned | 16 |
| Bluetooth capacity | 15 Mbit/s |
| Fits in uplink | YES |
Scaling (4-bit background quantization):
| Channels | Raw Mbit/s | Compressed Mbit/s | Fits BT |
|---|---|---|---|
| 128 | 26 | 1.0 | YES |
| 256 | 51 | 2.0 | YES |
| 384 | 77 | 3.0 | YES |
| 1024 | 205 | 8.0 | YES |
| 3072 | 614 | 23.9 | NO |
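The "Fits BT" column follows from simple arithmetic. A sketch assuming 10-bit ADC samples at 20 kHz and the ~25.6× effective ratio implied by the 1024-channel row (both read off the tables above, not pulled from the codec's API):

```python
def uplink_budget(channels: int, fs_hz: int = 20_000,
                  bits_per_sample: int = 10, ratio: float = 25.6,
                  bt_mbit: float = 15.0):
    """Raw ADC bit rate from channel count and sample rate, divided by
    the measured compression ratio, compared to the Bluetooth budget.
    Returns (raw Mbit/s, compressed Mbit/s, fits?)."""
    raw = channels * fs_hz * bits_per_sample / 1e6
    compressed = raw / ratio
    return raw, compressed, compressed <= bt_mbit
```

This reproduces the table's crossover: 1024 channels compress to ~8 Mbit/s and fit; 3072 channels land near 24 Mbit/s and exceed the 15 Mbit/s link.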
Competitive comparison (raw waveform compression):
| Method | Compression | Notes |
|---|---|---|
| MuSCoRE (2023) | 50-100x | Multi-scale decomposition, academic |
| CREST (2022) | 10-50x | Raw electrode, academic |
| SC-NeuroCore WaveformCodec | 24x | Spike-aware pipeline, open source |
| Delta + arithmetic (standard) | 5-15x | No spike awareness |
Notes¶
- Python benchmarks run in `--full` mode (10× iterations vs quick).
- Rust benchmarks use Criterion defaults (100 samples, 3 s warmup).
- v2 vs v3 comparison shows PyO3 FFI overhead for small payloads; Section 7 reports true Rust throughput without FFI.
- Brian2 installed with numpy 2.4.2 (its requirement); benchmarks run after downgrading to numpy 1.26.4 for sc-neurocore compatibility.