
© 1998–2026 Miroslav Šotek. All rights reserved. Contact: www.anulum.li | protoscience@anulum.li ORCID: https://orcid.org/0009-0009-3560-0851 License: GNU AFFERO GENERAL PUBLIC LICENSE v3 Commercial Licensing: Available

SC-NeuroCore v3 Benchmark Report

  • Version: 3.13.3
  • Date: 2026-03-15
  • Previous: 3.6.0 (2026-02-10)
  • SIMD Tier: avx512-vpopcntdq

Baseline Definition and Routing Note

  • v2 in this report means the SC-NeuroCore v2 Python reference path measured by the same benchmark harness.
  • External framework baselines (Norse/Sinabs/Lava CPU) are not yet included in this file and must be added before claiming ecosystem-level superiority.
  • For low-latency use (single sample or micro-batch), prefer DenseLayer.forward_fast.
  • For throughput use (batch size >= 10), prefer DenseLayer.forward_batch_numpy.
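The routing note reduces to a single threshold on batch size. The sketch below assumes only the rule stated above. The helper name and the BATCH_THRESHOLD constant are hypothetical. The calling conventions of DenseLayer.forward_fast and DenseLayer.forward_batch_numpy are not documented here, so the sketch returns the method name rather than calling it.

```python
# Hypothetical helper encoding the routing note; only the threshold (10) comes from the report.
BATCH_THRESHOLD = 10


def pick_forward_method(batch_size: int) -> str:
    """Return the DenseLayer method name the routing note recommends for a batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if batch_size >= BATCH_THRESHOLD:
        return "forward_batch_numpy"  # throughput path
    return "forward_fast"  # low-latency path: single sample or micro-batch


for n in (1, 8, 10, 100):
    print(n, pick_forward_method(n))
# 1 forward_fast
# 8 forward_fast
# 10 forward_batch_numpy
# 100 forward_batch_numpy
```

A caller could then resolve the method with getattr(layer, pick_forward_method(len(batch))) and pass inputs in whatever shape that method expects.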

Phase 12 Results (Fused Dense + Fast PRNG + Batch Forward)

Measured via examples/03_benchmark_report.py on the benchmark host described above (SIMD tier: avx512-vpopcntdq).

| Operation | v2 (ms) | v3 (ms) | Speedup (v2/v3) | Target |
|---|---|---|---|---|
| pack (list, 1000K) | 11.538 | 35.448 | 0.3x | 6x |
| pack (numpy, 1000K) | 11.538 | 0.129 | 89.3x | 6x |
| popcount (list, 1000K) | 109.023 | 151.322 | 0.7x | 20x |
| popcount (numpy, 1000K) | 109.023 | 1.989 | 54.8x | 20x |
| dense forward (64->32, L=1024) | 3.728 | 1.598 | 2.3x | 70x |
| dense fast (64->32, L=1024) | 3.728 | 0.299 | 12.4x | 70x |
| dense prepacked (64->32, L=1024) | 3.728 | 0.282 | 13.2x | 70x |
| dense prepacked numpy (64->32, L=1024) | 3.728 | 0.110 | 33.9x | 70x |
| dense numpy (64->32, L=1024) | 3.728 | 0.647 | 5.8x | 70x |
| dense fused (64->32, L=1024) | 4.664 | 0.380 | 12.3x | 70x |
| dense batch (100x64->32, L=1024) | 289.305 | 6.893 | 42.0x | 70x |
| LIF (per-call, 100K) | 126.313 | 25.525 | 4.9x | 400x |
| LIF (batch, 100K) | 126.313 | 0.905 | 139.6x | 400x |
| LIF multi (100x100K) | 12911.296 | 25.196 | 512.4x | 400x |
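The Speedup column is the v2 time divided by the v3 time. A value below 1 (as in the list-input rows) means the v3 path is slower for that input. A row meets its target when the ratio is at least the Target value. The following sketch recomputes three Phase 12 rows from the table; the helper names are hypothetical and not part of the benchmark script.

```python
def speedup(v2_ms: float, v3_ms: float) -> float:
    """Speedup as tabulated: v2 time divided by v3 time (>1 means v3 is faster)."""
    return v2_ms / v3_ms


def meets_target(v2_ms: float, v3_ms: float, target: float) -> bool:
    """True when the measured speedup reaches the Target column value."""
    return speedup(v2_ms, v3_ms) >= target


# (operation, v2 ms, v3 ms, target) copied from the Phase 12 table
rows = [
    ("pack (list, 1000K)", 11.538, 35.448, 6.0),
    ("dense batch (100x64->32, L=1024)", 289.305, 6.893, 70.0),
    ("LIF multi (100x100K)", 12911.296, 25.196, 400.0),
]

for name, v2, v3, target in rows:
    s = speedup(v2, v3)
    status = "met" if meets_target(v2, v3, target) else "not met"
    print(f"{name}: {s:.1f}x vs {target:.0f}x target ({status})")
# pack (list, 1000K): 0.3x vs 6x target (not met)
# dense batch (100x64->32, L=1024): 42.0x vs 70x target (not met)
# LIF multi (100x100K): 512.4x vs 400x target (met)
```

The ms columns are rounded to three decimals, so recomputing from them can differ from the tabulated speedup in the last digit. For example, pack (numpy) gives 89.4x from the rounded values against the tabulated 89.3x, presumably because the ratio was computed from unrounded timings.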

Criterion Diagnosis (Phase 12)

Measured via targeted commands:

```shell
cargo bench --bench full_bench dense_forward_fused
cargo bench --bench full_bench encode_and_popcount
cargo bench --bench full_bench dense_forward_batch
cargo bench --bench full_bench prng_xoshiro
```
| Benchmark | Time (95% CI) |
|---|---|
| dense_forward_fused_64x32 | [1.1268 ms, 1.9825 ms] |
| bernoulli_encode_and_popcount_1024 | [342.59 ns, 408.10 ns] |
| dense_forward_batch_64x32_x100 | [21.842 ms, 28.753 ms] |
| prng_xoshiro_fill_1024 | [1.5879 µs, 1.7596 µs] |
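The prng_xoshiro_fill_1024 benchmark times filling a buffer, presumably of 1024 values, from a xoshiro-family generator. The sketch below is a pure-Python version of xoshiro256** (Blackman and Vigna's reference algorithm), seeded through SplitMix64 as its authors recommend. It documents only the state update. The benchmark name says just "xoshiro", so the exact variant and seeding used by SC-NeuroCore are assumptions here. The class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic with Python ints


def _rotl(x, k):
    """64-bit rotate left."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def splitmix64(seed):
    """Yield SplitMix64 outputs; used only to expand a seed into xoshiro state."""
    x = seed & MASK64
    while True:
        x = (x + 0x9E3779B97F4A7C15) & MASK64
        z = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        yield z ^ (z >> 31)


class Xoshiro256StarStar:
    """Hypothetical wrapper around the xoshiro256** state (four 64-bit words)."""

    def __init__(self, state):
        if len(state) != 4 or not any(state):
            raise ValueError("state must be four 64-bit words, not all zero")
        self.s = [w & MASK64 for w in state]

    @classmethod
    def from_seed(cls, seed):
        gen = splitmix64(seed)
        return cls([next(gen) for _ in range(4)])

    def next_u64(self):
        s = self.s
        result = (_rotl((s[1] * 5) & MASK64, 7) * 9) & MASK64  # the "**" scrambler
        t = (s[1] << 17) & MASK64
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = _rotl(s[3], 45)
        return result

    def fill(self, n):
        """Return n consecutive 64-bit outputs."""
        return [self.next_u64() for _ in range(n)]


buf = Xoshiro256StarStar.from_seed(42).fill(1024)
print(len(buf), hex(buf[0]))
```

Pure Python is far slower than the compiled fill that the Criterion benchmark measures, so this sketch is for reading the algorithm, not for timing comparisons.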

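Interpretation:

  • The fused encode+AND+popcount path is functionally correct and benchmarked end-to-end.
  • The batched dense API substantially reduces Python-level overhead compared with per-sample loops. The dense batch row processes 100 samples in 6.893 ms (about 0.069 ms per sample), while a single call on the dense fast path takes 0.299 ms.
  • Multi-neuron LIF remains above the Blueprint 400x target on this host (512.4x, up from 420.0x in Phase 11). It is the only dense or LIF row in Phase 12 that meets its target.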
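The fused path combines three steps. First, each operand is Bernoulli-encoded as a bitstream of length L. Second, the input and weight streams are ANDed, which for independent unipolar streams is a stochastic multiply. Third, the result is popcounted, so popcount/L estimates the product. The NumPy sketch below illustrates the technique under stated assumptions: unipolar encoding of values in [0, 1], independent streams, and hypothetical function names. It is not the SC-NeuroCore implementation. It also contrasts a per-sample loop that re-encodes every operand with a batched version that encodes the weights once and does one broadcast AND.

```python
import numpy as np


def bernoulli_encode(p, length, rng):
    """Encode a value p in [0, 1] as a packed unipolar bitstream of `length` bits."""
    bits = rng.random(length) < p  # each bit is 1 with probability p
    return np.packbits(bits)  # uint8 array holding length // 8 bytes


def and_popcount(a_packed, b_packed):
    """Popcount of (a AND b); for independent streams its mean is p_a * p_b * length."""
    return int(np.unpackbits(a_packed & b_packed).sum())


def dense_forward_per_sample(x, w, length, rng):
    """One sample: y_j ~ sum_i x_i * w_ij, encoding every operand on every call."""
    n_in, n_out = w.shape
    x_streams = [bernoulli_encode(xi, length, rng) for xi in x]
    counts = np.zeros(n_out)
    for j in range(n_out):
        for i in range(n_in):
            w_stream = bernoulli_encode(w[i, j], length, rng)
            counts[j] += and_popcount(x_streams[i], w_stream)
    return counts / length


def dense_forward_batch(X, w, length, rng):
    """Whole batch: weights encoded once, then one broadcast AND and popcount."""
    xp = np.packbits(rng.random((*X.shape, length)) < X[..., None], axis=-1)  # (B, n_in, L/8)
    wp = np.packbits(rng.random((*w.shape, length)) < w[..., None], axis=-1)  # (n_in, n_out, L/8)
    anded = xp[:, :, None, :] & wp[None, :, :, :]  # (B, n_in, n_out, L/8)
    counts = np.unpackbits(anded, axis=-1).sum(axis=(1, 3))  # (B, n_out)
    return counts / length


rng = np.random.default_rng(0)
X = rng.random((4, 8))  # 4 samples, 8 inputs, values in [0, 1)
w = rng.random((8, 4))  # 8 inputs -> 4 outputs
print(np.round(X @ w, 2))  # exact result
print(np.round(dense_forward_batch(X, w, 1024, rng), 2))  # stochastic estimate
```

The batched version trades memory for fewer Python-level calls. For the report's 100x64->32, L=1024 shape, the broadcast intermediate alone holds 100·64·32·128 bytes (about 26 MB) before unpacking and eight times that after, so a practical implementation would tile or fuse this loop rather than materialize it.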

Phase 11 Results (Reference)

| Operation | v2 (ms) | v3 (ms) | Speedup (v2/v3) | Target |
|---|---|---|---|---|
| pack (list, 1000K) | 10.337 | 37.799 | 0.3x | 6x |
| pack (numpy, 1000K) | 10.337 | 0.069 | 149.3x | 6x |
| popcount (list, 1000K) | 96.956 | 135.444 | 0.7x | 20x |
| popcount (numpy, 1000K) | 96.956 | 1.563 | 62.0x | 20x |
| dense forward (64->32, L=1024) | 2.953 | 0.683 | 4.3x | 70x |
| dense fast (64->32, L=1024) | 2.953 | 0.171 | 17.3x | 70x |
| dense prepacked (64->32, L=1024) | 2.953 | 0.092 | 31.9x | 70x |
| dense prepacked numpy (64->32, L=1024) | 2.953 | 0.033 | 90.2x | 70x |
| dense numpy (64->32, L=1024) | 2.953 | 0.118 | 25.1x | 70x |
| LIF (per-call, 100K) | 106.451 | 23.925 | 4.4x | 400x |
| LIF (batch, 100K) | 106.451 | 0.897 | 118.7x | 400x |
| LIF multi (100x100K) | 13349.151 | 31.783 | 420.0x | 400x |

Phase 10 Results (Reference)

| Operation | v2 (ms) | v3 (ms) | Speedup (v2/v3) | Target |
|---|---|---|---|---|
| pack (list, 1000K) | 16.918 | 45.964 | 0.4x | 6x |
| pack (numpy, 1000K) | 16.918 | 0.133 | 127.0x | 6x |
| popcount (list, 1000K) | 94.333 | 138.951 | 0.7x | 20x |
| popcount (numpy, 1000K) | 94.333 | 1.303 | 72.4x | 20x |
| dense forward (64->32, L=1024) | 7.077 | 19.442 | 0.4x | 70x |
| dense fast (64->32, L=1024) | 7.077 | 17.781 | 0.4x | 70x |
| dense prepacked (64->32, L=1024) | 7.077 | 5.453 | 1.3x | 70x |
| dense prepacked numpy (64->32, L=1024) | 7.077 | 6.125 | 1.2x | 70x |
| dense numpy (64->32, L=1024) | 7.077 | 6.727 | 1.1x | 70x |
| LIF (per-call, 100K) | 139.417 | 27.015 | 5.2x | 400x |
| LIF (batch, 100K) | 139.417 | 0.992 | 140.5x | 400x |
| LIF multi (100x100K) | 15442.319 | 90.480 | 170.7x | 400x |

Phase 9 Results (Reference)

| Operation | v2 (ms) | v3 (ms) | Speedup (v2/v3) | Target |
|---|---|---|---|---|
| pack (list, 1000K) | 10.807 | 62.841 | 0.2x | 6x |
| pack (numpy, 1000K) | 10.807 | 9.415 | 1.1x | 6x |
| popcount (list, 1000K) | 118.885 | 144.767 | 0.8x | 20x |
| popcount (numpy, 1000K) | 118.885 | 1.866 | 63.7x | 20x |
| dense forward (64->32, L=1024) | 6.971 | 8.034 | 0.9x | 70x |
| dense fast (64->32, L=1024) | 6.971 | 6.125 | 1.1x | 70x |
| dense prepacked (64->32, L=1024) | 6.971 | 3.599 | 1.9x | 70x |
| dense prepacked numpy (64->32, L=1024) | 6.971 | 0.085 | 81.6x | 70x |
| dense numpy (64->32, L=1024) | 6.971 | 4.908 | 1.4x | 70x |
| LIF (per-call, 100K) | 143.202 | 35.008 | 4.1x | 400x |
| LIF (batch, 100K) | 143.202 | 1.404 | 102.0x | 400x |

Phase 7 Results (Reference)

| Operation | v2 (ms) | v3 (ms) | Speedup (v2/v3) | Target |
|---|---|---|---|---|
| pack (list, 1000K) | 15.208 | 54.526 | 0.3x | 6x |
| pack (numpy, 1000K) | 15.208 | 10.315 | 1.5x | 6x |
| popcount (list, 1000K) | 108.495 | 316.783 | 0.3x | 20x |
| popcount (numpy, 1000K) | 108.495 | 1.242 | 87.4x | 20x |
| dense forward (64->32, L=1024) | 4.173 | 20.570 | 0.2x | 70x |
| dense fast (64->32, L=1024) | 4.173 | 4.318 | 1.0x | 70x |
| dense prepacked (64->32, L=1024) | 4.173 | 0.562 | 7.4x | 70x |
| LIF (per-call, 100K) | 240.266 | 61.585 | 3.9x | 400x |
| LIF (batch, 100K) | 240.266 | 1.496 | 160.6x | 400x |