© 1998–2026 Miroslav Šotek. All rights reserved.
Contact: www.anulum.li | protoscience@anulum.li
ORCID: https://orcid.org/0009-0009-3560-0851
License: GNU AFFERO GENERAL PUBLIC LICENSE v3
Commercial Licensing: Available
SC-NeuroCore v3 Benchmark Report
Version: 3.13.3
Date: 2026-03-15
Previous: 3.6.0 (2026-02-10)
SIMD Tier: avx512-vpopcntdq
Baseline Definition and Routing Note
- "v2" in this report means the SC-NeuroCore v2 Python reference path, measured by the same benchmark harness.
- External framework baselines (Norse, Sinabs, Lava CPU) are not yet included in this file and must be added before any claim of ecosystem-level superiority.
- For low-latency use (single sample or micro-batch), prefer `DenseLayer.forward_fast`.
- For throughput use (batch >= 10), prefer `DenseLayer.forward_batch_numpy`.
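The routing rule above can be expressed as a small dispatcher. This is a minimal sketch under stated assumptions: only the method names `forward_fast` and `forward_batch_numpy` come from this report. Their signatures are not documented here, so `StubDenseLayer` and `route_forward` are hypothetical stand-ins, and the threshold of 10 is taken from the "batch >= 10" note.

```python
from typing import Sequence

# Threshold from the routing note: batches of 10 or more go to the throughput path.
BATCH_THRESHOLD = 10


class StubDenseLayer:
    """Hypothetical stand-in exposing the two method names used in this report.

    The real DenseLayer signatures are not documented here; these stubs only
    record which path was taken.
    """

    def forward_fast(self, sample):
        return ("forward_fast", sample)

    def forward_batch_numpy(self, batch):
        return ("forward_batch_numpy", batch)


def route_forward(layer, batch: Sequence):
    """Send small batches through the per-sample path, large ones through the batch path."""
    if len(batch) >= BATCH_THRESHOLD:
        # Throughput path: one call for the whole batch.
        return layer.forward_batch_numpy(batch)
    # Low-latency path: one call per sample.
    return [layer.forward_fast(sample) for sample in batch]
```

With a real layer, the same dispatcher keeps micro-batches on the lowest-latency entry point while amortizing Python call overhead for larger batches.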
Phase 12 Results (Fused Dense + Fast PRNG + Batch Forward)
Measured via examples/03_benchmark_report.py on this machine.
| Operation | v2 (ms) | v3 (ms) | Speedup | Target |
|---|---:|---:|---:|---:|
| pack (list, 1000K) | 11.538 | 35.448 | 0.3x | 6x |
| pack (numpy, 1000K) | 11.538 | 0.129 | 89.3x | 6x |
| popcount (list, 1000K) | 109.023 | 151.322 | 0.7x | 20x |
| popcount (numpy, 1000K) | 109.023 | 1.989 | 54.8x | 20x |
| dense forward (64->32, L=1024) | 3.728 | 1.598 | 2.3x | 70x |
| dense fast (64->32, L=1024) | 3.728 | 0.299 | 12.4x | 70x |
| dense prepacked (64->32, L=1024) | 3.728 | 0.282 | 13.2x | 70x |
| dense prepacked numpy (64->32, L=1024) | 3.728 | 0.110 | 33.9x | 70x |
| dense numpy (64->32, L=1024) | 3.728 | 0.647 | 5.8x | 70x |
| dense fused (64->32, L=1024) | 4.664 | 0.380 | 12.3x | 70x |
| dense batch (100x64->32, L=1024) | 289.305 | 6.893 | 42.0x | 70x |
| LIF (per-call, 100K) | 126.313 | 25.525 | 4.9x | 400x |
| LIF (batch, 100K) | 126.313 | 0.905 | 139.6x | 400x |
| LIF multi (100x100K) | 12911.296 | 25.196 | 512.4x | 400x |
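The Speedup column is the v2 time divided by the v3 time. The sketch below recomputes it from the Phase 12 values and checks each row against its target; the dictionary and helper names are illustrative, not part of the benchmark harness. Recomputed values can differ from the table in the last digit (for example 89.4x vs 89.3x for pack (numpy)) because the table's times are themselves rounded.

```python
# Speedup = v2 / v3, recomputed from the Phase 12 table (times in ms).
# Each entry: operation -> (v2_ms, v3_ms, target_speedup).
PHASE12 = {
    "pack (list, 1000K)": (11.538, 35.448, 6),
    "pack (numpy, 1000K)": (11.538, 0.129, 6),
    "popcount (list, 1000K)": (109.023, 151.322, 20),
    "popcount (numpy, 1000K)": (109.023, 1.989, 20),
    "dense forward (64->32, L=1024)": (3.728, 1.598, 70),
    "dense fast (64->32, L=1024)": (3.728, 0.299, 70),
    "dense prepacked (64->32, L=1024)": (3.728, 0.282, 70),
    "dense prepacked numpy (64->32, L=1024)": (3.728, 0.110, 70),
    "dense numpy (64->32, L=1024)": (3.728, 0.647, 70),
    "dense fused (64->32, L=1024)": (4.664, 0.380, 70),
    "dense batch (100x64->32, L=1024)": (289.305, 6.893, 70),
    "LIF (per-call, 100K)": (126.313, 25.525, 400),
    "LIF (batch, 100K)": (126.313, 0.905, 400),
    "LIF multi (100x100K)": (12911.296, 25.196, 400),
}


def speedup(v2_ms: float, v3_ms: float) -> float:
    """Speedup of v3 over v2; values above 1.0 mean v3 is faster."""
    return v2_ms / v3_ms


def ops_meeting_target(rows: dict) -> list:
    """Return the operations whose recomputed speedup reaches the target."""
    return [op for op, (v2, v3, target) in rows.items() if speedup(v2, v3) >= target]


for op, (v2, v3, target) in PHASE12.items():
    s = speedup(v2, v3)
    print(f"{op:40s} {s:7.1f}x  (target {target}x)")
```

In Phase 12 only pack (numpy), popcount (numpy), and LIF multi reach their targets.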
Criterion Diagnosis (Phase 12)
Measured with targeted Criterion runs:

```sh
cargo bench --bench full_bench dense_forward_fused
cargo bench --bench full_bench encode_and_popcount
cargo bench --bench full_bench dense_forward_batch
cargo bench --bench full_bench prng_xoshiro
```
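The `prng_xoshiro` target fills 1024 values from a xoshiro-family generator. The report does not say which variant or word size SC-NeuroCore uses, so the Python sketch below assumes xoshiro256++ seeded through SplitMix64 (the seeding scheme recommended by the generator's authors). It only illustrates the shape of a fill loop; it is not expected to reproduce SC-NeuroCore's streams, and the class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def rotl(x: int, k: int) -> int:
    """64-bit rotate left."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def splitmix64(seed: int):
    """SplitMix64, used here only to expand one seed into four state words."""
    x = seed & MASK64
    while True:
        x = (x + 0x9E3779B97F4A7C15) & MASK64
        z = x
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        yield z ^ (z >> 31)


class Xoshiro256pp:
    """xoshiro256++ with 256 bits of state (assumed variant, see lead-in)."""

    def __init__(self, seed: int):
        sm = splitmix64(seed)
        self.s = [next(sm) for _ in range(4)]

    def next_u64(self) -> int:
        s = self.s
        result = (rotl((s[0] + s[3]) & MASK64, 23) + s[0]) & MASK64
        t = (s[1] << 17) & MASK64
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = rotl(s[3], 45)
        return result

    def fill(self, n: int) -> list:
        """Return n consecutive 64-bit outputs, as in a fill_1024-style benchmark."""
        return [self.next_u64() for _ in range(n)]
```

A native implementation runs the same shift, XOR, and rotate sequence on machine words, which is why it can fill 1024 values in the low-microsecond range reported below.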
| Benchmark | Time (95% CI) |
|---|---|
| dense_forward_fused_64x32 | 1.1268 ms - 1.9825 ms |
| bernoulli_encode_and_popcount_1024 | 342.59 ns - 408.10 ns |
| dense_forward_batch_64x32_x100 | 21.842 ms - 28.753 ms |
| prng_xoshiro_fill_1024 | 1.5879 µs - 1.7596 µs |
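Criterion.rs reports each time as a confidence interval obtained by bootstrap resampling of its measurements. The sketch below shows a simplified percentile bootstrap of the mean; Criterion's own estimator (for example, a regression slope over iteration counts in linear sampling mode) and its resample count differ, so this only illustrates what the interval expresses. The `bootstrap_ci` helper and the sample times in the usage note are hypothetical.

```python
import random
import statistics


def bootstrap_ci(samples, level=0.95, resamples=10_000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(resamples)
    )
    # Drop (1 - level) / 2 of the resampled means from each tail.
    k = int(round((1.0 - level) / 2.0 * resamples))
    return means[k], means[resamples - 1 - k]
```

For example, `bootstrap_ci([1.2, 1.5, 1.3, 1.9, 1.1, 1.4, 1.6, 1.3, 1.7, 1.2])` returns a (low, high) pair in milliseconds that brackets the sample mean; wider spread in the samples widens the interval, as in the dense rows above.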
Interpretation:
- Fused encode+AND+popcount path is functionally correct and benchmarked end-to-end.
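- The batched dense API reduces Python-level overhead substantially compared with per-sample calls: dense batch takes 6.893 ms for 100 samples (about 0.069 ms per sample), versus 0.110-1.598 ms for the single-sample v3 dense paths.
- Multi-neuron LIF remains above the Blueprint 400x target on this host (512.4x in Phase 12, up from 420.0x in Phase 11).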
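In stochastic computing, a value p in [0, 1] is encoded as a length-L bitstream whose bits are 1 with probability p; ANDing two independent streams gives a stream with probability p_a * p_b, and popcount / L estimates that product. The NumPy sketch below applies this to a 64->32 dense layer with L = 1024, the shape benchmarked above. It is not SC-NeuroCore's fused kernel: the function name is hypothetical, it uses NumPy's default generator (PCG64) rather than xoshiro, and it counts bits by unpacking because `np.bitwise_count` requires NumPy 2.0 or later.

```python
import numpy as np


def sc_dense_forward(x, w, length=1024, seed=0):
    """Stochastic-computing dense layer: y[j] ~= sum_i x[i] * w[i, j].

    x: shape (n_in,), values in [0, 1]
    w: shape (n_in, n_out), values in [0, 1]
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = w.shape
    # Bernoulli-encode each input as one `length`-bit stream.
    x_bits = rng.random((n_in, 1, length)) < x[:, None, None]
    # Encode every weight independently so the AND estimates a product.
    w_bits = rng.random((n_in, n_out, length)) < w[:, :, None]
    # Pack 8 bits per byte so the AND runs on packed data.
    x_packed = np.packbits(x_bits, axis=-1)
    w_packed = np.packbits(w_bits, axis=-1)
    anded = np.bitwise_and(x_packed, w_packed)  # broadcasts x over n_out
    # Popcount per (input, output) pair by unpacking and summing the bits.
    counts = np.unpackbits(anded, axis=-1).sum(axis=-1)
    # count / length estimates x[i] * w[i, j]; sum over inputs.
    return (counts / length).sum(axis=0)
```

For inputs and weights in [0, 1], the result approaches `x @ w`, with a per-output error that shrinks roughly as 1 / sqrt(L).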
Phase 11 Results (Reference)
| Operation | v2 (ms) | v3 (ms) | Speedup | Target |
|---|---:|---:|---:|---:|
| pack (list, 1000K) | 10.337 | 37.799 | 0.3x | 6x |
| pack (numpy, 1000K) | 10.337 | 0.069 | 149.3x | 6x |
| popcount (list, 1000K) | 96.956 | 135.444 | 0.7x | 20x |
| popcount (numpy, 1000K) | 96.956 | 1.563 | 62.0x | 20x |
| dense forward (64->32, L=1024) | 2.953 | 0.683 | 4.3x | 70x |
| dense fast (64->32, L=1024) | 2.953 | 0.171 | 17.3x | 70x |
| dense prepacked (64->32, L=1024) | 2.953 | 0.092 | 31.9x | 70x |
| dense prepacked numpy (64->32, L=1024) | 2.953 | 0.033 | 90.2x | 70x |
| dense numpy (64->32, L=1024) | 2.953 | 0.118 | 25.1x | 70x |
| LIF (per-call, 100K) | 106.451 | 23.925 | 4.4x | 400x |
| LIF (batch, 100K) | 106.451 | 0.897 | 118.7x | 400x |
| LIF multi (100x100K) | 13349.151 | 31.783 | 420.0x | 400x |
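Reading the LIF row names, "per-call" pays Python overhead on every update, while "batch" and "multi" hand a whole run to one call. The report does not give SC-NeuroCore's LIF equations, so the sketch below assumes a textbook discrete-time leaky integrate-and-fire update with reset to zero; `leak`, `threshold`, and all function names are illustrative. It shows why the multi-neuron form is cheaper: vectorizing the update across neurons means Python overhead is paid once per time step instead of once per neuron per step.

```python
import numpy as np


def lif_step(v, i_in, leak=0.9, threshold=1.0):
    """One discrete LIF update for a single neuron (scalar state)."""
    v = leak * v + i_in
    if v >= threshold:
        return 0.0, True  # reset to zero after a spike
    return v, False


def lif_run_scalar(currents, leak=0.9, threshold=1.0):
    """Per-call style: one Python-level update per neuron per time step."""
    n_steps, n_neurons = currents.shape
    spikes = np.zeros((n_steps, n_neurons), dtype=bool)
    for j in range(n_neurons):
        v = 0.0
        for t in range(n_steps):
            v, spikes[t, j] = lif_step(v, currents[t, j], leak, threshold)
    return spikes


def lif_run_vectorized(currents, leak=0.9, threshold=1.0):
    """Multi-neuron style: update all neurons together at each time step."""
    n_steps, n_neurons = currents.shape
    v = np.zeros(n_neurons)
    spikes = np.zeros((n_steps, n_neurons), dtype=bool)
    for t in range(n_steps):
        v = leak * v + currents[t]
        fired = v >= threshold
        spikes[t] = fired
        v[fired] = 0.0  # reset only the neurons that spiked
    return spikes
```

Both functions produce identical spike trains; only the loop structure changes. A native implementation goes further by also moving the time loop out of Python.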
Phase 10 Results (Reference)
| Operation | v2 (ms) | v3 (ms) | Speedup | Target |
|---|---:|---:|---:|---:|
| pack (list, 1000K) | 16.918 | 45.964 | 0.4x | 6x |
| pack (numpy, 1000K) | 16.918 | 0.133 | 127.0x | 6x |
| popcount (list, 1000K) | 94.333 | 138.951 | 0.7x | 20x |
| popcount (numpy, 1000K) | 94.333 | 1.303 | 72.4x | 20x |
| dense forward (64->32, L=1024) | 7.077 | 19.442 | 0.4x | 70x |
| dense fast (64->32, L=1024) | 7.077 | 17.781 | 0.4x | 70x |
| dense prepacked (64->32, L=1024) | 7.077 | 5.453 | 1.3x | 70x |
| dense prepacked numpy (64->32, L=1024) | 7.077 | 6.125 | 1.2x | 70x |
| dense numpy (64->32, L=1024) | 7.077 | 6.727 | 1.1x | 70x |
| LIF (per-call, 100K) | 139.417 | 27.015 | 5.2x | 400x |
| LIF (batch, 100K) | 139.417 | 0.992 | 140.5x | 400x |
| LIF multi (100x100K) | 15442.319 | 90.480 | 170.7x | 400x |
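The pack and popcount rows compare a pure-Python list path with a NumPy array path over the same 1000K-element input. The sketch below shows both styles side by side. It assumes 0/1 values packed least-significant-bit first into 64-bit words; 64-bit words suit the avx512-vpopcntdq tier noted at the top, but the report does not state SC-NeuroCore's actual word size or bit order, and all function names are hypothetical.

```python
import numpy as np


def pack_list(bits):
    """Pure-Python packing of 0/1 values into 64-bit words, LSB first."""
    words = []
    for start in range(0, len(bits), 64):
        word = 0
        for offset, b in enumerate(bits[start:start + 64]):
            if b:
                word |= 1 << offset
        words.append(word)
    return words


def pack_numpy(bits):
    """Vectorized packing with the same LSB-first 64-bit layout."""
    arr = np.asarray(bits, dtype=np.uint8)
    pad = (-len(arr)) % 64  # zero-pad to a whole number of words
    arr = np.concatenate([arr, np.zeros(pad, dtype=np.uint8)])
    # bitorder="little" stores element 0 in bit 0 of each byte.
    packed = np.packbits(arr, bitorder="little")
    # Reinterpret each group of 8 bytes as one little-endian uint64.
    return packed.view("<u8")


def popcount_list(words):
    """Count set bits one Python int at a time."""
    return sum(bin(w).count("1") for w in words)


def popcount_numpy(words):
    """Count set bits by unpacking bytes (np.bitwise_count needs NumPy >= 2.0)."""
    return int(np.unpackbits(words.view(np.uint8)).sum())
```

The list versions touch every bit in interpreted code, which matches the sub-1x speedups of the list rows; the NumPy versions do the same work in compiled loops.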
Phase 9 Results (Reference)
| Operation | v2 (ms) | v3 (ms) | Speedup | Target |
|---|---:|---:|---:|---:|
| pack (list, 1000K) | 10.807 | 62.841 | 0.2x | 6x |
| pack (numpy, 1000K) | 10.807 | 9.415 | 1.1x | 6x |
| popcount (list, 1000K) | 118.885 | 144.767 | 0.8x | 20x |
| popcount (numpy, 1000K) | 118.885 | 1.866 | 63.7x | 20x |
| dense forward (64->32, L=1024) | 6.971 | 8.034 | 0.9x | 70x |
| dense fast (64->32, L=1024) | 6.971 | 6.125 | 1.1x | 70x |
| dense prepacked (64->32, L=1024) | 6.971 | 3.599 | 1.9x | 70x |
| dense prepacked numpy (64->32, L=1024) | 6.971 | 0.085 | 81.6x | 70x |
| dense numpy (64->32, L=1024) | 6.971 | 4.908 | 1.4x | 70x |
| LIF (per-call, 100K) | 143.202 | 35.008 | 4.1x | 400x |
| LIF (batch, 100K) | 143.202 | 1.404 | 102.0x | 400x |
Phase 7 Results (Reference)
| Operation | v2 (ms) | v3 (ms) | Speedup | Target |
|---|---:|---:|---:|---:|
| pack (list, 1000K) | 15.208 | 54.526 | 0.3x | 6x |
| pack (numpy, 1000K) | 15.208 | 10.315 | 1.5x | 6x |
| popcount (list, 1000K) | 108.495 | 316.783 | 0.3x | 20x |
| popcount (numpy, 1000K) | 108.495 | 1.242 | 87.4x | 20x |
| dense forward (64->32, L=1024) | 4.173 | 20.570 | 0.2x | 70x |
| dense fast (64->32, L=1024) | 4.173 | 4.318 | 1.0x | 70x |
| dense prepacked (64->32, L=1024) | 4.173 | 0.562 | 7.4x | 70x |
| LIF (per-call, 100K) | 240.266 | 61.585 | 3.9x | 400x |
| LIF (batch, 100K) | 240.266 | 1.496 | 160.6x | 400x |
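Speedup ratios across phases also move with the v2 baseline, which drifts between runs (LIF v2 ranges from 106.451 ms in Phase 11 to 240.266 ms in Phase 7). Comparing v3 wall times directly isolates changes in the v3 path. The sketch below does this for the LIF (batch, 100K) row using the values from the tables in this report; the helper name is illustrative, and since each phase was measured in a separate run, host noise may still affect the comparison.

```python
# v3 wall times (ms) for "LIF (batch, 100K)", copied from the phase tables.
LIF_BATCH_V3_MS = {
    "Phase 7": 1.496,
    "Phase 9": 1.404,
    "Phase 10": 0.992,
    "Phase 11": 0.897,
    "Phase 12": 0.905,
}


def relative_to(reference: str, times: dict) -> dict:
    """Ratio reference_time / phase_time; above 1.0 means faster than the reference."""
    ref = times[reference]
    return {phase: ref / t for phase, t in times.items()}


for phase, ratio in relative_to("Phase 7", LIF_BATCH_V3_MS).items():
    print(f"{phase:9s} {ratio:5.2f}x vs Phase 7")
```

Measured this way, Phase 11 and Phase 12 are within about 1% of each other, even though their v2-relative speedups (118.7x and 139.6x) differ noticeably.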