© 1998–2026 Miroslav Šotek. All rights reserved. Contact: www.anulum.li | protoscience@anulum.li ORCID: https://orcid.org/0009-0009-3560-0851 License: GNU AFFERO GENERAL PUBLIC LICENSE v3 Commercial Licensing: Available
SC-NeuroCore Discrepancy Remediation Plan
Date: 2026-02-11
Scope: Entire SC-NeuroCore benchmarking, documentation, and runtime path policy
Goal: Remove ambiguity, recover small-batch performance, and publish hardware-relevant metrics.
1. Problems and Risks (No Sugar-Coating)
- Single-sample regression: `dense_fused` at ~0.380 ms is slower than `forward_fast` at ~0.171 ms for one sample. Impact: latency-critical usage (interactive or control loops) regresses.
- Benchmark narrative is incomplete: claims like `512x` are not fully anchored to external baselines. Impact: numbers look strong but remain challengeable.
- Hardware target evidence is missing: most published numbers are CPU timings. Impact: no clock-cycle evidence for FPGA/RTL deployment decisions.
- Onboarding communication gap: users are not clearly told when to use `forward_fast` vs `forward_batch_numpy`. Impact: wrong API path chosen, user-visible slowdown.
2. Success Criteria
- Publish a benchmark matrix that includes internal and external references with reproducible commands.
- Recover or exceed Phase 11 single-sample latency for the default "fast path" API.
- Publish Verilator/Icarus cycle-level metrics for dense and LIF kernels.
- Update user docs so path selection is explicit and testable.
- Gate release on benchmark and documentation checks in CI.
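One possible CI gate, sketched here under assumptions, fails the build when a published speedup claim lacks a named baseline or a reproducing command. The JSON layout (a top-level `claims` list with `label`, `baseline`, and `command` fields) and the function names are hypothetical; the plan does not define a schema.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema: each claim must carry these non-empty string fields.
REQUIRED_CLAIM_FIELDS = ("baseline", "command")


def unanchored_claims(report: Dict[str, Any]) -> List[str]:
    """Return one message per missing or empty required field."""
    problems = []
    for index, claim in enumerate(report.get("claims", [])):
        label = claim.get("label", "?")
        for field in REQUIRED_CLAIM_FIELDS:
            value = claim.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"claim[{index}] ({label}): missing {field}")
    return problems


def check_file(path: str) -> int:
    """Return a process exit code: 0 if every claim is anchored, else 1."""
    with open(path, encoding="utf-8") as handle:
        report = json.load(handle)
    problems = unanchored_claims(report)
    for message in problems:
        print(message)
    return 1 if problems else 0
```

A CI job could call `sys.exit(check_file(path))` on each exported result file and block the release on a non-zero exit code.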
3. Workstreams
Workstream A: Runtime Path Policy (Latency Recovery)
- Add an explicit runtime selector: `forward_auto(input_or_batch, policy="latency|throughput|deterministic")`.
- Default policy: `latency` for batch sizes `<= 4`, `throughput` for batch sizes `>= 10`, with a calibration zone (batch sizes 5 to 9) in between.
- Add a calibration utility, `python scripts/calibrate_dense_threshold.py`, to detect the hardware-specific threshold.
- Add tests: `test_dense_auto_prefers_fast_for_single_sample` and `test_dense_auto_prefers_batch_for_large_batches`.
Exit condition: single-sample route never dispatches to slower fused/batch path by default.
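The default policy above can be sketched as a small dispatcher. This is a minimal illustration, not the planned implementation: `default_policy`, `forward_auto_sketch`, the `crossover` parameter, and the two callables standing in for the fast and batch paths are all assumptions. The `deterministic` policy is left out because the plan does not define its behavior.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

LATENCY_MAX_BATCH = 4      # "latency" for batch sizes <= 4
THROUGHPUT_MIN_BATCH = 10  # "throughput" for batch sizes >= 10


def default_policy(batch_size: int, crossover: int = 7) -> str:
    """Pick a policy from the batch size; `crossover` is a hypothetical calibrated value."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if batch_size <= LATENCY_MAX_BATCH:
        return "latency"
    if batch_size >= THROUGHPUT_MIN_BATCH:
        return "throughput"
    # Calibration zone (5 to 9): defer to the measured crossover.
    return "latency" if batch_size < crossover else "throughput"


def forward_auto_sketch(
    batch: Sequence[T],
    fast_path: Callable[[T], R],
    batch_path: Callable[[Sequence[T]], List[R]],
    policy: Optional[str] = None,
    crossover: int = 7,
) -> List[R]:
    """Route to the per-sample path or the batch path (stand-in callables)."""
    chosen = policy or default_policy(len(batch), crossover)
    if chosen == "latency":
        return [fast_path(sample) for sample in batch]
    if chosen == "throughput":
        return batch_path(batch)
    raise ValueError(f"policy not covered by this sketch: {chosen!r}")
```

With these thresholds a single sample always resolves to `latency`, which is the behavior the exit condition asks for; the crossover value would come from the calibration utility rather than the placeholder default used here.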
Workstream B: Benchmark Baseline Integrity
- Define comparison sets:
  - SC-NeuroCore v2 pure Python path.
  - SC-NeuroCore v2 NumPy vectorized path.
  - SC-NeuroCore v3 forward/fast/fused/batch.
  - Optional external libraries (Norse/Sinabs/Lava CPU) with pinned versions.
- Use one canonical harness: `python examples/03_benchmark_report.py --matrix --json`.
- Record machine fingerprint: CPU model, SIMD tier, OS, Python version, Rust version, thread count.
- Export artifacts: `benchmarks/results/*.json` and a markdown report generated from JSON only.
Exit condition: every speedup claim states exact baseline and command.
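The machine fingerprint listed above could be collected with the standard library alone, as in the following sketch. The function names and dictionary keys are hypothetical. Two fields are approximations: the standard library has no portable SIMD query, so `simd_tier` is passed in by the caller, and `os.cpu_count()` reports logical CPUs, which may differ from the thread count a benchmark actually uses.

```python
import os
import platform
import shutil
import subprocess
from typing import Any, Dict


def rust_version() -> str:
    """Return the `rustc --version` line, or "unavailable" if rustc cannot be run."""
    rustc = shutil.which("rustc")
    if rustc is None:
        return "unavailable"
    try:
        result = subprocess.run(
            [rustc, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return "unavailable"
    return result.stdout.strip() or "unavailable"


def machine_fingerprint(simd_tier: str = "unknown") -> Dict[str, Any]:
    """Collect the fields the plan asks for; keys are illustrative."""
    return {
        # platform.processor() can be empty on some systems.
        "cpu_model": platform.processor() or platform.machine(),
        "simd_tier": simd_tier,  # assumption: supplied by the caller
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "rust_version": rust_version(),
        "thread_count": os.cpu_count() or 1,  # logical CPUs, an approximation
    }
```

Storing this dictionary next to each result in the exported JSON would let the markdown report state which machine produced every number.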
Workstream C: Hardware-Centric Validation
- Run RTL/co-sim with cycle counters:
  - dense layer datapath
  - LIF update pipeline
- Produce metrics:
  - cycles/sample
  - max throughput at target clock
  - end-to-end latency in µs at 100/200/300 MHz
- Add power estimate placeholder: toggle-rate based estimate + TODO hooks for vendor timing/power reports.
- Publish: `docs/HARDWARE_BENCHMARK_REPORT.md`.
Exit condition: at least one cycle-level report exists for each core kernel.
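The metrics above follow from a cycle count by simple arithmetic, sketched below. The function names are hypothetical, and the throughput formula assumes one sample in flight at a time; for a pipelined kernel the initiation interval would replace cycles/sample. Any cycle count fed in here is a placeholder until a simulator run supplies measured values.

```python
from typing import Dict, List, Sequence


def latency_us(cycles: float, clock_mhz: float) -> float:
    """Convert cycles to microseconds: at f MHz, one microsecond holds f cycles."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return cycles / clock_mhz


def samples_per_second(cycles_per_sample: float, clock_mhz: float) -> float:
    """Max throughput assuming no overlap between consecutive samples."""
    if cycles_per_sample <= 0:
        raise ValueError("cycles_per_sample must be positive")
    return clock_mhz * 1e6 / cycles_per_sample


def timing_table(
    cycles_per_sample: float,
    clocks_mhz: Sequence[float] = (100, 200, 300),
) -> List[Dict[str, float]]:
    """One row per target clock, matching the 100/200/300 MHz list above."""
    return [
        {
            "clock_mhz": clock,
            "latency_us": latency_us(cycles_per_sample, clock),
            "samples_per_s": samples_per_second(cycles_per_sample, clock),
        }
        for clock in clocks_mhz
    ]
```

As a purely illustrative case, a kernel taking 256 cycles per sample would have a latency of 2.56 µs and a ceiling of 390,625 samples/s at 100 MHz.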
Workstream D: Documentation and Onboarding
- Update `README.md` and `docs/getting-started.md` with explicit path policy:
  - `forward_fast` for small batches
  - `forward_batch_numpy` for larger batches
- Add one copy-paste benchmark quickstart: `python examples/03_benchmark_report.py`.
- Add FAQ entries:
  - "Why is fused slower on one sample?"
  - "What does 512x compare against?"
  - "How do I get hardware-relevant metrics?"
- Add a one-page index: `docs/SC_NEUROCORE_PERFORMANCE_INDEX.md` linking all benchmark/hardware docs.
Exit condition: a new user can choose the right path in under 2 minutes.
4. Agent Parallelization Plan
- Agent 1 (Runtime): auto-dispatch API, thresholds, tests.
- Agent 2 (Benchmark): baseline matrix harness + JSON schema + report generator.
- Agent 3 (Hardware): Verilator/Icarus cycle pipeline + report.
- Agent 4 (Docs): README, getting-started, FAQ, performance index.
All agents merge only against `03_CODE/sc-neurocore` and avoid `03_CODE/SCPN-Fusion-Core`.
5. Proposed Milestones
- M1 (Day 1): path policy + docs warning shipped.
- M2 (Day 2): full baseline matrix and reproducible report artifacts.
- M3 (Day 3): cycle-level hardware report published.
- M4 (Day 4): CI gating for benchmark/documentation integrity.
6. Immediate Next Actions
- Resolve version drift in metadata/tests (`3.6.0` alignment for the Phase stream).
- Run Phase 8-12 test subset and benchmark smoke.
- Publish first discrepancy status snapshot in `SESSION_LOG_*`.