
© 1998–2026 Miroslav Šotek. All rights reserved. Contact: www.anulum.li | protoscience@anulum.li ORCID: https://orcid.org/0009-0009-3560-0851 License: GNU AFFERO GENERAL PUBLIC LICENSE v3 Commercial Licensing: Available

SC-NeuroCore Discrepancy Remediation Plan

Date: 2026-02-11
Scope: Entire SC-NeuroCore benchmarking, documentation, and runtime path policy
Goal: Remove ambiguity, recover small-batch performance, and publish hardware-relevant metrics.

1. Problems and Risks (No Sugar-Coating)

  1. Single-sample regression. dense_fused at ~0.380 ms is slower than forward_fast at ~0.171 ms for one sample. Impact: latency-critical usage (interactive or control loops) regresses.

  2. Incomplete benchmark narrative. Claims such as "512x" are not fully anchored to external baselines. Impact: the numbers look strong but remain open to challenge.

  3. Missing hardware-target evidence. Most published numbers are CPU timings. Impact: no clock-cycle evidence to support FPGA/RTL deployment decisions.

  4. Onboarding communication gap. Users are not clearly told when to use forward_fast versus forward_batch_numpy. Impact: the wrong API path gets chosen, causing a user-visible slowdown.

2. Success Criteria

  1. Publish a benchmark matrix that includes internal and external references with reproducible commands.
  2. Recover or exceed Phase 11 single-sample latency for the default "fast path" API.
  3. Publish Verilator/Icarus cycle-level metrics for dense and LIF kernels.
  4. Update user docs so path selection is explicit and testable.
  5. Gate release on benchmark and documentation checks in CI.

3. Workstreams

Workstream A: Runtime Path Policy (Latency Recovery)

  1. Add an explicit runtime selector: forward_auto(input_or_batch, policy="latency|throughput|deterministic").
  2. Default policy: latency for batch sizes <= 4, throughput for >= 10, calibration zone in between.
  3. Add calibration utility: python scripts/calibrate_dense_threshold.py to detect hardware-specific threshold.
  4. Add tests: test_dense_auto_prefers_fast_for_single_sample and test_dense_auto_prefers_batch_for_large_batches.

Exit condition: single-sample route never dispatches to slower fused/batch path by default.
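The dispatch policy above can be sketched as a small routing function. This is a minimal illustration, not the actual SC-NeuroCore implementation: the layer methods forward_fast and forward_batch_numpy follow the names used in this plan, the two fixed thresholds come from the default policy above, and CALIBRATED_THRESHOLD stands in for the hardware-specific value that scripts/calibrate_dense_threshold.py would produce.

```python
# Sketch of the proposed forward_auto dispatcher (assumed API names).
import numpy as np

LATENCY_MAX_BATCH = 4       # <= 4 samples: latency path (plan default)
THROUGHPUT_MIN_BATCH = 10   # >= 10 samples: throughput path (plan default)
CALIBRATED_THRESHOLD = 7    # hypothetical output of calibrate_dense_threshold.py

def forward_auto(layer, input_or_batch, policy="latency"):
    """Route a single sample or a batch to the faster kernel."""
    batch = np.atleast_2d(np.asarray(input_or_batch))
    n = batch.shape[0]
    if policy == "deterministic":
        # one fixed path, reproducible regardless of batch size
        return layer.forward_batch_numpy(batch)
    if policy == "latency" and n <= LATENCY_MAX_BATCH:
        return np.stack([layer.forward_fast(x) for x in batch])
    if n >= THROUGHPUT_MIN_BATCH:
        return layer.forward_batch_numpy(batch)
    # calibration zone (5..9 samples): use the hardware-specific threshold
    if n <= CALIBRATED_THRESHOLD:
        return np.stack([layer.forward_fast(x) for x in batch])
    return layer.forward_batch_numpy(batch)
```

With this shape, the two tests in item 4 reduce to asserting which underlying method was called for n = 1 and for n >= THROUGHPUT_MIN_BATCH.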

Workstream B: Benchmark Baseline Integrity

  1. Define comparison sets:
     - SC-NeuroCore v2 pure Python path.
     - SC-NeuroCore v2 NumPy vectorized path.
     - SC-NeuroCore v3 forward/fast/fused/batch.
     - Optional external libraries (Norse/Sinabs/Lava CPU) with pinned versions.
  2. Use one canonical harness: python examples/03_benchmark_report.py --matrix --json.
  3. Record machine fingerprint: CPU model, SIMD tier, OS, Python version, Rust version, thread count.
  4. Export artifacts: benchmarks/results/*.json and a markdown report generated from JSON only.

Exit condition: every speedup claim states exact baseline and command.
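The fingerprint-and-export step can be sketched as below. Field names and the JSON layout are assumptions for illustration; the SIMD tier and Rust version cannot be detected from the standard library, so the real harness would have to supply them.

```python
# Sketch of the machine-fingerprint record and JSON artifact export.
# Key names are hypothetical; the real schema lives in the harness.
import json
import os
import platform

def machine_fingerprint(simd_tier="unknown", rust_version="unknown"):
    """Collect the reproducibility fields listed in Workstream B."""
    return {
        "cpu_model": platform.processor() or platform.machine(),
        "simd_tier": simd_tier,          # supplied by the real harness
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "rust_version": rust_version,    # supplied by the real harness
        "thread_count": os.cpu_count(),
    }

def export_result(result: dict, path: str) -> None:
    """Attach the fingerprint and write one JSON artifact per run."""
    result["machine"] = machine_fingerprint()
    with open(path, "w") as fh:
        json.dump(result, fh, indent=2, sort_keys=True)
```

Generating the markdown report strictly from these JSON artifacts keeps the published numbers traceable to a specific machine and command.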

Workstream C: Hardware-Centric Validation

  1. Run RTL/co-sim with cycle counters for:
     - the dense-layer datapath
     - the LIF update pipeline
  2. Produce metrics:
     - cycles/sample
     - maximum throughput at the target clock
     - end-to-end latency in microseconds at 100/200/300 MHz
  3. Add a power-estimate placeholder: toggle-rate-based estimate plus TODO hooks for vendor timing/power reports.
  4. Publish: docs/HARDWARE_BENCHMARK_REPORT.md.

Exit condition: at least one cycle-level report exists for each core kernel.
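Once a cycles/sample figure comes out of the Verilator/Icarus run, the latency and throughput metrics above are simple arithmetic. The 600-cycle figure in the comments is a made-up example, not a measured value.

```python
# Converting a measured cycles/sample count into the metrics listed above.
def latency_us(cycles_per_sample: float, clock_mhz: float) -> float:
    # one clock cycle takes 1/clock_mhz microseconds
    return cycles_per_sample / clock_mhz

def throughput_msps(cycles_per_sample: float, clock_mhz: float) -> float:
    # samples per microsecond == mega-samples per second
    return clock_mhz / cycles_per_sample

# Hypothetical 600-cycle dense kernel:
#   at 100 MHz -> 6.0 us/sample
#   at 300 MHz -> 2.0 us/sample, 0.5 Msamples/s
```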

Workstream D: Documentation and Onboarding

  1. Update README.md and docs/getting-started.md with an explicit path policy:
     - forward_fast for small batches
     - forward_batch_numpy for larger batches
  2. Add one copy-paste benchmark quickstart: python examples/03_benchmark_report.py.
  3. Add FAQ entries:
     - "Why is fused slower on one sample?"
     - "What does 512x compare against?"
     - "How do I get hardware-relevant metrics?"
  4. Add a one-page index: docs/SC_NEUROCORE_PERFORMANCE_INDEX.md linking all benchmark/hardware docs.

Exit condition: a new user can choose the right path in under 2 minutes.

4. Agent Parallelization Plan

  1. Agent 1 (Runtime): auto-dispatch API, thresholds, tests.
  2. Agent 2 (Benchmark): baseline matrix harness + JSON schema + report generator.
  3. Agent 3 (Hardware): Verilator/Icarus cycle pipeline + report.
  4. Agent 4 (Docs): README, getting-started, FAQ, performance index.

All agents merge only against 03_CODE/sc-neurocore and avoid 03_CODE/SCPN-Fusion-Core.

5. Proposed Milestones

  1. M1 (Day 1): path policy + docs warning shipped.
  2. M2 (Day 2): full baseline matrix and reproducible report artifacts.
  3. M3 (Day 3): cycle-level hardware report published.
  4. M4 (Day 4): CI gating for benchmark/documentation integrity.
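The M4 benchmark gate can be sketched as a script that reads one JSON artifact from Workstream B and fails the build on a single-sample regression. The artifact keys (forward_fast_ms, auto_single_sample_ms) and the 10% tolerance are placeholder assumptions, not a finalized schema.

```python
# Sketch of an M4-style CI gate over a benchmark JSON artifact.
# Key names and the tolerance are hypothetical.
import json

def check_single_sample_gate(artifact_path: str, tolerance: float = 1.10) -> bool:
    """Return True if the auto-dispatched single-sample time is within
    tolerance of the dedicated fast path; False means the gate fails."""
    with open(artifact_path) as fh:
        results = json.load(fh)
    budget = results["forward_fast_ms"] * tolerance
    return results["auto_single_sample_ms"] <= budget
```

Wired into CI, a False return exits nonzero and blocks the merge, which is exactly the Workstream A exit condition expressed as a check.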

6. Immediate Next Actions

  1. Resolve version drift in metadata/tests (3.6.0 alignment for the Phase stream).
  2. Run Phase 8-12 test subset and benchmark smoke.
  3. Publish first discrepancy status snapshot in SESSION_LOG_*.