Tutorial 46: NeuroBench Benchmarking
Generate standardised evaluation reports compatible with the NeuroBench framework for fair comparison with other neuromorphic systems. Reports include accuracy, spike efficiency, parameter count, and energy estimates.
Why NeuroBench
Every framework reports different metrics in different ways. NeuroBench provides a standard: same tasks, same metrics, same reporting format. This makes cross-framework comparison honest and reproducible.
Compute Metrics
import numpy as np
from sc_neurocore.benchmarks import compute_metrics
rng = np.random.default_rng(42)
result = compute_metrics(
    predictions=np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0]),
    targets=np.array([0, 1, 2, 0, 0, 2, 0, 1, 2, 1]),
    spike_counts=rng.integers(10, 50, size=10),
    weights=[rng.standard_normal((128, 64)).astype(np.float32)],
    timesteps=16,
    task="mnist",
)
print(result.summary())
# Task: mnist
# Accuracy: 80.0%
# Total spikes: 312
# Spikes per sample: 31.2
# Spike efficiency: 2.56% accuracy per spike
# Parameters: 8,192
# Compute ops: 131,072 (params × timesteps)
# Energy estimate: 0.013 mJ (at 45nm CMOS)
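The derived figures in the summary above can be cross-checked by hand. The following is a minimal sketch in plain Python, assuming the definitions implied by the printed output: accuracy as the fraction of matching labels, spike efficiency as accuracy in percent divided by mean spikes per sample, and compute ops as parameters × timesteps. The function `derive_metrics` is hypothetical and not part of `sc_neurocore`. The spike counts are illustrative values chosen to sum to 312, not the values produced by the seeded generator. The energy estimate is left out because the source does not state its formula.

```python
from math import prod


def derive_metrics(predictions, targets, spike_counts, weight_shapes, timesteps):
    """Recompute the derived summary quantities under the assumed definitions."""
    n_samples = len(targets)
    correct = sum(p == t for p, t in zip(predictions, targets))
    accuracy = correct / n_samples
    total_spikes = sum(spike_counts)
    spikes_per_sample = total_spikes / n_samples
    # Parameter count is the total number of weight elements.
    n_parameters = sum(prod(shape) for shape in weight_shapes)
    return {
        "accuracy": accuracy,
        "total_spikes": total_spikes,
        "spikes_per_sample": spikes_per_sample,
        # Assumed definition: accuracy in percent per mean spike per sample.
        "spike_efficiency": 100.0 * accuracy / spikes_per_sample,
        "n_parameters": n_parameters,
        "compute_ops": n_parameters * timesteps,
    }


metrics = derive_metrics(
    predictions=[0, 1, 2, 0, 1, 2, 0, 1, 2, 0],
    targets=[0, 1, 2, 0, 0, 2, 0, 1, 2, 1],
    spike_counts=[31, 32, 30, 33, 29, 31, 32, 30, 33, 31],  # illustrative, sums to 312
    weight_shapes=[(128, 64)],
    timesteps=16,
)
for key, value in metrics.items():
    print(f"{key}: {value}")
```

Two of the ten predictions differ from their targets (positions 4 and 9), which gives the 80% accuracy in the summary. A single 128 × 64 weight matrix accounts for the 8,192 parameters.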
Export NeuroBench JSON
The standard NeuroBench reporting format:
json_report = result.to_neurobench_json()
print(json_report)
# {
# "task": "mnist",
# "accuracy": 0.80,
# "n_parameters": 8192,
# "n_timesteps": 16,
# "total_spikes": 312,
# "spikes_per_sample": 31.2,
# "energy_mj": 0.013,
# "framework": "sc-neurocore",
# "version": "3.14.0"
# }
# Save for submission
from pathlib import Path
Path("neurobench_result.json").write_text(json_report)
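Before submitting a saved report, it can be worth checking that the file parses and carries the expected fields. The sketch below uses only the standard library. The field list is copied from the example output above and is an assumption about the report layout, not a schema published by NeuroBench or `sc_neurocore`. The helper `check_report` is hypothetical.

```python
import json

NUMBER = (int, float)

# Field names and types inferred from the example output (assumption).
EXPECTED_FIELDS = {
    "task": str,
    "accuracy": NUMBER,
    "n_parameters": int,
    "n_timesteps": int,
    "total_spikes": int,
    "spikes_per_sample": NUMBER,
    "energy_mj": NUMBER,
    "framework": str,
    "version": str,
}


def check_report(json_text):
    """Return a list of problems found in a report string (empty if none)."""
    report = json.loads(json_text)
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in report:
            problems.append(f"missing field: {field}")
            continue
        value = report[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    accuracy = report.get("accuracy")
    if isinstance(accuracy, NUMBER) and not 0.0 <= accuracy <= 1.0:
        problems.append("accuracy should be a fraction between 0 and 1")
    return problems


good = json.dumps({
    "task": "mnist", "accuracy": 0.80, "n_parameters": 8192,
    "n_timesteps": 16, "total_spikes": 312, "spikes_per_sample": 31.2,
    "energy_mj": 0.013, "framework": "sc-neurocore", "version": "3.14.0",
})
bad = json.dumps({"task": "mnist", "accuracy": 80.0})

print(check_report(good))
print(len(check_report(bad)), "problems in the incomplete report")
```

To check a file written with `Path.write_text`, pass `Path("neurobench_result.json").read_text()` to the same function.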
Available Tasks
NeuroBench defines standardised benchmarks:
from sc_neurocore.benchmarks import TASKS
for name, task in TASKS.items():
    print(f"{name:20s} | {task.n_classes:>3d} classes | "
          f"baseline {task.baseline_accuracy:.1%} | {task.description}")
| Task | Classes | Baseline | Description |
|---|---|---|---|
| mnist | 10 | 99.5% (ANN) | Handwritten digits |
| shd | 20 | 85% (SNN) | Spiking Heidelberg Digits (speech) |
| dvs_gesture | 11 | 95% (SNN) | DVS128 Gesture recognition |
| heartbeat | 2 | 90% (ANN) | ECG anomaly detection |
| keyword | 12 | 96% (ANN) | Speech keyword spotting |
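To try the listing loop without the package installed, a stand-in registry can be built from the table above. This is only a sketch: `TaskSpec` and `DEMO_TASKS` are hypothetical names. The attribute names (`n_classes`, `baseline_accuracy`, `description`) are the ones used in the loop. Storing the baseline as a fraction is an assumption based on the `.1%` format specifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical stand-in for one entry of TASKS."""
    n_classes: int
    baseline_accuracy: float  # fraction in [0, 1], as implied by the .1% format
    description: str


# Values copied from the table above.
DEMO_TASKS = {
    "mnist": TaskSpec(10, 0.995, "Handwritten digits"),
    "shd": TaskSpec(20, 0.85, "Spiking Heidelberg Digits (speech)"),
    "dvs_gesture": TaskSpec(11, 0.95, "DVS128 Gesture recognition"),
    "heartbeat": TaskSpec(2, 0.90, "ECG anomaly detection"),
    "keyword": TaskSpec(12, 0.96, "Speech keyword spotting"),
}


def describe(tasks):
    """Format each task the same way as the listing loop."""
    return [
        f"{name:20s} | {task.n_classes:>3d} classes | "
        f"baseline {task.baseline_accuracy:.1%} | {task.description}"
        for name, task in tasks.items()
    ]


lines = describe(DEMO_TASKS)
print("\n".join(lines))
```

The `.1%` specifier multiplies by 100 and keeps one decimal place, so a stored baseline of 0.85 prints as 85.0%.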
Full Benchmark Pipeline
from sc_neurocore.training import SpikingNet, train_epoch, evaluate, auto_device
from sc_neurocore.benchmarks import compute_metrics
from sc_neurocore.training.utils import SpikeMonitor
# 1. Train model
device = auto_device()
model = SpikingNet(n_input=784, n_hidden=128, n_output=10).to(device)
# ... training loop ...
# 2. Evaluate with spike counting
# test_loader is not defined in this snippet; supply a loader over the test split.
monitor = SpikeMonitor(model)
test_loss, test_acc = evaluate(model, test_loader, n_timesteps=25, device=device)
# Network-wide spike total across all monitored layers
total_spikes = sum(
    monitor.get(name).sum().item() for name in monitor.layer_names
    if monitor.get(name) is not None
)
# 3. Generate NeuroBench report
# all_predictions, all_targets and spike_counts_per_sample are placeholders for
# per-sample arrays collected during evaluation; this snippet does not define them.
result = compute_metrics(
    predictions=all_predictions,
    targets=all_targets,
    spike_counts=spike_counts_per_sample,
    weights=[p.detach().cpu().numpy() for p in model.parameters()],
    timesteps=25,
    task="mnist",
)
print(result.summary())
print(result.to_neurobench_json())
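The pipeline passes per-sample predictions and spike counts to `compute_metrics` but does not show how they are collected. One common approach is rate decoding: sum each output neuron's spikes over time and take the most active neuron as the predicted class. The NumPy sketch below illustrates that idea on synthetic spike rasters. It assumes rasters shaped `(timesteps, batch, neurons)`, which may not match what `SpikingNet` or `SpikeMonitor` return, and `decode_and_count` is a hypothetical helper.

```python
import numpy as np


def decode_and_count(output_spikes, hidden_spikes):
    """Rate-decode predictions and count spikes per sample.

    Both inputs are binary arrays shaped (timesteps, batch, neurons).
    """
    # Predicted class = output neuron with the most spikes over time.
    predictions = output_spikes.sum(axis=0).argmax(axis=1)
    # Per-sample count = spikes summed over time and neurons, across both layers.
    spike_counts = hidden_spikes.sum(axis=(0, 2)) + output_spikes.sum(axis=(0, 2))
    return predictions, spike_counts


# Synthetic data: 25 timesteps, 4 samples, 128 hidden and 10 output neurons.
rng = np.random.default_rng(0)
timesteps, batch = 25, 4
hidden = (rng.random((timesteps, batch, 128)) < 0.05).astype(np.int64)
output = np.zeros((timesteps, batch, 10), dtype=np.int64)
targets = np.array([3, 1, 4, 1])
for sample, cls in enumerate(targets):
    output[::2, sample, cls] = 1  # the target neuron fires on every other step

predictions, spike_counts = decode_and_count(output, hidden)
print("predictions:", predictions.tolist())
print("spike counts per sample:", spike_counts.tolist())
```

With several batches, the per-batch arrays can be joined with `np.concatenate` before they are passed on as `predictions`, `targets`, and `spike_counts`.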
SC-NeuroCore vs NeuroBench Leaderboard
Honest comparison (measured, not estimated):
| Task | SC-NeuroCore | Best Published | Gap |
|---|---|---|---|
| MNIST | 99.49% (ConvSNN; benchmarks/results/mnist_conv_accuracy_reproducibility.json) | 99.72% (SEW-ResNet) | -0.23% |
| SHD | 79.28% (feedforward SpikingNet, Kaggle CPU, 2026-03-28) | 95.1% (SpikFormer) | -15.82% |
| DVS Gesture | no committed current-checkout measurement | 98.2% (TET) | — |
We publish measured numbers only. The SHD number is documented in
validation/neurobench_shd.md and backed by
benchmarks/results/neurobench_shd_results.json. DVS Gesture remains outside
the committed benchmark evidence set and should not be used in public
performance claims until a reproducible artifact lands.
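The Gap column is the measured accuracy minus the best published accuracy, which makes it a difference in percentage points. As a small sketch, the arithmetic can be reproduced as follows. `gap_points` is a hypothetical helper, and a task without a committed measurement yields `None` instead of a number.

```python
def gap_points(measured, best_published):
    """Difference in percentage points, or None when nothing was measured."""
    if measured is None:
        return None
    return round(measured - best_published, 2)


# (task, measured %, best published %) copied from the table above.
rows = [
    ("MNIST", 99.49, 99.72),
    ("SHD", 79.28, 95.1),
    ("DVS Gesture", None, 98.2),
]

gaps = {task: gap_points(measured, best) for task, measured, best in rows}
for task, gap in gaps.items():
    print(f"{task}: {'no measurement' if gap is None else f'{gap:+.2f} pp'}")
```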
References
- Yik et al. (2024). "NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking." Nature Communications.
- Cramer et al. (2020). "The Heidelberg Spiking Data Sets for the Systematic Evaluation of Spiking Neural Networks." IEEE TNNLS.