
Tutorial 46: NeuroBench Benchmarking

Generate standardised evaluation reports compatible with the NeuroBench framework for fair comparison with other neuromorphic systems. Reports include accuracy, spike efficiency, parameter count, and energy estimates.

Why NeuroBench

Every framework reports different metrics in different ways. NeuroBench provides a standard: same tasks, same metrics, same reporting format. This makes cross-framework comparison honest and reproducible.

Compute Metrics

Python
import numpy as np
from sc_neurocore.benchmarks import compute_metrics

rng = np.random.default_rng(42)

result = compute_metrics(
    predictions=np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0]),
    targets=np.array([0, 1, 2, 0, 0, 2, 0, 1, 2, 1]),
    spike_counts=rng.integers(10, 50, size=10),
    weights=[rng.standard_normal((128, 64)).astype(np.float32)],
    timesteps=16,
    task="mnist",
)

print(result.summary())
# Task: mnist
# Accuracy: 80.0%
# Total spikes: 312
# Spikes per sample: 31.2
# Spike efficiency: 2.56% accuracy per spike
# Parameters: 8,192
# Compute ops: 131,072 (params × timesteps)
# Energy estimate: 0.013 mJ (at 45nm CMOS)
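
The derived numbers in the summary follow directly from the inputs. Here is a quick sanity check of the arithmetic for the metrics whose formulas are stated above (accuracy per spike, params × timesteps); the energy figure comes from the library's internal 45nm CMOS energy model and is not re-derived here:

Python
accuracy = 8 / 10                    # 8 of the 10 example predictions match their targets
spikes_per_sample = 312 / 10         # total spikes divided by number of samples
print(f"{accuracy / spikes_per_sample:.2%} accuracy per spike")   # 2.56%

n_params = 128 * 64                  # one 128x64 weight matrix
print(n_params * 16)                 # 131072 compute ops = params x timesteps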

Export NeuroBench JSON

The standard NeuroBench reporting format:

Python
json_report = result.to_neurobench_json()
print(json_report)
# {
#   "task": "mnist",
#   "accuracy": 0.80,
#   "n_parameters": 8192,
#   "n_timesteps": 16,
#   "total_spikes": 312,
#   "spikes_per_sample": 31.2,
#   "energy_mj": 0.013,
#   "framework": "sc-neurocore",
#   "version": "3.14.0"
# }

# Save for submission
from pathlib import Path
Path("neurobench_result.json").write_text(json_report)

Available Tasks

NeuroBench defines standardised benchmarks:

Python
from sc_neurocore.benchmarks import TASKS

for name, task in TASKS.items():
    print(f"{name:20s} | {task.n_classes:>3d} classes | "
          f"baseline {task.baseline_accuracy:.1%} | {task.description}")

Task          Classes  Baseline      Description
mnist              10  99.5% (ANN)   Handwritten digits
shd                20  85% (SNN)     Spiking Heidelberg Digits (speech)
dvs_gesture        11  95% (SNN)     DVS128 Gesture recognition
heartbeat           2  90% (ANN)     ECG anomaly detection
keyword            12  96% (ANN)     Speech keyword spotting
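
Task entries can also be used directly, for example to compare a result against the published baseline. A small illustration reusing only the attributes accessed in the loop above; the measured accuracy here is a hypothetical number for illustration only:

Python
from sc_neurocore.benchmarks import TASKS

task = TASKS["shd"]
measured_accuracy = 0.87   # hypothetical value, for illustration only
gap = measured_accuracy - task.baseline_accuracy
print(f"{task.description}: {measured_accuracy:.1%} "
      f"({gap:+.1%} vs {task.baseline_accuracy:.1%} baseline)")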

Full Benchmark Pipeline

Python
from sc_neurocore.training import SpikingNet, train_epoch, evaluate, auto_device
from sc_neurocore.benchmarks import compute_metrics
from sc_neurocore.training.utils import SpikeMonitor

# 1. Train model
device = auto_device()
model = SpikingNet(n_input=784, n_hidden=128, n_output=10).to(device)
# ... training loop ...

# 2. Evaluate with spike counting (test_loader is your test-set DataLoader)
monitor = SpikeMonitor(model)
test_loss, test_acc = evaluate(model, test_loader, n_timesteps=25, device=device)
total_spikes = sum(
    monitor.get(name).sum().item() for name in monitor.layer_names
    if monitor.get(name) is not None
)

# 3. Generate NeuroBench report
# all_predictions, all_targets, and spike_counts_per_sample are per-sample
# arrays collected during the evaluation loop (collection not shown above).
result = compute_metrics(
    predictions=all_predictions,
    targets=all_targets,
    spike_counts=spike_counts_per_sample,
    weights=[p.detach().cpu().numpy() for p in model.parameters()],
    timesteps=25,
    task="mnist",
)

print(result.summary())
print(result.to_neurobench_json())
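
How the per-sample arrays are collected depends on your evaluation loop; one common pattern is to accumulate batch results in lists and concatenate them afterwards. A minimal sketch, assuming each batch yields predictions, targets, and spike counts as array-like values (per_batch_results is a hypothetical iterable standing in for your loop):

Python
import numpy as np

pred_batches, target_batches, spike_batches = [], [], []
for batch_preds, batch_targets, batch_spikes in per_batch_results:  # hypothetical iterable
    pred_batches.append(np.asarray(batch_preds))
    target_batches.append(np.asarray(batch_targets))
    spike_batches.append(np.asarray(batch_spikes))

all_predictions = np.concatenate(pred_batches)
all_targets = np.concatenate(target_batches)
spike_counts_per_sample = np.concatenate(spike_batches)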

SC-NeuroCore vs NeuroBench Leaderboard

Honest comparison (measured, not estimated):

Task          SC-NeuroCore        Best Published         Gap
MNIST         99.49% (ConvSNN)    99.72% (SEW-ResNet)    -0.23%
SHD           not yet measured    95.1% (SpikFormer)
DVS Gesture   not yet measured    98.2% (TET)

We publish measured numbers only. SHD and DVS Gesture benchmarks are pending — we'll update this table when results are available.

References

  • Yik et al. (2024). "NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking." Nature Communications.
  • Cramer et al. (2020). "The Heidelberg Spiking Data Sets for the Systematic Evaluation of Spiking Neural Networks." IEEE TNNLS.