NeuroBench Benchmarking
NeuroBench-compatible standardized SNN evaluation framework.
Evidence boundary
Benchmark APIs produce structured measurements; they do not make claims by themselves. A performance, accuracy, power, latency, or hardware-efficiency claim is release evidence only when it points to one of these committed artefact classes:
- raw JSON or CSV under benchmarks/results/;
- a benchmark report in docs/benchmarks/ that names the command, environment, and source result file;
- a companion paper artefact that carries the same command and environment provenance.
If a number is not traceable to one of those artefacts, treat it as an unpublished local measurement. Do not copy it into README, roadmap, release, or paper prose as a product claim.
Module-owned pytest throughput checks are load-tolerant smoke guards by default. They assert finite positive progress and a low non-strict floor so functional suites can run while ORCA, synthesis, or other workstation jobs are active. To enforce the historical strict numeric thresholds, run the affected tests on isolated benchmark cores with:
SC_NEUROCORE_STRICT_THROUGHPUT=1 pytest tests/test_model_fitzhugh_nagumo.py tests/test_model_ai_optimized.py
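The smoke-versus-strict split described above can be sketched as follows. This is a minimal illustration, not the code in the test modules: the two floor values and the helper names (`throughput_floor`, `measure_rate`, `check_throughput`) are hypothetical, and treating only the value `1` as "strict" is an assumption based on the command shown.

```python
import math
import os
import time
from typing import Callable, Mapping

# Hypothetical floors for illustration only; the real thresholds live in the test modules.
SMOKE_FLOOR = 1.0      # steps per second, deliberately low
STRICT_FLOOR = 1000.0  # steps per second, placeholder for a historical threshold


def throughput_floor(environ: Mapping[str, str] = os.environ) -> float:
    """Pick the strict floor only when SC_NEUROCORE_STRICT_THROUGHPUT=1 is set."""
    strict = environ.get("SC_NEUROCORE_STRICT_THROUGHPUT") == "1"
    return STRICT_FLOOR if strict else SMOKE_FLOOR


def measure_rate(step: Callable[[], None], n_steps: int = 1000) -> float:
    """Run `step` n_steps times and return the achieved steps per second."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return n_steps / elapsed


def check_throughput(rate: float, environ: Mapping[str, str] = os.environ) -> None:
    """Assert finite positive progress, then the mode-dependent floor."""
    assert math.isfinite(rate) and rate > 0.0
    assert rate >= throughput_floor(environ)
```

Passing the environment as a mapping keeps the mode selection testable without mutating the process environment.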
Strict throughput output is still local benchmark evidence until the raw artefact records CPU affinity, host load, governor, frequency, versions, and the command that produced it.
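One way to capture the provenance fields listed above alongside a raw result is sketched below. The function name `collect_provenance` and the record keys are hypothetical, not part of sc_neurocore; the governor and frequency reads assume the Linux cpufreq sysfs layout and fall back to `None` elsewhere.

```python
import json
import os
import platform
from typing import Optional, Sequence


def _read_text(path: str) -> Optional[str]:
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read().strip()
    except OSError:
        return None


def collect_provenance(command: Sequence[str]) -> dict:
    """Gather host details to store next to a raw benchmark result (hypothetical schema)."""
    record = {
        "command": list(command),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "cpu_affinity": None,
        "load_average": None,
    }
    # CPU affinity is only exposed on some platforms (for example Linux).
    if hasattr(os, "sched_getaffinity"):
        record["cpu_affinity"] = sorted(os.sched_getaffinity(0))
    # Load averages are not available on every platform.
    if hasattr(os, "getloadavg"):
        try:
            record["load_average"] = list(os.getloadavg())
        except OSError:
            pass
    # Assumed Linux cpufreq sysfs paths for CPU 0; None when absent.
    cpufreq = "/sys/devices/system/cpu/cpu0/cpufreq"
    record["cpu_governor"] = _read_text(f"{cpufreq}/scaling_governor")
    record["cpu_freq_khz"] = _read_text(f"{cpufreq}/scaling_cur_freq")
    return record


if __name__ == "__main__":
    print(json.dumps(collect_provenance(["pytest", "tests/"]), indent=2))
```

Package versions are left out of this sketch; they could be added with `importlib.metadata.version` for each dependency that affects the measurement.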
Metrics
sc_neurocore.benchmarks.metrics
NeuroBench-compatible metrics: accuracy, compute complexity, spike counts.
Follows the NeuroBench algorithm track specification:
- Correctness metrics: accuracy, mAP, MSE (task-specific)
- Complexity metrics: synaptic operations, activation sparsity, total parameters, classification latency
Reference: NeuroBench (Nature Communications 2025)
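To make the complexity metrics concrete, here is a simplified sketch of three of them on plain Python lists. It is not the module's implementation: the function names are hypothetical, activation sparsity is taken as the fraction of zero activations, and synaptic operations are approximated as one accumulate per spike per outgoing synapse, which is an assumption about how the count is defined.

```python
from typing import List, Sequence


def activation_sparsity(spikes: Sequence[Sequence[int]]) -> float:
    """Fraction of zero entries in a [timestep][neuron] spike raster."""
    total = sum(len(row) for row in spikes)
    if total == 0:
        return 0.0
    zeros = sum(1 for row in spikes for value in row if value == 0)
    return zeros / total


def synaptic_operations(spikes: Sequence[Sequence[int]], fan_out: Sequence[int]) -> int:
    """Count accumulates, assuming each spike touches every outgoing synapse once."""
    return sum(value * fan_out[i] for row in spikes for i, value in enumerate(row))


def total_parameters(weights: Sequence[Sequence[Sequence[float]]]) -> int:
    """Count the entries of a list of 2-D weight matrices given as nested lists."""
    return sum(len(row) for matrix in weights for row in matrix)


# Two timesteps, four neurons, two spikes in total.
raster: List[List[int]] = [[1, 0, 0, 0], [0, 0, 1, 0]]
print(activation_sparsity(raster))                # 6 zeros out of 8 entries
print(synaptic_operations(raster, [3, 3, 3, 3]))  # 2 spikes x 3 synapses each
```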
BenchmarkResult
dataclass
NeuroBench-compatible benchmark result.
Source code in src/sc_neurocore/benchmarks/metrics.py
to_neurobench_json()
Export as NeuroBench-compatible JSON.
Source code in src/sc_neurocore/benchmarks/metrics.py
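The general shape of a dataclass-to-JSON export is sketched below. `ResultSketch` and its fields are hypothetical stand-ins, since this page does not list the real `BenchmarkResult` fields or the exact NeuroBench JSON schema; only the pattern of serialising a dataclass is illustrated.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResultSketch:
    """Hypothetical stand-in; the real BenchmarkResult fields may differ."""

    task: str
    model: str
    accuracy: float
    total_parameters: int

    def to_json(self) -> str:
        """Serialise all fields to a JSON object with stable key order."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


result = ResultSketch(task="classification", model="sc_neurocore",
                      accuracy=0.75, total_parameters=4)
print(result.to_json())
```

Sorting keys keeps committed result files stable across runs, which makes diffs of raw artefacts easier to review.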
compute_metrics(predictions, targets, spike_counts=None, weights=None, timesteps=1, latency_ms=0.0, task='classification', model='sc_neurocore')
Compute NeuroBench-compatible metrics from model outputs.
Parameters
predictions : ndarray
    Model predictions (class indices for classification).
targets : ndarray
    Ground truth labels.
spike_counts : ndarray, optional
    Per-sample total spike counts.
weights : list of ndarray, optional
    Weight matrices for parameter counting.
timesteps : int
    Number of simulation timesteps.
latency_ms : float
    Inference latency in milliseconds.
task : str
    Task name for the report.
model : str
    Model name for the report.
Returns
BenchmarkResult
Source code in src/sc_neurocore/benchmarks/metrics.py
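The following stand-in mirrors the documented signature to show what the inputs might be reduced to. It is a sketch under stated assumptions, not the library function: it works on plain Python sequences instead of ndarrays, returns a dict instead of a `BenchmarkResult`, and every output key is hypothetical.

```python
from typing import Optional, Sequence


def compute_metrics_sketch(
    predictions: Sequence[int],
    targets: Sequence[int],
    spike_counts: Optional[Sequence[float]] = None,
    weights: Optional[Sequence[Sequence[Sequence[float]]]] = None,
    timesteps: int = 1,
    latency_ms: float = 0.0,
    task: str = "classification",
    model: str = "sc_neurocore",
) -> dict:
    """Reduce model outputs to a few summary numbers (hypothetical output keys)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    n_samples = len(targets)
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    accuracy = correct / n_samples if n_samples else 0.0
    # Mean spikes per sample, then normalised by the number of timesteps.
    mean_spikes = sum(spike_counts) / len(spike_counts) if spike_counts else 0.0
    spikes_per_step = mean_spikes / timesteps if timesteps > 0 else 0.0
    # Parameter count over nested-list weight matrices.
    n_params = sum(len(row) for matrix in (weights or []) for row in matrix)
    return {
        "task": task,
        "model": model,
        "accuracy": accuracy,
        "mean_spikes_per_sample": mean_spikes,
        "mean_spikes_per_timestep": spikes_per_step,
        "total_parameters": n_params,
        "latency_ms": latency_ms,
    }


summary = compute_metrics_sketch(
    predictions=[0, 1, 2, 1],
    targets=[0, 1, 1, 1],
    spike_counts=[10, 20, 30, 40],
    weights=[[[0.1, 0.2], [0.3, 0.4]]],
    timesteps=5,
    latency_ms=1.5,
)
print(summary["accuracy"], summary["total_parameters"])
```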
Tasks¶
sc_neurocore.benchmarks.tasks
Built-in benchmark task definitions aligned with NeuroBench.
Each task defines: dataset, input shape, number of classes/outputs, evaluation metric, and baseline performance.
TASKS = {
    'keyword_spotting': BenchmarkTask(
        name='Keyword Spotting',
        description='12-class spoken keyword classification (Google Speech Commands v2)',
        input_shape=(16000,),
        n_classes=12,
        metric='accuracy',
        neurobench_id='keyword_spotting',
        dataset='speech_commands_v2',
        baseline_accuracy=0.92,
    ),
    'dvs_gesture': BenchmarkTask(
        name='DVS Gesture Recognition',
        description='11-class gesture classification from DVS128 event camera',
        input_shape=(128, 128),
        n_classes=11,
        metric='accuracy',
        neurobench_id='dvs_gesture',
        dataset='dvs_gesture',
        baseline_accuracy=0.95,
    ),
    'heartbeat_anomaly': BenchmarkTask(
        name='Heartbeat Anomaly Detection',
        description='Binary anomaly detection on MIT-BIH ECG dataset',
        input_shape=(187,),
        n_classes=2,
        metric='accuracy',
        neurobench_id='ecg_anomaly',
        dataset='mit_bih',
        baseline_accuracy=0.97,
    ),
    'mnist': BenchmarkTask(
        name='MNIST Classification',
        description='10-class handwritten digit classification',
        input_shape=(784,),
        n_classes=10,
        metric='accuracy',
        neurobench_id='mnist',
        dataset='mnist',
        baseline_accuracy=0.99,
    ),
    'shd': BenchmarkTask(
        name='Spiking Heidelberg Digits',
        description='20-class spoken digit classification (spiking audio)',
        input_shape=(700,),
        n_classes=20,
        metric='accuracy',
        neurobench_id='shd',
        dataset='shd',
        baseline_accuracy=0.85,
    ),
}
module-attribute
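A task registry of this shape can be used to compare a local measurement against the recorded baseline. The sketch below re-declares a stand-in `BenchmarkTask` with the fields visible in the `TASKS` value so that it runs without the package; making it frozen, the two-entry registry `TASKS_SKETCH`, and the helper `gap_to_baseline` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class BenchmarkTask:
    """Stand-in with the fields shown in TASKS; frozen=True is an assumption."""

    name: str
    description: str
    input_shape: Tuple[int, ...]
    n_classes: int
    metric: str
    neurobench_id: str
    dataset: str
    baseline_accuracy: float


# Two entries copied from the TASKS value for illustration.
TASKS_SKETCH: Dict[str, BenchmarkTask] = {
    "mnist": BenchmarkTask(
        name="MNIST Classification",
        description="10-class handwritten digit classification",
        input_shape=(784,), n_classes=10, metric="accuracy",
        neurobench_id="mnist", dataset="mnist", baseline_accuracy=0.99,
    ),
    "shd": BenchmarkTask(
        name="Spiking Heidelberg Digits",
        description="20-class spoken digit classification (spiking audio)",
        input_shape=(700,), n_classes=20, metric="accuracy",
        neurobench_id="shd", dataset="shd", baseline_accuracy=0.85,
    ),
}


def gap_to_baseline(task_key: str, measured_accuracy: float) -> float:
    """Return measured minus baseline accuracy; negative means below baseline."""
    task = TASKS_SKETCH[task_key]  # raises KeyError for an unknown task
    return measured_accuracy - task.baseline_accuracy


print(f"{gap_to_baseline('mnist', 0.975):+.3f}")
```

Note that a gap computed this way is still a local measurement in the sense of the evidence boundary above until it is committed as a raw artefact with its provenance.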
BenchmarkTask
dataclass
Definition of a benchmark task.
Source code in src/sc_neurocore/benchmarks/tasks.py