NeuroBench Benchmarking¶

A standardized, NeuroBench-compatible evaluation framework for spiking neural networks (SNNs).

Metrics¶

`sc_neurocore.benchmarks.metrics` ¶

NeuroBench-compatible metrics: accuracy, compute complexity, spike counts.

Follows the NeuroBench algorithm track specification:

- Correctness metrics: accuracy, mAP, MSE (task-specific)
- Complexity metrics: synaptic operations, activation sparsity, total parameters, classification latency

Reference: NeuroBench (Nature Communications 2025)

`BenchmarkResult` `dataclass` ¶

NeuroBench-compatible benchmark result.

Source code in src/sc_neurocore/benchmarks/metrics.py

import json
from dataclasses import dataclass, field


@dataclass
class BenchmarkResult:
    """NeuroBench-compatible benchmark result."""

    task: str
    model: str
    accuracy: float
    total_parameters: int
    synaptic_operations: int
    activation_sparsity: float
    total_spikes: int
    timesteps: int
    latency_ms: float
    energy_nj: float = 0.0
    extra: dict = field(default_factory=dict)

    def to_neurobench_json(self) -> str:
        """Export as NeuroBench-compatible JSON."""
        result = {
            "task": self.task,
            "model": self.model,
            "metrics": {
                "correctness": {
                    "accuracy": self.accuracy,
                },
                "complexity": {
                    "total_parameters": self.total_parameters,
                    "synaptic_operations": self.synaptic_operations,
                    "activation_sparsity": self.activation_sparsity,
                    "total_spikes": self.total_spikes,
                    "timesteps": self.timesteps,
                },
                "system": {
                    "latency_ms": self.latency_ms,
                    "energy_nj": self.energy_nj,
                },
            },
            "framework": "sc-neurocore",
        }
        result["metrics"].update(self.extra)
        return json.dumps(result, indent=2)

    def summary(self) -> str:
        lines = [
            f"NeuroBench Result: {self.task} / {self.model}",
            f"  Accuracy:          {self.accuracy:.4f}",
            f"  Parameters:        {self.total_parameters:,}",
            f"  Synaptic ops:      {self.synaptic_operations:,}",
            f"  Sparsity:          {self.activation_sparsity:.2%}",
            f"  Total spikes:      {self.total_spikes:,}",
            f"  Timesteps:         {self.timesteps}",
            f"  Latency:           {self.latency_ms:.2f} ms",
        ]
        if self.energy_nj > 0:
            lines.append(f"  Energy:            {self.energy_nj:.2f} nJ")
        return "\n".join(lines)
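The report produced by `summary()` relies on three format specifiers: `:,` for thousands separators, `:.2%` to render a fraction as a percentage, and `:.2f` for two decimal places. A minimal sketch of what those specifiers produce, using hypothetical values rather than a real benchmark run:

```python
# Hypothetical values, chosen only to show the format specifiers used by summary().
total_parameters = 1_234_567
activation_sparsity = 0.96667
latency_ms = 3.14159

lines = [
    f"  Parameters:        {total_parameters:,}",       # thousands separators
    f"  Sparsity:          {activation_sparsity:.2%}",  # fraction shown as a percentage
    f"  Latency:           {latency_ms:.2f} ms",        # two decimal places
]
report = "\n".join(lines)
print(report)
```

Because `activation_sparsity` is stored as a fraction in the range 0 to 1, the `%` specifier does the multiplication by 100 itself.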

`to_neurobench_json()` ¶

Export as NeuroBench-compatible JSON.
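The exported document groups metrics under `correctness`, `complexity`, and `system`, and the `extra` dictionary is merged into `metrics` with `dict.update()`. The sketch below uses a hand-built dictionary with hypothetical values (it does not import `sc_neurocore`) to show what that shallow merge implies for keys supplied through `extra`:

```python
import json

# Hand-built stand-in for the dictionary assembled inside to_neurobench_json();
# all values here are hypothetical.
result = {
    "task": "mnist",
    "model": "demo_snn",
    "metrics": {
        "correctness": {"accuracy": 0.98},
        "complexity": {"total_parameters": 1000, "timesteps": 10},
        "system": {"latency_ms": 1.5, "energy_nj": 0.0},
    },
    "framework": "sc-neurocore",
}

# dict.update() is a shallow merge: a new key is added next to the three groups.
extra = {"hardware": {"platform": "cpu"}}
result["metrics"].update(extra)

# A key that matches an existing group name replaces that whole group.
clobber = {"correctness": {"f1": 0.97}}
result["metrics"].update(clobber)

parsed = json.loads(json.dumps(result, indent=2))
print(sorted(parsed["metrics"]))
print(parsed["metrics"]["correctness"])
```

Under these assumptions, an `extra` entry named `correctness`, `complexity`, or `system` would drop the built-in values of that group, so distinct key names are the safer choice for additional metrics.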


`compute_metrics(predictions, targets, spike_counts=None, weights=None, timesteps=1, latency_ms=0.0, task='classification', model='sc_neurocore')` ¶

Compute NeuroBench-compatible metrics from model outputs.

Parameters¶

predictions : ndarray
    Model predictions (class indices for classification).
targets : ndarray
    Ground truth labels.
spike_counts : ndarray, optional
    Per-sample total spike counts.
weights : list of ndarray, optional
    Weight matrices for parameter counting.
timesteps : int
    Number of simulation timesteps.
latency_ms : float
    Inference latency in milliseconds.
task : str
    Task name for the report.
model : str
    Model name for the report.

Returns¶

BenchmarkResult

Source code in src/sc_neurocore/benchmarks/metrics.py

def compute_metrics(
    predictions: np.ndarray,
    targets: np.ndarray,
    spike_counts: np.ndarray | None = None,
    weights: list[np.ndarray] | None = None,
    timesteps: int = 1,
    latency_ms: float = 0.0,
    task: str = "classification",
    model: str = "sc_neurocore",
) -> BenchmarkResult:
    """Compute NeuroBench-compatible metrics from model outputs.

    Parameters
    ----------
    predictions : ndarray
        Model predictions (class indices for classification).
    targets : ndarray
        Ground truth labels.
    spike_counts : ndarray, optional
        Per-sample total spike counts.
    weights : list of ndarray, optional
        Weight matrices for parameter counting.
    timesteps : int
        Number of simulation timesteps.
    latency_ms : float
        Inference latency in milliseconds.
    task : str
        Task name for the report.
    model : str
        Model name for the report.

    Returns
    -------
    BenchmarkResult
    """
    accuracy = float(np.mean(predictions == targets))

    total_params = sum(w.size for w in weights) if weights else 0

    if spike_counts is not None:
        total_spikes = int(spike_counts.sum())
        n_samples = len(predictions)
        sparsity = 1.0 - (total_spikes / max(total_params * timesteps * n_samples, 1))
    else:
        total_spikes = 0
        sparsity = 0.0

    # Synaptic operations: each spike activates fan-out synapses
    syn_ops = total_spikes * (total_params // max(timesteps, 1)) if weights else 0

    return BenchmarkResult(
        task=task,
        model=model,
        accuracy=accuracy,
        total_parameters=total_params,
        synaptic_operations=syn_ops,
        activation_sparsity=max(0.0, min(1.0, sparsity)),
        total_spikes=total_spikes,
        timesteps=timesteps,
        latency_ms=latency_ms,
    )
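The arithmetic in the listing above can be traced by hand. The sketch below repeats the same formulas on small hypothetical inputs, using plain Python lists in place of NumPy arrays so that it runs without dependencies. The weight matrices are represented only by their shapes, which is an assumption made for illustration.

```python
# Hypothetical inputs; plain Python stands in for the NumPy arrays.
predictions = [0, 1, 2, 1]
targets = [0, 1, 1, 1]
spike_counts = [5, 7, 3, 9]        # per-sample total spike counts
weight_shapes = [(4, 3), (3, 2)]   # shapes of two weight matrices
timesteps = 10

n_samples = len(predictions)

# Fraction of predictions equal to the target (np.mean(predictions == targets)).
accuracy = sum(p == t for p, t in zip(predictions, targets)) / n_samples

# w.size of a 2-D array is rows * cols.
total_params = sum(rows * cols for rows, cols in weight_shapes)
total_spikes = sum(spike_counts)

# Sparsity: one minus spikes over (parameters * timesteps * samples), clamped to [0, 1].
sparsity = 1.0 - total_spikes / max(total_params * timesteps * n_samples, 1)
sparsity = max(0.0, min(1.0, sparsity))

# Synaptic operations: spikes times the floored parameters-per-timestep term.
syn_ops = total_spikes * (total_params // max(timesteps, 1))

print(accuracy, total_params, total_spikes, round(sparsity, 4), syn_ops)
```

With these inputs, 3 of 4 predictions match, the two matrices hold 12 + 6 = 18 parameters, and the 24 spikes are compared against 18 × 10 × 4 = 720 slots. Note that `total_params // timesteps` is a floor division: here it evaluates to 1, and it would evaluate to 0 whenever `timesteps` exceeds the parameter count.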

Tasks¶

`sc_neurocore.benchmarks.tasks` ¶

Built-in benchmark task definitions aligned with NeuroBench.

Each task defines: dataset, input shape, number of classes/outputs, evaluation metric, and baseline performance.

TASKS = {
    'keyword_spotting': BenchmarkTask(
        name='Keyword Spotting',
        description='12-class spoken keyword classification (Google Speech Commands v2)',
        input_shape=(16000,),
        n_classes=12,
        metric='accuracy',
        neurobench_id='keyword_spotting',
        dataset='speech_commands_v2',
        baseline_accuracy=0.92,
    ),
    'dvs_gesture': BenchmarkTask(
        name='DVS Gesture Recognition',
        description='11-class gesture classification from DVS128 event camera',
        input_shape=(128, 128),
        n_classes=11,
        metric='accuracy',
        neurobench_id='dvs_gesture',
        dataset='dvs_gesture',
        baseline_accuracy=0.95,
    ),
    'heartbeat_anomaly': BenchmarkTask(
        name='Heartbeat Anomaly Detection',
        description='Binary anomaly detection on MIT-BIH ECG dataset',
        input_shape=(187,),
        n_classes=2,
        metric='accuracy',
        neurobench_id='ecg_anomaly',
        dataset='mit_bih',
        baseline_accuracy=0.97,
    ),
    'mnist': BenchmarkTask(
        name='MNIST Classification',
        description='10-class handwritten digit classification',
        input_shape=(784,),
        n_classes=10,
        metric='accuracy',
        neurobench_id='mnist',
        dataset='mnist',
        baseline_accuracy=0.99,
    ),
    'shd': BenchmarkTask(
        name='Spiking Heidelberg Digits',
        description='20-class spoken digit classification (spiking audio)',
        input_shape=(700,),
        n_classes=20,
        metric='accuracy',
        neurobench_id='shd',
        dataset='shd',
        baseline_accuracy=0.85,
    ),
} `module-attribute` ¶

`BenchmarkTask` `dataclass` ¶

Definition of a benchmark task.

Source code in src/sc_neurocore/benchmarks/tasks.py

from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkTask:
    """Definition of a benchmark task."""

    name: str
    description: str
    input_shape: tuple[int, ...]
    n_classes: int
    metric: str
    neurobench_id: str
    dataset: str
    baseline_accuracy: float
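Because the dataclass is declared with `frozen=True`, task definitions are read-only after construction. The sketch below shows this with a local copy of the class and a one-entry registry, so that it runs without the package installed. The `beats_baseline` helper is hypothetical and not part of `sc_neurocore`.

```python
from dataclasses import FrozenInstanceError, dataclass


# Local copy of the dataclass shown above, so the example is self-contained.
@dataclass(frozen=True)
class BenchmarkTask:
    name: str
    description: str
    input_shape: tuple[int, ...]
    n_classes: int
    metric: str
    neurobench_id: str
    dataset: str
    baseline_accuracy: float


# One entry copied from the TASKS registry.
TASKS = {
    "mnist": BenchmarkTask(
        name="MNIST Classification",
        description="10-class handwritten digit classification",
        input_shape=(784,),
        n_classes=10,
        metric="accuracy",
        neurobench_id="mnist",
        dataset="mnist",
        baseline_accuracy=0.99,
    ),
}


def beats_baseline(task_key: str, accuracy: float) -> bool:
    """Hypothetical helper: compare a measured accuracy with the task baseline."""
    return accuracy >= TASKS[task_key].baseline_accuracy


task = TASKS["mnist"]
try:
    task.baseline_accuracy = 1.0  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

print(task.input_shape, mutated, beats_baseline("mnist", 0.995))
```

Freezing the task definitions keeps baselines and input shapes from being changed by accident between benchmark runs, and it also makes instances hashable.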
