Neuromorphic Datasets¶

Module: sc_neurocore.datasets Source: src/sc_neurocore/datasets/ — 3 files, 423 LOC Status (v3.14.0): all 5 public symbols wired; 23 tests pass; pure NumPy I/O — no Rust path needed for the loaders, no synaptic kinetics. The "Poisson" encoder is actually Bernoulli (§3.1, same wording issue as network/stimulus.PoissonInput).

This page covers the two encoders (poisson_encode, latency_encode) and the three event-camera / cochlear loaders (load_nmnist, load_shd, load_dvs_cifar10), each of which can fall back to synthetic data when the real archive is not on disk.

1. Public surface¶

sc_neurocore.datasets.__init__ re-exports 5 symbols:

Symbol	Source file	Role
`poisson_encode`	`encoding.py`	Per-neuron Bernoulli draw → spike train
`latency_encode`	`encoding.py`	Continuous value → first-spike-time
`load_nmnist`	`loaders.py`	N-MNIST (Orchard 2015), 34×34 DVS
`load_shd`	`loaders.py`	Spiking Heidelberg Digits (Cramer 2022), 700 channels
`load_dvs_cifar10`	`loaders.py`	DVS-CIFAR10 (Li 2017), 128×128 DVS

Each loader accepts synthetic=True to bypass disk reads — useful for unit tests and for CI where the real archives are not stored.

2. Loaders¶

2.1 `load_nmnist`¶

Python

def load_nmnist(
    root: str | Path = "data/nmnist",
    train: bool = True,
    dt_ms: float = 1.0,
    T: int = 300,
    synthetic: bool = False,
    n_samples: int = 100,
    seed: int = 42,
) -> tuple[list[np.ndarray], np.ndarray]:

Loads the N-MNIST dataset:

Orchard G., Cohen G., Jayawant A., Thakor N. "Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades." Front Neurosci 9:437 (2015).

34×34 ATIS DVS recordings of MNIST digits moved across the sensor by saccadic eye movements. 10 classes.

Returns (samples, labels): - samples: list of (N_events, 4) float32 arrays with columns [x, y, polarity, timestamp_ms] - labels: int64 array of length len(samples)

The real-data path expects the directory layout root/{Train,Test}/<class_id>/<sample>.bin. Each .bin file is a sequence of 5-byte events: [addr_high, addr_low, ts2, ts1, ts0], parsed by the helper _parse_nmnist_bin (loaders.py:150):

Bits	Meaning
0–4	x coordinate (5-bit, max 31)
5–9	y coordinate
10	polarity
16-bit `ts`	timestamp in microseconds

dt_ms scales the parsed timestamps to milliseconds via ts_us * (dt_ms / 1000.0).

2.2 `load_shd`¶

Loads the Spiking Heidelberg Digits dataset:

Cramer B., Stradmann Y., Schemmel J., Zenke F. "The Heidelberg Spiking Data Sets for the Systematic Evaluation of Spiking Neural Networks." IEEE Transactions on Neural Networks and Learning Systems 33(7):2744-2757 (2022).

20-class English/German digit utterances (0–9 in two languages) spike-encoded through Lauscher's artificial cochlea model. 700 input channels.

Returns (samples, labels): - samples: list of (T_per_sample, 700) bool arrays (binned spike rasters). T_per_sample is min(ceil(times.max() / (dt_ms/1000)) + 1, T). - labels: int64 array

The real-data path requires h5py (declared in extras) and reads root/shd_{train,test}.h5. The H5 layout is the standard SHD release: /spikes/times[i], /spikes/units[i], /labels.

2.3 `load_dvs_cifar10`¶

Loads the DVS-CIFAR10 dataset:

Li H., Liu H., Ji X., Li G., Shi L. "CIFAR10-DVS: An Event-Stream Dataset for Object Classification." Front Neurosci 11:309 (2017).

CIFAR-10 images displayed on a monitor and recorded by a 128×128 DVS camera. 10 classes.

Returns (samples, labels) in the same shape as load_nmnist. The real-data path expects .npy files (one per sample) under root/{train,test}/<class_id>/. Each .npy must be an array with columns [x, y, polarity, timestamp_ms]. Raw .aedat / .mat conversion is left to the caller.

2.4 Common contracts¶

All three loaders: - Raise FileNotFoundError with the dataset's download URL embedded in the message when root does not exist (_check_root, loaders.py:73). - Raise FileNotFoundError when root exists but the train/test subdirectory is missing. - Accept synthetic=True to bypass disk reads entirely; the synthetic path uses _synthetic_event_dataset (event-based loaders) or _synthetic_shd (binned-raster loader). Both pin their RNG to seed for reproducibility.

The synthetic generators draw class-conditional rate templates from U(0, 0.3) (event loaders) or U(0, 0.1) (SHD), then expand them through poisson_encode to per-sample spike trains. Polarities for event loaders are randint(0, 2).

3. Encoders¶

3.1 `poisson_encode` (actually Bernoulli)¶

Python

def poisson_encode(
    rates: npt.ArrayLike,
    T: int,
    dt_ms: float = 1.0,
    seed: int | None = None,
) -> np.ndarray:  # shape (T, N), bool

Returns (T, N) boolean spike train: each cell is rng.random() < min(rate * dt_ms, 1). The function name says "Poisson" but the per-step sample is Bernoulli, not a true Poisson draw. For low rate * dt_ms (< 0.1) the Bernoulli / Poisson distinction is < 5 % — the two distributions agree to first order. For high rate * dt_ms (> 0.5) Bernoulli under-counts because it cannot emit more than one spike per timestep; a true Poisson would.

Same wording issue as PoissonInput in network/stimulus.py. Either rename to bernoulli_encode or replace the < scaled line with rng.poisson(scaled, size) and accept fractional spike counts. Tracked as task #26.

3.2 `latency_encode` (first-spike-time, FIXED by task #27)¶

Python

def latency_encode(
    values: npt.ArrayLike,
    T: int,
    tau: float = 5.0,
    strict: bool = True,
) -> np.ndarray:  # shape (T, N), bool

Each value v ∈ [0, 1] produces exactly one spike at timestep int(tau * (1 - v)), clamped to [0, T-1]. Higher value → earlier spike.

Input range guard (strict=True default): the function now raises ValueError when any element of values is outside [0, 1]. The error message reports the offending min/max and suggests strict=False for the legacy silent-clip behaviour. This closes the contract gap that the original docstring claimed but did not enforce.

strict=False keeps the v3.14.0 behaviour: values=1.5 clips to spike-time 0, values=-0.5 clips toward T-1.

tau = 5.0 (default) means the latest possible spike (for v=0) is at timestep 5. For larger T, most timesteps are silent.

4. Performance — measured (this workstation)¶

Hardware: Intel i5-11600K, 32 GB DDR4, Python 3.12.3, NumPy 2.2.6.

4.1 Encoder throughput (mean of 20 calls)¶

Encoder	N	T	Per-call wall	Spike-cells/s
`poisson_encode`	100	300	0.37 ms	81.1 M
`poisson_encode`	1 000	300	3.33 ms	90.0 M
`poisson_encode`	10 000	300	37.38 ms	80.3 M
`latency_encode`	100	300	0.06 ms	—
`latency_encode`	1 000	300	0.05 ms	—
`latency_encode`	10 000	300	0.63 ms	—

poisson_encode is dominated by the rng.random((T, N)) call (uniform draw of T*N floats). Throughput is ~80 M spike-cells/s across all sizes, which matches NumPy's PRNG cost (~10 ns/element).

latency_encode is much faster because it draws no random numbers — just one fancy-indexed write per call. The (T=300, N=10000) call still runs in under 1 ms.

4.2 Synthetic loader cost¶

Loading N-MNIST in synthetic mode at T = 300, single-threaded:

`n_samples`	Wall	Total events generated
10	170.7 ms	515 780
100	1 739.8 ms	5 168 651
500	7 566.2 ms	25 894 833

Linear in n_samples: ~17 ms per sample, ~50 k events per sample. The cost is split between poisson_encode and the per-event np.column_stack + dtype cast inside _synthetic_event_dataset.

4.3 No Rust path¶

These are I/O loaders + per-call NumPy vectorised ops. The hot path (rng.random((T, N))) is already at NumPy/PCG64 speed (~80 M samples/s). A Rust port would gain little on the encoder side; the loader side is dominated by file I/O for real datasets and by NumPy allocation for synthetic data. No Rust path planned.

5. Pipeline wiring¶

Surface	How it's wired	Verifier
`from sc_neurocore.datasets import load_nmnist, ...`	`__init__.py:8-9` re-export	`tests/test_datasets.py`
Synthetic fallback path	each loader checks `synthetic` first	`TestSyntheticLoaders`
Real-data path	`_check_root` raises with download URL	`TestNMNISTRealLoader::test_load_nmnist_real_path`, etc.
`_synthetic_event_dataset` calls `poisson_encode`	`loaders.py:56`	covered transitively
H5 path imports `h5py` lazily	inside `load_shd` body	works without h5py if `synthetic=True`

No orphan helpers; _parse_nmnist_bin and _check_root are private but reachable from public loaders.

6. Audit (7-point checklist)¶

#	Dimension	Status	Detail
1	Pipeline wiring	✅ PASS	All 5 symbols wired; loaders → encoders → synthetic fallbacks
2	Multi-angle tests	✅ PASS	23 tests across 6 classes (TestCheckRoot, TestSyntheticLoaders, TestEncoding, TestNMNISTRealLoader, TestSHDRealLoader, TestDVSCIFAR10RealLoader); covers shape, reproducibility, file-not-found, real-data parse, encoder rate correlation
3	Rust path	N/A	I/O + NumPy-vectorised encoders; no compute kernel that would benefit
4	Benchmarks	✅ PASS	§4.1 + §4.2 measured this session
5	Performance docs	✅ PASS	§4
6	Documentation page	✅ PASS	This page
7	Rules followed	⚠️ WARN	SPDX header on every file ✅. `poisson_encode` is misnamed — it is Bernoulli, not Poisson (§3.1). `latency_encode` has an unenforced `[0, 1]` input contract (§3.2). British English consistent.

Net: 1 WARN, 0 FAIL. Both WARN items are naming / contract issues, not behavioural bugs. Tasks #27 and #28 track them.

7. Known issues¶

7.1 `poisson_encode` is Bernoulli (task #26)¶

For low rates this is fine; for high rates it under-counts. Either rename to bernoulli_encode (preferred — the function does not implement what the name claims) or replace the body with an actual Poisson draw and accept fractional spike counts (wider behaviour change).

7.2 `latency_encode` silently clips out-of-range input (FIXED by task #27)¶

The function now raises ValueError by default when any value is outside [0, 1]. Pass strict=False to keep the legacy silent-clip behaviour. Regression tests: tests/test_datasets.py::TestLatencyEncodeStrict (5 cases — above-1 raises, negative raises, strict=False keeps clip, boundary values 0.0 / 1.0 accepted, interior values correctly ordered).

7.3 N-MNIST `_NMNIST_RES` constant is unused on the real path¶

loaders.py:24 declares _NMNIST_RES = 34 but the real-data parser (_parse_nmnist_bin) decodes coordinates straight from the 5-bit address fields without referring to the constant. The constant is used only on the synthetic path. Either delete it from the real path, or assert that decoded coordinates fall inside [0, _NMNIST_RES).

7.4 `load_dvs_cifar10` real path requires `.npy` not raw¶

The docstring says "DVS-CIFAR10 event-camera dataset" but the loader expects pre-converted .npy files, not the raw .aedat/.mat released by Li et al. 2017. The error message at line 329-332 makes this clear, but the docstring at line 271-300 does not. Either add a one-line "Note: requires .npy-converted input" to the docstring, or ship a convert_dvs_cifar10_to_npy utility.

7.5 Synthetic SHD differs from real SHD distributionally¶

_synthetic_shd draws class templates from U(0, 0.1) independently per channel, then Poisson-encodes them. Real SHD has rich temporal structure (cochlear filter banks, formants). The synthetic data produces correct shapes and labels for unit testing but trains a classifier to chance if used for actual learning. Document this constraint in the loader docstring.

8. Tests¶

Bash

PYTHONPATH=src python3 -m pytest tests/test_datasets.py -q
# 23 passed in 10.04s (verified 2026-04-17)

Coverage breakdown:

TestCheckRoot (2): _check_root returns Path on existing dir, raises FileNotFoundError with URL on missing dir.
TestSyntheticLoaders (7): synthetic-shape correctness for all 3 loaders, missing-root paths raise even with bad inputs, reproducibility across two same-seed calls.
TestEncoding (6): poisson_encode shape + rate correlation + zero/ones edge cases; latency_encode shape + monotonic earlier-fire-for-higher-value.
TestNMNISTRealLoader (3): _parse_nmnist_bin decodes a hand-crafted 5-byte event correctly; load_nmnist real path with a synthesised directory tree; missing-split raises.
TestSHDRealLoader (2): real-path with synthesised H5 file; missing-h5 raises.
TestDVSCIFAR10RealLoader (3): real-path with synthesised npy tree; missing-split raises; empty-dir raises.

Not covered:

High-rate Poisson distinction — no test asserts that poisson_encode(rates=1.5, T=10) saturates at 1 spike/step (the Bernoulli ceiling). A test would document the §3.1 issue.
Latency input range — no test asserts behaviour for value > 1 or value < 0.
Real h5py format — TestSHDRealLoader::test_load_shd_real_path uses a synthesised H5 file; the actual SHD release format has not been smoke-tested in CI.

9. References¶

Datasets (cited by source):

Orchard G. et al. "Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades." Front Neurosci 9:437 (2015). N-MNIST.
Cramer B., Stradmann Y., Schemmel J., Zenke F. "The Heidelberg Spiking Data Sets for the Systematic Evaluation of Spiking Neural Networks." IEEE TNNLS 33(7):2744-2757 (2022). SHD.
Li H. et al. "CIFAR10-DVS: An Event-Stream Dataset for Object Classification." Front Neurosci 11:309 (2017). DVS-CIFAR10.

Encoders (background):

Gerstner W., Kistler W. M. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge UP (2002). Chapters on rate vs latency coding.
Thorpe S., Fize D., Marlot C. "Speed of processing in the human visual system." Nature 381:520-522 (1996). The original motivation for first-spike-time / latency coding.

Internal:

Network simulation engine (Poisson stimulus): api/network.md
Monitors & stimulus: api/monitor.md

10. Auto-rendered API¶

`sc_neurocore.datasets` ¶

`load_nmnist(root='data/nmnist', train=True, dt_ms=1.0, T=300, synthetic=False, n_samples=100, seed=42)` ¶

Load N-MNIST spiking vision dataset.

Neuromorphic-MNIST: 34x34 DVS recordings of MNIST digits moved on an ATIS sensor via saccadic eye movements. 10 classes.

Orchard et al., "Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades", Front. Neurosci. 2015.

Parameters¶

root : path Directory containing the extracted dataset. train : bool Load training split if True, test split otherwise. dt_ms : float Temporal resolution for synthetic fallback. T : int Number of timesteps for synthetic fallback. synthetic : bool Force synthetic data generation. n_samples : int Number of synthetic samples to generate. seed : int RNG seed for reproducible synthetic data.

Returns¶

samples : list of ndarray, each shape (N_events, 4) Columns: [x, y, polarity, timestamp_ms]. labels : ndarray of int

Source code in src/sc_neurocore/datasets/loaders.py

Python
def load_nmnist(
    root: str | Path = "data/nmnist",
    train: bool = True,
    dt_ms: float = 1.0,
    T: int = 300,
    synthetic: bool = False,
    n_samples: int = 100,
    seed: int = 42,
) -> tuple[list[np.ndarray], np.ndarray]:
    """Load N-MNIST spiking vision dataset.

    Neuromorphic-MNIST: 34x34 DVS recordings of MNIST digits moved on
    an ATIS sensor via saccadic eye movements. 10 classes.

    Orchard et al., "Converting Static Image Datasets to Spiking
    Neuromorphic Datasets Using Saccades", Front. Neurosci. 2015.

    Parameters
    ----------
    root : path
        Directory containing the extracted dataset.
    train : bool
        Load training split if True, test split otherwise.
    dt_ms : float
        Temporal resolution for synthetic fallback.
    T : int
        Number of timesteps for synthetic fallback.
    synthetic : bool
        Force synthetic data generation.
    n_samples : int
        Number of synthetic samples to generate.
    seed : int
        RNG seed for reproducible synthetic data.

    Returns
    -------
    samples : list of ndarray, each shape (N_events, 4)
        Columns: [x, y, polarity, timestamp_ms].
    labels : ndarray of int
    """
    if synthetic:
        return _synthetic_event_dataset(
            n_samples,
            _NMNIST_RES,
            10,
            T,
            dt_ms,
            seed,
        )
    _check_root(root, "N-MNIST", _NMNIST_URL)
    split_dir = Path(root) / ("Train" if train else "Test")
    if not split_dir.exists():
        raise FileNotFoundError(
            f"Expected split directory {split_dir.resolve()}. Download from {_NMNIST_URL}"
        )
    # Real loader: N-MNIST uses .bin files, one per sample, grouped by class
    samples: list[np.ndarray] = []
    label_list: list[int] = []
    for class_dir in sorted(split_dir.iterdir()):
        if not class_dir.is_dir():
            continue
        class_label = int(class_dir.name)
        for bin_file in sorted(class_dir.glob("*.bin")):
            events = _parse_nmnist_bin(bin_file, dt_ms)
            samples.append(events)
            label_list.append(class_label)
    return samples, np.array(label_list, dtype=np.int64)

`load_shd(root='data/shd', train=True, dt_ms=1.0, T=1000, synthetic=False, n_samples=100, seed=42)` ¶

Load Spiking Heidelberg Digits (SHD) dataset.

Audio digits 0-9 in English and German, spike-encoded through an artificial cochlea model. 700 input channels, 20 classes.

Cramer et al., "The Heidelberg Spiking Data Sets for the Systematic Evaluation of Spiking Neural Networks", IEEE TNNLS 2022.

Parameters¶

root : path Directory containing shd_train.h5 / shd_test.h5. train : bool Load training split if True, test split otherwise. dt_ms : float Temporal resolution for binning spikes. T : int Number of timesteps for synthetic fallback. synthetic : bool Force synthetic data generation. n_samples : int Number of synthetic samples to generate. seed : int RNG seed for reproducible synthetic data.

Returns¶

samples : list of ndarray, each shape (T, 700) dtype bool Binned spike trains. labels : ndarray of int

Source code in src/sc_neurocore/datasets/loaders.py

Python
def load_shd(
    root: str | Path = "data/shd",
    train: bool = True,
    dt_ms: float = 1.0,
    T: int = 1000,
    synthetic: bool = False,
    n_samples: int = 100,
    seed: int = 42,
) -> tuple[list[np.ndarray], np.ndarray]:
    """Load Spiking Heidelberg Digits (SHD) dataset.

    Audio digits 0-9 in English and German, spike-encoded through an
    artificial cochlea model. 700 input channels, 20 classes.

    Cramer et al., "The Heidelberg Spiking Data Sets for the Systematic
    Evaluation of Spiking Neural Networks", IEEE TNNLS 2022.

    Parameters
    ----------
    root : path
        Directory containing shd_train.h5 / shd_test.h5.
    train : bool
        Load training split if True, test split otherwise.
    dt_ms : float
        Temporal resolution for binning spikes.
    T : int
        Number of timesteps for synthetic fallback.
    synthetic : bool
        Force synthetic data generation.
    n_samples : int
        Number of synthetic samples to generate.
    seed : int
        RNG seed for reproducible synthetic data.

    Returns
    -------
    samples : list of ndarray, each shape (T, 700) dtype bool
        Binned spike trains.
    labels : ndarray of int
    """
    if synthetic:
        return _synthetic_shd(n_samples, T, dt_ms, seed)

    _check_root(root, "SHD", _SHD_URL)
    fname = "shd_train.h5" if train else "shd_test.h5"
    h5_path = Path(root) / fname
    if not h5_path.exists():
        raise FileNotFoundError(f"{h5_path.resolve()} not found. Download from {_SHD_URL}")
    import h5py

    samples: list[np.ndarray] = []
    with h5py.File(h5_path, "r") as f:
        spike_times = f["spikes"]["times"]
        spike_units = f["spikes"]["units"]
        raw_labels = f["labels"][:]
        for i in range(len(raw_labels)):
            times = np.asarray(spike_times[i])
            units = np.asarray(spike_units[i])
            if len(times) > 0:
                n_bins = min(int(np.ceil(times.max() / (dt_ms / 1000.0))) + 1, T)
            else:
                n_bins = T
            train_arr = np.zeros((n_bins, _SHD_CHANNELS), dtype=bool)
            if len(times) > 0:
                bin_idx = np.clip((times / (dt_ms / 1000.0)).astype(int), 0, n_bins - 1)
                unit_idx = np.clip(units.astype(int), 0, _SHD_CHANNELS - 1)
                train_arr[bin_idx, unit_idx] = True
            samples.append(train_arr)

    return samples, raw_labels.astype(np.int64)

`load_dvs_cifar10(root='data/dvs_cifar10', train=True, dt_ms=1.0, T=300, synthetic=False, n_samples=100, seed=42)` ¶

Load DVS-CIFAR10 event-camera dataset.

CIFAR-10 images displayed on a monitor and recorded by a DVS camera at 128x128 resolution. 10 classes.

Li et al., "CIFAR10-DVS: An Event-Stream Dataset for Object Classification", Front. Neurosci. 2017.

Parameters¶

root : path Directory containing the extracted dataset. train : bool Load training split if True, test split otherwise. dt_ms : float Temporal resolution for synthetic fallback. T : int Number of timesteps for synthetic fallback. synthetic : bool Force synthetic data generation. n_samples : int Number of synthetic samples to generate. seed : int RNG seed for reproducible synthetic data.

Returns¶

samples : list of ndarray, each shape (N_events, 4) Columns: [x, y, polarity, timestamp_ms]. labels : ndarray of int

Source code in src/sc_neurocore/datasets/loaders.py

Python
def load_dvs_cifar10(
    root: str | Path = "data/dvs_cifar10",
    train: bool = True,
    dt_ms: float = 1.0,
    T: int = 300,
    synthetic: bool = False,
    n_samples: int = 100,
    seed: int = 42,
) -> tuple[list[np.ndarray], np.ndarray]:
    """Load DVS-CIFAR10 event-camera dataset.

    CIFAR-10 images displayed on a monitor and recorded by a DVS camera
    at 128x128 resolution. 10 classes.

    Li et al., "CIFAR10-DVS: An Event-Stream Dataset for Object
    Classification", Front. Neurosci. 2017.

    Parameters
    ----------
    root : path
        Directory containing the extracted dataset.
    train : bool
        Load training split if True, test split otherwise.
    dt_ms : float
        Temporal resolution for synthetic fallback.
    T : int
        Number of timesteps for synthetic fallback.
    synthetic : bool
        Force synthetic data generation.
    n_samples : int
        Number of synthetic samples to generate.
    seed : int
        RNG seed for reproducible synthetic data.

    Returns
    -------
    samples : list of ndarray, each shape (N_events, 4)
        Columns: [x, y, polarity, timestamp_ms].
    labels : ndarray of int
    """
    if synthetic:
        return _synthetic_event_dataset(
            n_samples,
            _DVS_CIFAR10_RES,
            10,
            T,
            dt_ms,
            seed,
        )
    _check_root(root, "DVS-CIFAR10", _DVS_CIFAR10_URL)
    split_dir = Path(root) / ("train" if train else "test")
    if not split_dir.exists():
        raise FileNotFoundError(
            f"Expected split directory {split_dir.resolve()}. Download from {_DVS_CIFAR10_URL}"
        )
    # Real loader: .aedat or .mat files grouped by class
    samples: list[np.ndarray] = []
    label_list: list[int] = []
    for class_dir in sorted(split_dir.iterdir()):
        if not class_dir.is_dir():
            continue
        class_label = int(class_dir.name)
        for event_file in sorted(class_dir.glob("*.npy")):
            events = np.load(event_file).astype(np.float32)
            samples.append(events)
            label_list.append(class_label)
    if not samples:
        raise FileNotFoundError(
            f"No .npy event files found in {split_dir.resolve()}. "
            f"Convert raw data to .npy arrays with columns [x, y, pol, ts_ms]."
        )
    return samples, np.array(label_list, dtype=np.int64)

`poisson_encode(rates, T, dt_ms=1.0, seed=None)` ¶

Convert firing-rate array to Poisson spike trains.

Parameters¶

rates : array_like, shape (N,) Firing probabilities per timestep, clipped to [0, 1]. T : int Number of timesteps. dt_ms : float Timestep duration in ms (scales rates linearly). seed : int or None RNG seed for reproducibility.

Returns¶

spikes : ndarray, shape (T, N), dtype bool

Source code in src/sc_neurocore/datasets/encoding.py

Python
def poisson_encode(
    rates: npt.ArrayLike,
    T: int,
    dt_ms: float = 1.0,
    seed: int | None = None,
) -> np.ndarray:
    """Convert firing-rate array to Poisson spike trains.

    Parameters
    ----------
    rates : array_like, shape (N,)
        Firing probabilities per timestep, clipped to [0, 1].
    T : int
        Number of timesteps.
    dt_ms : float
        Timestep duration in ms (scales rates linearly).
    seed : int or None
        RNG seed for reproducibility.

    Returns
    -------
    spikes : ndarray, shape (T, N), dtype bool
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=np.float64)
    scaled = np.clip(rates * (dt_ms / 1.0), 0.0, 1.0)
    return rng.random((T, rates.shape[0])) < scaled

`latency_encode(values, T, tau=5.0, strict=True)` ¶

Convert normalised values in [0, 1] to first-spike-time trains.

Higher values spike earlier. Each neuron fires exactly once.

Parameters¶

values : array_like, shape (N,) Input values, expected in [0, 1]. T : int Number of timesteps. tau : float Time constant controlling the spike-time spread. strict : bool If True (default), raise ValueError when any value lies outside [0, 1]. If False, silently clip the resulting spike times to [0, T-1] (the legacy behaviour). The clip happens regardless of strict; this flag controls only whether the function raises before clipping.

Returns¶

spikes : ndarray, shape (T, N), dtype bool

Raises¶

ValueError If strict=True (default) and any element of values is outside [0, 1].

Source code in src/sc_neurocore/datasets/encoding.py

Python
def latency_encode(
    values: npt.ArrayLike,
    T: int,
    tau: float = 5.0,
    strict: bool = True,
) -> np.ndarray:
    """Convert normalised values in [0, 1] to first-spike-time trains.

    Higher values spike earlier. Each neuron fires exactly once.

    Parameters
    ----------
    values : array_like, shape (N,)
        Input values, expected in ``[0, 1]``.
    T : int
        Number of timesteps.
    tau : float
        Time constant controlling the spike-time spread.
    strict : bool
        If True (default), raise ``ValueError`` when any value lies
        outside ``[0, 1]``. If False, silently clip the resulting
        spike times to ``[0, T-1]`` (the legacy behaviour). The
        clip happens regardless of ``strict``; this flag controls
        only whether the function raises before clipping.

    Returns
    -------
    spikes : ndarray, shape (T, N), dtype bool

    Raises
    ------
    ValueError
        If ``strict=True`` (default) and any element of ``values``
        is outside ``[0, 1]``.
    """
    values = np.asarray(values, dtype=np.float64)
    if strict and (values.min() < 0.0 or values.max() > 1.0):
        bad_min = float(values.min())
        bad_max = float(values.max())
        raise ValueError(
            f"latency_encode: values must be in [0, 1] when strict=True; "
            f"got min={bad_min}, max={bad_max}. Pass strict=False to "
            f"accept the legacy silent-clip behaviour."
        )
    # spike_time = tau * (1 - value); higher value => earlier spike
    spike_times = np.clip(tau * (1.0 - values), 0, T - 1).astype(int)
    spikes = np.zeros((T, values.shape[0]), dtype=bool)
    neuron_idx = np.arange(values.shape[0])
    spikes[spike_times, neuron_idx] = True
    return spikes

Neuromorphic Datasets¶

1. Public surface¶

2. Loaders¶

2.1 load_nmnist¶

2.2 load_shd¶

2.3 load_dvs_cifar10¶

2.4 Common contracts¶

3. Encoders¶

3.1 poisson_encode (actually Bernoulli)¶

3.2 latency_encode (first-spike-time, FIXED by task #27)¶

4. Performance — measured (this workstation)¶

4.1 Encoder throughput (mean of 20 calls)¶

4.2 Synthetic loader cost¶

4.3 No Rust path¶

5. Pipeline wiring¶

6. Audit (7-point checklist)¶

7. Known issues¶

7.1 poisson_encode is Bernoulli (task #26)¶

7.2 latency_encode silently clips out-of-range input (FIXED by task #27)¶

7.3 N-MNIST _NMNIST_RES constant is unused on the real path¶

7.4 load_dvs_cifar10 real path requires .npy not raw¶

7.5 Synthetic SHD differs from real SHD distributionally¶

8. Tests¶

9. References¶

10. Auto-rendered API¶

sc_neurocore.datasets ¶

load_nmnist(root='data/nmnist', train=True, dt_ms=1.0, T=300, synthetic=False, n_samples=100, seed=42) ¶

Parameters¶

Returns¶

load_shd(root='data/shd', train=True, dt_ms=1.0, T=1000, synthetic=False, n_samples=100, seed=42) ¶

Parameters¶

Returns¶

load_dvs_cifar10(root='data/dvs_cifar10', train=True, dt_ms=1.0, T=300, synthetic=False, n_samples=100, seed=42) ¶

Parameters¶

Returns¶

poisson_encode(rates, T, dt_ms=1.0, seed=None) ¶

Parameters¶

Returns¶

latency_encode(values, T, tau=5.0, strict=True) ¶

Parameters¶

Returns¶

Raises¶

2.1 `load_nmnist`¶

2.2 `load_shd`¶

2.3 `load_dvs_cifar10`¶

3.1 `poisson_encode` (actually Bernoulli)¶

3.2 `latency_encode` (first-spike-time, FIXED by task #27)¶

7.1 `poisson_encode` is Bernoulli (task #26)¶

7.2 `latency_encode` silently clips out-of-range input (FIXED by task #27)¶

7.3 N-MNIST `_NMNIST_RES` constant is unused on the real path¶

7.4 `load_dvs_cifar10` real path requires `.npy` not raw¶

`sc_neurocore.datasets` ¶

`load_nmnist(root='data/nmnist', train=True, dt_ms=1.0, T=300, synthetic=False, n_samples=100, seed=42)` ¶

`load_shd(root='data/shd', train=True, dt_ms=1.0, T=1000, synthetic=False, n_samples=100, seed=42)` ¶

`load_dvs_cifar10(root='data/dvs_cifar10', train=True, dt_ms=1.0, T=300, synthetic=False, n_samples=100, seed=42)` ¶

`poisson_encode(rates, T, dt_ms=1.0, seed=None)` ¶

`latency_encode(values, T, tau=5.0, strict=True)` ¶