Tutorial 36: MPI Distributed Simulation¶
SC-NeuroCore supports billion-neuron simulations via MPI (Message Passing Interface). Populations are distributed across processes, projections handle inter-process spike exchange, and the Network engine transparently manages communication.
Prerequisites¶
pip install mpi4py
# On Ubuntu/Debian: sudo apt install libopenmpi-dev
# On macOS: brew install open-mpi
1. Basic MPI Run¶
The same Network API works — just set backend="mpi":
# mpi_demo.py
from sc_neurocore.neurons.models.hodgkin_huxley import HodgkinHuxleyNeuron
from sc_neurocore.network.population import Population
from sc_neurocore.network.projection import Projection
from sc_neurocore.network.network import Network
from sc_neurocore.network.monitor import SpikeMonitor
from sc_neurocore.network.stimulus import PoissonInput
# 80/20 excitatory/inhibitory network with sparse random connectivity
exc = Population(HodgkinHuxleyNeuron, n=10000, label="exc")
inh = Population(HodgkinHuxleyNeuron, n=2500, label="inh")
exc_exc = Projection(exc, exc, weight=0.02, topology="random", probability=0.02)
exc_inh = Projection(exc, inh, weight=0.05, topology="random", probability=0.05)
inh_exc = Projection(inh, exc, weight=-0.1, topology="random", probability=0.05)
# External 50 Hz Poisson drive, plus a spike monitor on the excitatory cells
drive = PoissonInput(n=10000, rate_hz=50.0, weight=1.0, dt=0.001)
mon = SpikeMonitor(exc)
net = Network(exc, inh, exc_exc, exc_inh, inh_exc, drive, mon)
net.run(duration=1.0, dt=0.001, backend="mpi")
# Only rank 0 reports the result
from mpi4py import MPI
if MPI.COMM_WORLD.Get_rank() == 0:
    print(f"Total spikes: {mon.count}")
Launch with MPI:
mpirun -np 4 python mpi_demo.py
2. How Distribution Works¶
The MPI runner partitions populations across ranks:
| 12,500 neurons on 4 ranks | Rank 0 | Rank 1 | Rank 2 | Rank 3 |
|---|---|---|---|---|
| Excitatory (10,000) | 2,500 | 2,500 | 2,500 | 2,500 |
| Inhibitory (2,500) | 625 | 625 | 625 | 625 |
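The even split in the table can be sketched as a block partition. The helper below is illustrative, not sc-neurocore's actual partitioning code; it shows how a contiguous range of neurons maps to `(start, count)` slices per rank, handing any remainder out one neuron at a time to the lowest ranks.

```python
def partition(n_neurons, n_ranks):
    """Split n_neurons into n_ranks contiguous blocks; ranks below the
    remainder get one extra neuron so sizes differ by at most one."""
    base, rem = divmod(n_neurons, n_ranks)
    sizes = [base + (1 if r < rem else 0) for r in range(n_ranks)]
    offsets = [sum(sizes[:r]) for r in range(n_ranks)]
    return list(zip(offsets, sizes))  # (start, count) per rank

# Matches the table: 10,000 excitatory and 2,500 inhibitory over 4 ranks.
print(partition(10_000, 4))  # [(0, 2500), (2500, 2500), (5000, 2500), (7500, 2500)]
print(partition(2_500, 4))   # [(0, 625), (625, 625), (1250, 625), (1875, 625)]
```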
Each timestep, every rank:

1. Steps its local neurons
2. Gathers the spikes those neurons emitted
3. Exchanges spikes with all other ranks (MPI_Allgather)
4. Applies incoming spikes via projections
5. Records its local spikes
Communication cost is proportional to the number of spikes per timestep, not the number of neurons.
3. Scaling¶
| Neurons | Ranks | Time (1s sim) | Speedup |
|---|---|---|---|
| 10K | 1 | ~60s | 1.0x |
| 10K | 4 | ~18s | 3.3x |
| 100K | 16 | ~45s | — |
| 1M | 64 | ~120s | — |
Near-linear scaling up to the communication-bound regime (~100K spikes/step).
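One way to build intuition for these numbers is a toy cost model in which compute divides across ranks but the collective exchange does not. The 2 s communication term below is an illustrative guess, not a measured value, and the model is not a fit to the table.

```python
def est_runtime(serial_compute_s, comm_s, ranks):
    """Toy scaling model: compute parallelizes across ranks, but the
    allgather term does not (it is a collective every rank completes)."""
    return serial_compute_s / ranks + comm_s

# With 60 s of serial compute and an assumed ~2 s of communication,
# 4 ranks give 60/4 + 2 = 17 s.
print(round(est_runtime(60.0, 2.0, 4), 1))  # 17.0
```

As the spike volume grows, the communication term dominates and speedup flattens, which is the communication-bound regime described above.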
4. HPC Job Script¶
Example Slurm submission script for a cluster:
#!/bin/bash
#SBATCH --job-name=sc-neurocore-mpi
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:00:00
module load python/3.12 openmpi
source venv/bin/activate
mpirun -np 64 python large_simulation.py
5. Combining MPI with Rust¶
For maximum performance, use MPI for distribution and Rust for per-rank computation:
# Each MPI rank uses the Rust NetworkRunner for its local partition
net.run(duration=1.0, dt=0.001, backend="mpi")
# The MPI runner automatically delegates local stepping to Rust
# if sc_neurocore_engine is available
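A hedged sketch of what that delegation check might look like; the actual logic lives inside sc-neurocore's MPI runner and may differ, and `pick_local_backend` is a hypothetical name.

```python
def pick_local_backend():
    """Prefer the Rust engine for local stepping; fall back to pure
    Python when the extension module is not installed. Illustrative
    sketch only, not sc-neurocore's real runner code."""
    try:
        import sc_neurocore_engine  # Rust extension module  # noqa: F401
        return "rust"
    except ImportError:
        return "python"

print(pick_local_backend())
```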
Limitations¶
- All ranks must have the same Python environment and sc-neurocore version
- Projections with plasticity (STDP) across ranks require additional synchronization
- The allgather step becomes the bottleneck above ~100K spikes per timestep
- Not all neuron models support MPI (check _rust_supports_model())
Further Reading¶
- Tutorial 31: Network Simulation Engine — single-process API
- Tutorial 35: Model Zoo — pre-built configs to scale up
- Rust Engine — SIMD acceleration on each rank