Per-row-parallel CSR sparse matrix-vector add for the Potjans
CorticalColumn block-CSR injection path.
parallel_csr_spmv_add(indptr, indices, data, x, y) computes
y += W @ x where W is a CSR matrix described by (indptr, indices, data). Rows are processed in parallel via rayon.
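As a reading aid, the update rule above can be written as a serial reference loop. This is a minimal sketch, not the crate's implementation: the function name `csr_spmv_add_serial`, the `usize` index type, and the `f64` value type are all assumptions, since the page does not show the real signature.

```rust
/// Serial reference for `y += W @ x` with W in CSR form.
/// Hypothetical signature: index and value types are assumed, not taken from the crate.
pub fn csr_spmv_add_serial(
    indptr: &[usize],
    indices: &[usize],
    data: &[f64],
    x: &[f64],
    y: &mut [f64],
) {
    assert_eq!(indptr.len(), y.len() + 1, "indptr must have n_rows + 1 entries");
    for (r, y_r) in y.iter_mut().enumerate() {
        // Row r owns the stored entries k in indptr[r]..indptr[r + 1].
        let mut acc = 0.0;
        for k in indptr[r]..indptr[r + 1] {
            acc += data[k] * x[indices[k]];
        }
        *y_r += acc;
    }
}

/// Worked example: W = [[1, 0, 2], [0, 0, 0], [0, 3, 0]], x = [1, 2, 3],
/// and y starts at [10, 20, 30].
pub fn small_example() -> Vec<f64> {
    let indptr: [usize; 4] = [0, 2, 2, 3]; // row 1 is empty
    let indices: [usize; 3] = [0, 2, 1];
    let data = [1.0, 2.0, 3.0];
    let x = [1.0, 2.0, 3.0];
    let mut y = vec![10.0, 20.0, 30.0];
    csr_spmv_add_serial(&indptr, &indices, &data, &x, &mut y);
    y // [17.0, 20.0, 36.0]
}
```

Row 0 contributes 1·1 + 2·3 = 7, row 1 contributes nothing, and row 2 contributes 3·2 = 6, so `y` becomes `[17, 20, 36]`.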
This is the kernel that lets CorticalColumn use the per-(source-type,
global-bin) block matrices at scale ≥ 0.5. A single block mat-vec at
scale=0.1 takes ≈ 18 ms in single-threaded scipy; with rayon over
8 cores it takes ≈ 2-3 ms. At scale=0.5 the savings extrapolate
linearly with nnz, bringing the wall-time of a 600 ms simulation from
~50 minutes (single-threaded scipy block) into the
~10-minute range and unlocking the full-scale (~77 000-cell)
convergence regime documented by van Albada et al. 2015 Fig 5.
Determinism: each row's reduction is local to that row and is written only to y[r], so the order in which rows are scheduled across threads does not affect the result. The output is bit-identical to the scipy single-threaded reference for matching inputs.
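The determinism argument can be checked with a small stdlib-only sketch. It is not the crate's code: `std::thread::scope` stands in for rayon so the example needs no external dependency, and `DEMO_CHUNK`, the function names, and the test matrix are all hypothetical. Each thread owns a disjoint chunk of `y` and sums each row in the same left-to-right order as the serial loop, so the two outputs should agree bit for bit.

```rust
use std::thread;

/// Hypothetical chunk length; the crate's private CHUNK_SIZE value is not shown in the docs.
const DEMO_CHUNK: usize = 2;

/// Left-to-right dot product of CSR row r with x.
fn row_dot(r: usize, indptr: &[usize], indices: &[usize], data: &[f64], x: &[f64]) -> f64 {
    let mut acc = 0.0;
    for k in indptr[r]..indptr[r + 1] {
        acc += data[k] * x[indices[k]];
    }
    acc
}

pub fn spmv_add_serial(indptr: &[usize], indices: &[usize], data: &[f64], x: &[f64], y: &mut [f64]) {
    for (r, y_r) in y.iter_mut().enumerate() {
        *y_r += row_dot(r, indptr, indices, data, x);
    }
}

/// Row-chunked parallel version. One scoped thread per chunk keeps the sketch short;
/// a thread pool such as rayon's would avoid the per-chunk spawn cost.
pub fn spmv_add_chunked(indptr: &[usize], indices: &[usize], data: &[f64], x: &[f64], y: &mut [f64]) {
    thread::scope(|s| {
        for (c, y_chunk) in y.chunks_mut(DEMO_CHUNK).enumerate() {
            s.spawn(move || {
                let base = c * DEMO_CHUNK; // global index of the chunk's first row
                for (i, y_r) in y_chunk.iter_mut().enumerate() {
                    *y_r += row_dot(base + i, indptr, indices, data, x);
                }
            });
        }
    });
}

/// Runs both versions on a made-up 5 x 4 matrix and compares the raw bit patterns.
pub fn outputs_match() -> bool {
    let indptr: [usize; 6] = [0, 2, 3, 3, 6, 7];
    let indices: [usize; 7] = [0, 3, 1, 0, 2, 3, 2];
    let data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7];
    let x = [1.5, -2.25, 3.125, 0.1];
    let mut a = vec![1.0; 5];
    let mut b = vec![1.0; 5];
    spmv_add_serial(&indptr, &indices, &data, &x, &mut a);
    spmv_add_chunked(&indptr, &indices, &data, &x, &mut b);
    a.iter().zip(&b).all(|(p, q)| p.to_bits() == q.to_bits())
}
```

The key design point is that no two threads ever write the same `y[r]`, so no atomics or cross-thread reductions are involved and floating-point rounding is unaffected by scheduling.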
Constants§
- CHUNK_SIZE 🔒 - y[r] += sum_k data[k] * x[indices[k]] for k in indptr[r]..indptr[r+1], processing rows in chunks in parallel via rayon.
Functions§
- parallel_csr_multi_spmv_add - Batched per-row-parallel CSR spmv add: y += sum_b W_b @ x_b across n_blocks (matrix, vector) pairs, all sharing the same row dimension. Used by CorticalColumn._inject_block(dt) to do 2 × n_delay_bins (= 10) spmv calls in one FFI call instead of 10 separate FFI calls per step. The per-row reduction is local so chunking still parallelises cleanly.
- parallel_csr_spmv_add
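The batched form y += sum_b W_b @ x_b can be sketched serially as follows. The `CsrBlock` struct, the function name, and the choice to accumulate all blocks for a row before touching `y[r]` are assumptions made for illustration; the page does not show how the real function receives its blocks or in what order it accumulates them.

```rust
/// One (matrix, vector) pair. Hypothetical layout, not the crate's argument format.
pub struct CsrBlock<'a> {
    pub indptr: &'a [usize],
    pub indices: &'a [usize],
    pub data: &'a [f64],
    pub x: &'a [f64],
}

/// Serial sketch of `y += sum_b W_b @ x_b`; every block must have y.len() rows.
pub fn csr_multi_spmv_add_serial(blocks: &[CsrBlock<'_>], y: &mut [f64]) {
    for b in blocks {
        assert_eq!(b.indptr.len(), y.len() + 1, "blocks must share the row dimension");
    }
    for (r, y_r) in y.iter_mut().enumerate() {
        // All blocks contribute to the same row accumulator, so the
        // reduction stays local to row r and rows remain independent.
        let mut acc = 0.0;
        for b in blocks {
            for k in b.indptr[r]..b.indptr[r + 1] {
                acc += b.data[k] * b.x[b.indices[k]];
            }
        }
        *y_r += acc;
    }
}

/// Two 2 x 2 blocks: W_a = [[1, 2], [0, 3]] with x_a = [1, 1],
/// W_b = [[0, 0], [4, 0]] with x_b = [2, 5]; y starts at [1, 1].
pub fn batched_example() -> Vec<f64> {
    let (ip_a, ix_a): ([usize; 3], [usize; 3]) = ([0, 2, 3], [0, 1, 1]);
    let (ip_b, ix_b): ([usize; 3], [usize; 1]) = ([0, 0, 1], [0]);
    let blocks = [
        CsrBlock { indptr: &ip_a, indices: &ix_a, data: &[1.0, 2.0, 3.0], x: &[1.0, 1.0] },
        CsrBlock { indptr: &ip_b, indices: &ix_b, data: &[4.0], x: &[2.0, 5.0] },
    ];
    let mut y = vec![1.0, 1.0];
    csr_multi_spmv_add_serial(&blocks, &mut y);
    y // [4.0, 12.0]
}
```

Here W_a @ x_a = [3, 3] and W_b @ x_b = [0, 8], so `y` ends at `[4, 12]`. Because the outer loop is still over rows, the same row-chunking used for the single-matrix kernel would apply unchanged.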