Acceleration¶
Backend modules for high-performance SC operations.
| Module | Purpose |
|---|---|
| vector_ops | Packed uint64 bitwise AND, popcount, pack/unpack |
| gpu_backend | CuPy GPU dispatch (transparent NumPy fallback) |
| jax_backend | JAX JIT-compiled LIF step for TPU/GPU scaling |
| jit_kernels | Numba-accelerated inner loops |
| mpi_driver | MPI-based distributed simulation |
Vector Operations¶
sc_neurocore.accel.vector_ops¶
pack_bitstream(bitstream)¶
Packs a uint8 bitstream (0s and 1s) into uint64 integers. This allows processing 64 time steps in parallel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bitstream | ndarray[Any, Any] | Shape (N,) or (Batch, N) of uint8 {0,1} | required |
Returns:
| Name | Type | Description |
|---|---|---|
| packed | ndarray[Any, Any] | Shape (ceil(N/64),) or (Batch, ceil(N/64)) of uint64 |
Source code in src/sc_neurocore/accel/vector_ops.py
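The packing can be sketched in plain NumPy (a reference sketch; LSB-first bit order within each word is an assumption, the library's actual ordering may differ):

```python
import numpy as np

def pack_bitstream_ref(bitstream):
    """Reference sketch of pack_bitstream: zero-pad the last axis to a
    multiple of 64, then fold each group of 64 bits into one uint64 word
    (LSB-first bit order is an assumption here)."""
    n = bitstream.shape[-1]
    n_words = -(-n // 64)  # ceil(N / 64)
    padded = np.zeros(bitstream.shape[:-1] + (n_words * 64,), dtype=np.uint64)
    padded[..., :n] = bitstream
    words = padded.reshape(bitstream.shape[:-1] + (n_words, 64))
    shifts = np.arange(64, dtype=np.uint64)
    return np.bitwise_or.reduce(words << shifts, axis=-1)

bits = np.array([1, 0, 1, 1], dtype=np.uint8)
packed = pack_bitstream_ref(bits)  # bits 0, 2, 3 set -> 0b1101 = 13
```

Once packed, one bitwise instruction on a uint64 word processes 64 stochastic time steps at once, which is where the speedup comes from.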
unpack_bitstream(packed, original_length, original_shape=None)¶
Unpacks uint64 array back to uint8 bitstream.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| packed | ndarray[Any, Any] | Packed uint64 array (1D or 2D) | required |
| original_length | int | Total number of bits to extract | required |
| original_shape | Optional[tuple[Any, ...]] | Optional tuple for reshaping output (batch, length) | None |
Returns:
| Type | Description |
|---|---|
| ndarray[Any, Any] | Unpacked bitstream of shape (original_length,) or original_shape |
Source code in src/sc_neurocore/accel/vector_ops.py
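A matching unpack sketch (same LSB-first assumption as the packing sketch) shows the round trip:

```python
import numpy as np

def unpack_bitstream_ref(packed, original_length, original_shape=None):
    """Reference sketch of unpack_bitstream: expand each uint64 word back
    into 64 uint8 bits, then trim away the padding (LSB-first assumed)."""
    shifts = np.arange(64, dtype=np.uint64)
    bits = ((packed[..., None] >> shifts) & np.uint64(1)).astype(np.uint8)
    flat = bits.reshape(packed.shape[:-1] + (-1,))
    if original_shape is not None:
        return flat[..., : original_shape[-1]].reshape(original_shape)
    return flat[..., :original_length]

packed = np.array([13], dtype=np.uint64)    # 0b1101
restored = unpack_bitstream_ref(packed, 4)  # -> [1, 0, 1, 1]
```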
vec_and(a_packed, b_packed)¶
Bitwise AND on packed arrays. Simulates SC Multiplication.
Source code in src/sc_neurocore/accel/vector_ops.py
vec_xnor(a_packed, b_packed)¶
Bitwise XNOR on packed arrays. SC bipolar multiplication: P(A XNOR B) = P(A)P(B) + (1-P(A))(1-P(B)).
Source code in src/sc_neurocore/accel/vector_ops.py
vec_not(packed)¶
Bitwise NOT on packed arrays. SC complement: P(NOT A) = 1 - P(A).
Source code in src/sc_neurocore/accel/vector_ops.py
vec_mux(select_packed, a_packed, b_packed)¶
Bitwise MUX on packed arrays. SC scaled addition: P(out) = P(sel)P(A) + (1-P(sel))P(B).
When sel is a Bernoulli(0.5) stream, this computes the average (A+B)/2.
Source code in src/sc_neurocore/accel/vector_ops.py
vec_popcount(packed)¶
Count total set bits (1s) in the packed array. Used for integration/accumulation.
Source code in src/sc_neurocore/accel/vector_ops.py
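The probability identities behind these operations can be checked empirically on unpacked streams (a sanity-check sketch; the packed versions compute the same bits 64 at a time):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1 << 16
pa, pb = 0.6, 0.5
a = (rng.random(n) < pa).astype(np.uint8)    # Bernoulli(0.6) stream
b = (rng.random(n) < pb).astype(np.uint8)    # Bernoulli(0.5) stream
sel = (rng.random(n) < 0.5).astype(np.uint8)

and_est = (a & b).mean()                     # ~ pa * pb = 0.30
xnor_est = ((a ^ b) ^ 1).mean()              # ~ pa*pb + (1-pa)(1-pb) = 0.50
not_est = (a ^ 1).mean()                     # ~ 1 - pa = 0.40
mux_est = np.where(sel == 1, a, b).mean()    # ~ (pa + pb) / 2 = 0.55
```

With 2^16 samples, each estimate lands within a couple of percent of its identity; longer streams tighten the variance at the usual 1/sqrt(N) rate.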
GPU Backend¶
sc_neurocore.accel.gpu_backend¶
to_device(arr)¶
Move a NumPy array to the active backend (GPU copy or no-op).
Source code in src/sc_neurocore/accel/gpu_backend.py
to_host(arr)¶
Bring an array back to host RAM as a NumPy array.
Source code in src/sc_neurocore/accel/gpu_backend.py
gpu_pack_bitstream(bits)¶
Pack uint8 {0,1} array into uint64 words.
Works on both CuPy and NumPy arrays.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bits | ndarray | Shape | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Packed uint64 array, shape |
Source code in src/sc_neurocore/accel/gpu_backend.py
gpu_vec_and(a, b)¶
Bitwise AND on packed uint64 arrays (SC multiplication).
Source code in src/sc_neurocore/accel/gpu_backend.py
gpu_popcount(packed)¶
Vectorised SWAR popcount on uint64 arrays — returns per-element counts.
On CuPy this runs as a fused GPU kernel; on NumPy it uses the same
SWAR bit-trick as vector_ops.vec_popcount but returns an array
instead of a scalar.
Source code in src/sc_neurocore/accel/gpu_backend.py
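The SWAR trick referenced above can be sketched directly in NumPy (the same arithmetic works unchanged on CuPy arrays, which is why it fuses into one GPU kernel):

```python
import numpy as np

def swar_popcount(x):
    """Per-element popcount of a uint64 array via the classic SWAR bit
    trick: pairwise bit sums, then nibble sums, then one multiply that
    gathers the byte counts into the top byte. No loops, so it
    vectorises on NumPy and fuses on CuPy."""
    x = x.astype(np.uint64)
    x = x - ((x >> np.uint64(1)) & np.uint64(0x5555555555555555))
    x = (x & np.uint64(0x3333333333333333)) + ((x >> np.uint64(2)) & np.uint64(0x3333333333333333))
    x = (x + (x >> np.uint64(4))) & np.uint64(0x0F0F0F0F0F0F0F0F)
    return (x * np.uint64(0x0101010101010101)) >> np.uint64(56)

counts = swar_popcount(np.array([0, 0xFFFFFFFFFFFFFFFF, 0b1011], dtype=np.uint64))
# -> [0, 64, 3]
```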
gpu_vec_mac(packed_weights, packed_inputs)¶
GPU-accelerated multiply-accumulate for a dense SC layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| packed_weights | ndarray | | required |
| packed_inputs | ndarray | | required |
Returns:
| Type | Description |
|---|---|
| ndarray | |
Source code in src/sc_neurocore/accel/gpu_backend.py
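What the MAC computes can be written as a slow reference (a sketch; the shapes are an assumption carried over from the Numba kernel's docstring, and the real GPU kernel fuses the AND and popcount):

```python
import numpy as np

def vec_mac_ref(packed_weights, packed_inputs):
    """Reference SC MAC: out[i] = popcount(packed_weights[i] & packed_inputs),
    summed over all inputs and words. Assumed shapes:
    packed_weights (n_neurons, n_inputs, n_words), packed_inputs (n_inputs, n_words)."""
    anded = packed_weights & packed_inputs          # broadcasts over neurons
    popcnt = np.frompyfunc(lambda w: bin(int(w)).count("1"), 1, 1)
    return popcnt(anded).astype(np.int64).sum(axis=(1, 2))

w = np.array([[[0b1111]], [[0b0101]]], dtype=np.uint64)  # 2 neurons, 1 input, 1 word
x = np.array([[0b0110]], dtype=np.uint64)
out = vec_mac_ref(w, x)  # ANDs are 0b0110 (2 bits) and 0b0100 (1 bit) -> [2, 1]
```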
JAX Backend¶
sc_neurocore.accel.jax_backend¶
JAX backend for SC-NeuroCore.
Provides JAX-accelerated primitives for stochastic computing, unlocking automatic differentiation, JIT compilation (XLA), and native TPU/GPU scaling.
Usage::

    from sc_neurocore.accel.jax_backend import jnp, HAS_JAX, to_jax, to_host
    from sc_neurocore.accel.jax_backend import jax_pack_bitstream, jax_vec_mac

    if HAS_JAX:
        bits = jnp.array([1, 0, 1, 1], dtype=jnp.uint8)
        packed = jax_pack_bitstream(bits)
to_jax(arr)¶
Move a NumPy array to the JAX device.
Source code in src/sc_neurocore/accel/jax_backend.py
to_host(arr)¶
Bring a JAX array back to host RAM as a NumPy array.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_pack_bitstream(bits)¶
Pack uint8 {0,1} array into uint64 words using JAX.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_vec_and(a, b)¶
Bitwise AND on packed uint64 arrays (SC multiplication).
Source code in src/sc_neurocore/accel/jax_backend.py
jax_popcount(packed)¶
Vectorised SWAR popcount on uint64 arrays using JAX.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_vec_mac(packed_weights, packed_inputs)¶
JAX-accelerated multiply-accumulate for a dense SC layer.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_lif_step(v, I_t, v_rest, v_reset, v_threshold, alpha, resistance, noise)¶
Vectorized LIF step using JAX.
dv = (v_rest - v) * alpha + I_t * resistance + noise
Source code in src/sc_neurocore/accel/jax_backend.py
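The documented update rule can be sketched in NumPy (the spike-and-reset handling here is an assumption; the JAX version is JIT-compiled and differentiable):

```python
import numpy as np

def lif_step_ref(v, I_t, v_rest=0.0, v_reset=0.0, v_threshold=1.0,
                 alpha=0.1, resistance=1.0, noise=0.0):
    """One LIF step: leak toward v_rest, add input current and noise,
    spike where the threshold is crossed, then reset those neurons."""
    v_new = v + (v_rest - v) * alpha + I_t * resistance + noise
    spikes = (v_new >= v_threshold).astype(np.uint8)
    v_new = np.where(spikes == 1, v_reset, v_new)
    return v_new, spikes

v = np.array([0.9, 0.0])
v_next, s = lif_step_ref(v, I_t=np.array([0.5, 0.1]))
# neuron 0: 0.9 - 0.09 + 0.5 = 1.31 >= 1.0 -> spikes and resets to 0.0
```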
jax_forward_pass(weights, x, n_steps, v_rest=0.0, v_reset=0.0, v_threshold=1.0, alpha=0.9)¶
Multi-layer SNN forward pass with LIF neurons.
Returns (spike_trains_per_layer, final_membrane_potentials).
Each layer: s = Heaviside(v - threshold); v = alpha * v * (1 - s) + W @ s_prev.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_surrogate_gradient_step(weights, x, targets, n_steps=25, lr=0.001, beta=10.0)¶
One training step with surrogate gradient (fast sigmoid).
Uses jax.grad on a cross-entropy loss over mean output spike rates. Returns (updated_weights, loss_value).
Source code in src/sc_neurocore/accel/jax_backend.py
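The fast-sigmoid surrogate replaces the Heaviside derivative in the backward pass. A commonly used form (an assumption; the source may use a variant) is:

```python
import numpy as np

def fast_sigmoid_surrogate_grad(v, v_threshold=1.0, beta=10.0):
    """Surrogate derivative d(spike)/dv ~= 1 / (beta*|v - threshold| + 1)^2.
    Peaks at 1 on the threshold and decays smoothly on either side, so
    gradients can flow through the otherwise non-differentiable spike."""
    return 1.0 / (beta * np.abs(v - v_threshold) + 1.0) ** 2

g = fast_sigmoid_surrogate_grad(np.array([0.5, 1.0, 1.5]))
# at the threshold the surrogate slope is 1.0; at |v - threshold| = 0.5
# it has fallen to 1/36 for beta = 10
```

The sharpness parameter beta trades gradient locality against magnitude: larger beta concentrates learning signal near the threshold.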
JIT Kernels¶
sc_neurocore.accel.jit_kernels¶
jit_pack_bits(bitstream, packed_arr)¶
Packs a uint8 bitstream into a uint64 array.
bitstream: (N,) uint8 {0, 1}
packed_arr: (N//64,) uint64
Source code in src/sc_neurocore/accel/jit_kernels.py
jit_vec_mac(packed_weights, packed_inputs, outputs)¶
Vectorized Multiply-Accumulate (MAC). Simulates: Output[i] = Sum(Weights[i] AND Inputs).
packed_weights: (n_neurons, n_inputs, n_words)
packed_inputs: (n_inputs, n_words)
outputs: (n_neurons,)
Source code in src/sc_neurocore/accel/jit_kernels.py
MPI Driver¶
sc_neurocore.accel.mpi_driver¶
MPIDriver¶
Distributed SC-NeuroCore Driver using MPI. Handles partitioning and synchronization of bitstreams across cluster nodes.
Source code in src/sc_neurocore/accel/mpi_driver.py
scatter_workload(global_inputs)¶
Distributes a large input array across nodes. Splits along axis 0 (Batch or Neurons).
Source code in src/sc_neurocore/accel/mpi_driver.py
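The axis-0 partitioning step can be sketched without MPI (the driver then distributes these chunks via mpi4py; the function name here is illustrative):

```python
import numpy as np

def partition_axis0(global_inputs, n_ranks):
    """Split the batch/neuron axis into near-equal chunks, one per MPI
    rank. np.array_split tolerates sizes that don't divide evenly, so
    leading ranks simply receive one extra row."""
    return np.array_split(global_inputs, n_ranks, axis=0)

chunks = partition_axis0(np.arange(10).reshape(10, 1), n_ranks=3)
# chunk sizes: 4, 3, 3
```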
gather_results(local_results)¶
Collects results from all nodes to Root.
Source code in src/sc_neurocore/accel/mpi_driver.py
barrier()¶
Synchronize all nodes.
Source code in src/sc_neurocore/accel/mpi_driver.py