# Acceleration
Backend modules for high-performance SC operations.
| Module | Purpose |
|---|---|
| `vector_ops` | Packed uint64 bitwise AND, popcount, pack/unpack |
| `gpu_backend` | CuPy GPU dispatch (transparent NumPy fallback) |
| `jax_backend` | JAX JIT-compiled LIF step for TPU/GPU scaling |
| `jit_kernels` | Numba-accelerated inner loops |
| `mpi_driver` | MPI-based distributed simulation |
## Rust Safety Mirrors
`src/sc_neurocore/accel/rust/` is a nested Rust crate that holds safety and contract mirrors of higher-level Python modules. It is separate from the PyO3 engine: the mirror crate is tested directly with Cargo, while the Python modules keep their NumPy/Python path importable when optional engine submodules are absent.
Current documented mirrors:
| Mirror | Python surface | Verification |
|---|---|---|
| `safety/analysis.rs` | `studio.analysis` | Rust unit tests plus `tests/test_studio_analysis.py` |
| `safety/dna_mapper.rs` | `bridges.dna_mapper` | Rust unit tests plus 139 DNA mapper tests |
| `safety/l7_symbolic.rs` | `scpn.layers.l7_symbolic` | Rust unit tests plus L7 and cross-layer contract tests |
| `safety/predictive_model.rs` | `world_model.predictive_model` | Rust unit tests plus 77 passed predictive-model tests, with 3 optional-path skips |
Cargo command:

```bash
cargo test --manifest-path src/sc_neurocore/accel/rust/Cargo.toml --lib --no-default-features
```
## Vector Operations

### `sc_neurocore.accel.vector_ops`

#### `pack_bitstream(bitstream)`
Packs a uint8 bitstream (0s and 1s) into uint64 integers. This allows processing 64 time steps in parallel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `bitstream` | `ndarray[Any, Any]` | Shape (N,) or (Batch, N) of uint8 {0,1} | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `packed` | `ndarray[Any, Any]` | Shape (ceil(N/64),) or (Batch, ceil(N/64)) of uint64 |
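The packing step can be sketched in plain NumPy for the 1-D case (`pack_bits_sketch` is a hypothetical name for illustration, not the library API; bit 0 of each word holds the earliest time step, which is an assumption about the layout):

```python
import numpy as np

def pack_bits_sketch(bits: np.ndarray) -> np.ndarray:
    # Pad to a multiple of 64 bits, then fold each group of 64
    # uint8 bits into one uint64 word via shifts and OR-reduction.
    n = bits.shape[-1]
    pad = (-n) % 64
    padded = np.concatenate([bits, np.zeros(pad, dtype=np.uint8)])
    words = padded.reshape(-1, 64).astype(np.uint64)
    shifts = np.arange(64, dtype=np.uint64)
    return np.bitwise_or.reduce(words << shifts, axis=-1)

bits = np.array([1, 0, 1, 1], dtype=np.uint8)
packed = pack_bits_sketch(bits)   # bits 0, 2, 3 set -> 0b1101 = 13
```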
Source code in `src/sc_neurocore/accel/vector_ops.py`
#### `unpack_bitstream(packed, original_length, original_shape=None)`

Unpacks a uint64 array back to a uint8 bitstream.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `packed` | `ndarray[Any, Any]` | Packed uint64 array (1D or 2D) | required |
| `original_length` | `int` | Total number of bits to extract | required |
| `original_shape` | `Optional[tuple[Any, ...]]` | Optional tuple for reshaping output (batch, length) | `None` |

Returns:

| Type | Description |
|---|---|
| `ndarray[Any, Any]` | Unpacked bitstream of shape (original_length,) or original_shape |
Source code in `src/sc_neurocore/accel/vector_ops.py`
#### `vec_and(a_packed, b_packed)`

Bitwise AND on packed arrays. Simulates SC multiplication.

Source code in `src/sc_neurocore/accel/vector_ops.py`
#### `vec_xnor(a_packed, b_packed)`

Bitwise XNOR on packed arrays. SC bipolar multiplication: P(A XNOR B) = P(A)P(B) + (1-P(A))(1-P(B)).
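The XNOR identity is easy to verify empirically on unpacked streams (illustrative only; the real routine operates on packed uint64 words):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_a, p_b = 1 << 16, 0.8, 0.5
a = (rng.random(n) < p_a).astype(np.uint8)   # stream with P(a=1) ~ 0.8
b = (rng.random(n) < p_b).astype(np.uint8)   # stream with P(b=1) ~ 0.5
xnor = 1 - (a ^ b)                           # per-bit XNOR
# P(A XNOR B) = 0.8*0.5 + 0.2*0.5 = 0.5
est = xnor.mean()
```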
Source code in `src/sc_neurocore/accel/vector_ops.py`
#### `vec_not(packed)`

Bitwise NOT on packed arrays. SC complement: P(NOT A) = 1 - P(A).

Source code in `src/sc_neurocore/accel/vector_ops.py`
#### `vec_mux(select_packed, a_packed, b_packed)`

Bitwise MUX on packed arrays. SC scaled addition: P(out) = P(sel)P(A) + (1-P(sel))P(B). When sel is a Bernoulli(0.5) stream, this computes the average (A+B)/2.
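The scaled-addition behaviour can be checked on unpacked streams (a sketch, not the packed implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_a, p_b = 1 << 16, 0.9, 0.3
a = (rng.random(n) < p_a).astype(np.uint8)
b = (rng.random(n) < p_b).astype(np.uint8)
sel = (rng.random(n) < 0.5).astype(np.uint8)   # Bernoulli(0.5) select stream
out = np.where(sel == 1, a, b)                 # per-bit MUX
# P(out) = (0.9 + 0.3) / 2 = 0.6
est = out.mean()
```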
Source code in `src/sc_neurocore/accel/vector_ops.py`
#### `vec_popcount(packed)`

Counts the total number of set bits (1s) in the packed array. Used for integration/accumulation.

Source code in `src/sc_neurocore/accel/vector_ops.py`
## GPU Backend

### `sc_neurocore.accel.gpu_backend`

#### `to_device(arr)`
Move a NumPy array to the active backend (GPU copy or no-op).
Source code in `src/sc_neurocore/accel/gpu_backend.py`
#### `to_host(arr)`
Bring an array back to host RAM as a NumPy array.
Source code in `src/sc_neurocore/accel/gpu_backend.py`
#### `gpu_pack_bitstream(bits)`

Pack a uint8 {0,1} array into uint64 words. Works on both CuPy and NumPy arrays.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `bits` | `ndarray` | Shape | required |

Returns:

| Type | Description |
|---|---|
| `ndarray` | Packed uint64 array, shape |
Source code in `src/sc_neurocore/accel/gpu_backend.py`
#### `gpu_vec_and(a, b)`

Bitwise AND on packed uint64 arrays (SC multiplication).

Source code in `src/sc_neurocore/accel/gpu_backend.py`
#### `gpu_popcount(packed)`

Vectorised SWAR popcount on uint64 arrays; returns per-element counts. On CuPy this runs as a fused GPU kernel; on NumPy it uses the same SWAR bit-trick as `vector_ops.vec_popcount` but returns an array instead of a scalar.
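The classic 64-bit SWAR bit-count can be sketched in NumPy as follows (an illustrative reimplementation; the library's kernel may differ in detail):

```python
import numpy as np

def swar_popcount(x: np.ndarray) -> np.ndarray:
    # Standard SWAR popcount: sum bits in pairs, nibbles, then bytes,
    # finally gather the byte sums into the top byte via multiplication.
    x = x.astype(np.uint64)
    x = x - ((x >> np.uint64(1)) & np.uint64(0x5555555555555555))
    x = (x & np.uint64(0x3333333333333333)) + \
        ((x >> np.uint64(2)) & np.uint64(0x3333333333333333))
    x = (x + (x >> np.uint64(4))) & np.uint64(0x0F0F0F0F0F0F0F0F)
    return (x * np.uint64(0x0101010101010101)) >> np.uint64(56)

counts = swar_popcount(np.array([0, 1, 0xFF, 2**64 - 1], dtype=np.uint64))
# -> [0, 1, 8, 64]
```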
Source code in `src/sc_neurocore/accel/gpu_backend.py`
#### `gpu_vec_mac(packed_weights, packed_inputs)`

GPU-accelerated multiply-accumulate for a dense SC layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `packed_weights` | `ndarray` | | required |
| `packed_inputs` | `ndarray` | | required |

Returns:

| Type | Description |
|---|---|
| `ndarray` | |
Source code in `src/sc_neurocore/accel/gpu_backend.py`
## JAX Backend

### `sc_neurocore.accel.jax_backend`

JAX backend for SC-NeuroCore. Provides JAX-accelerated primitives for stochastic computing, unlocking automatic differentiation, JIT compilation (XLA), and native TPU/GPU scaling.
Usage:

```python
from sc_neurocore.accel.jax_backend import jnp, HAS_JAX, to_jax, to_host
from sc_neurocore.accel.jax_backend import jax_pack_bitstream, jax_vec_mac

if HAS_JAX:
    bits = jnp.array([1, 0, 1, 1], dtype=jnp.uint8)
    packed = jax_pack_bitstream(bits)
```
#### `to_jax(arr)`
Move a NumPy array to the JAX device.
Source code in `src/sc_neurocore/accel/jax_backend.py`
#### `to_host(arr)`
Bring a JAX array back to host RAM as a NumPy array.
Source code in `src/sc_neurocore/accel/jax_backend.py`
#### `jax_pack_bitstream(bits)`

Pack a uint8 {0,1} array into uint64 words using JAX.

Source code in `src/sc_neurocore/accel/jax_backend.py`
#### `jax_vec_and(a, b)`

Bitwise AND on matching non-empty uint64 packed arrays.

Source code in `src/sc_neurocore/accel/jax_backend.py`
#### `jax_popcount(packed)`

Vectorised SWAR popcount on a non-empty uint64 array using JAX.

Source code in `src/sc_neurocore/accel/jax_backend.py`
#### `jax_vec_mac(packed_weights, packed_inputs)`

JAX-accelerated multiply-accumulate for packed uint64 dense SC layers.

Source code in `src/sc_neurocore/accel/jax_backend.py`
#### `jax_lif_step(v, I_t, v_rest, v_reset, v_threshold, alpha, resistance, noise)`

Vectorized LIF step using JAX with fail-closed public input guards.

dv = (v_rest - v) * alpha + I_t * resistance + noise
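The documented update rule can be sketched in NumPy (`lif_step_sketch` is a hypothetical name; the fire-and-reset semantics after threshold crossing are an assumption, and the real routine additionally applies JAX input guards):

```python
import numpy as np

def lif_step_sketch(v, I_t, v_rest=0.0, v_reset=0.0, v_threshold=1.0,
                    alpha=0.1, resistance=1.0, noise=0.0):
    # Documented update: dv = (v_rest - v) * alpha + I_t * resistance + noise
    v = v + (v_rest - v) * alpha + I_t * resistance + noise
    spikes = (v >= v_threshold).astype(np.uint8)
    v = np.where(spikes == 1, v_reset, v)   # assumed fire-and-reset semantics
    return v, spikes

v, s = lif_step_sketch(np.zeros(3), np.array([0.0, 0.5, 2.0]))
# only the neuron driven by I_t = 2.0 crosses threshold and resets
```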
Source code in `src/sc_neurocore/accel/jax_backend.py`
#### `jax_forward_pass(weights, x, n_steps, v_rest=0.0, v_reset=0.0, v_threshold=1.0, alpha=0.9)`

Multi-layer SNN forward pass with LIF neurons. Returns (spike_trains_per_layer, final_membrane_potentials). Each layer: s = Heaviside(v - threshold), v = alpha * v * (1 - s) + W @ s_prev
Source code in `src/sc_neurocore/accel/jax_backend.py`
#### `jax_surrogate_loss(weights, x, targets, n_steps=25, beta=10.0, threshold=1.0, surrogate_path='custom_vjp')`

Cross-entropy loss for JAX SNN training with explicit surrogate paths.

Available paths:

- `custom_vjp`: hard spikes forward, fast-sigmoid proxy backward via `jax.custom_vjp`
- `legacy_stop_gradient`: historical straight-through reset path using `jax.lax.stop_gradient`
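A fast-sigmoid backward proxy typically has the SuperSpike-style shape below; the exact form used by this module is an assumption, and `fast_sigmoid_surrogate_grad` is a hypothetical name for illustration:

```python
import numpy as np

def fast_sigmoid_surrogate_grad(v, threshold=1.0, beta=10.0):
    # Proxy for d(spike)/dv: peaks at 1.0 exactly at the threshold
    # and decays smoothly as |v - threshold| grows, so gradients can
    # flow through the otherwise non-differentiable Heaviside spike.
    return 1.0 / (1.0 + beta * np.abs(v - threshold)) ** 2

g = fast_sigmoid_surrogate_grad(np.array([0.0, 1.0, 2.0]))
# peak at v = threshold, symmetric decay on either side
```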
Source code in `src/sc_neurocore/accel/jax_backend.py`
#### `jax_surrogate_gradient_step(weights, x, targets, n_steps=25, lr=0.001, beta=10.0, threshold=1.0, surrogate_path='custom_vjp')`

One training step with surrogate gradient over an explicit JAX path. `custom_vjp` is the modern path; `legacy_stop_gradient` keeps the historical training route available for side-by-side verification.

Source code in `src/sc_neurocore/accel/jax_backend.py`
## JIT Kernels

### `sc_neurocore.accel.jit_kernels`

#### `jit_pack_bits(bitstream, packed_arr)`

Packs a uint8 bitstream into a uint64 array.

- `bitstream`: (N,) uint8 {0, 1}
- `packed_arr`: (N//64,) uint64
Source code in `src/sc_neurocore/accel/jit_kernels.py`
#### `jit_vec_mac(packed_weights, packed_inputs, outputs)`

Vectorized multiply-accumulate (MAC). Simulates: Output[i] = Sum(Weights[i] AND Inputs).

- `packed_weights`: (n_neurons, n_inputs, n_words)
- `packed_inputs`: (n_inputs, n_words)
- `outputs`: (n_neurons,)
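The MAC reduces to AND plus popcount over the packed words; a Numba-free NumPy sketch (`vec_mac_sketch` is a hypothetical name, and the real kernel writes into a preallocated `outputs` array instead of returning one):

```python
import numpy as np

def vec_mac_sketch(packed_weights, packed_inputs):
    # Output[i] = total popcount of (Weights[i] AND Inputs) over all
    # input lines and packed words.
    anded = packed_weights & packed_inputs[None, :, :]
    flat = np.ascontiguousarray(anded).reshape(anded.shape[0], -1)
    as_bytes = flat.view(np.uint8)               # 8 bytes per uint64 word
    return np.unpackbits(as_bytes, axis=1).sum(axis=1)

w = np.array([[[0b1011]], [[0b0100]]], dtype=np.uint64)  # (2 neurons, 1 input, 1 word)
x = np.array([[0b0011]], dtype=np.uint64)                # (1 input, 1 word)
out = vec_mac_sketch(w, x)   # popcounts of 0b0011 and 0b0000 -> [2, 0]
```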
Source code in `src/sc_neurocore/accel/jit_kernels.py`
## MPI Driver

### `sc_neurocore.accel.mpi_driver`

#### `MPIDriver`

Distributed SC-NeuroCore driver using MPI. Handles partitioning and synchronization of bitstreams across cluster nodes.
Source code in `src/sc_neurocore/accel/mpi_driver.py`
##### `scatter_workload(global_inputs)`

Distributes a large input array across nodes. Splits along axis 0 (Batch or Neurons).
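The axis-0 split can be mirrored locally without MPI (an illustrative sketch; `partition_axis0` is a hypothetical name, and the driver's exact chunking rule is an assumption):

```python
import numpy as np

def partition_axis0(global_inputs, n_ranks):
    # Near-equal chunks along axis 0; with np.array_split the first
    # (N % n_ranks) chunks carry one extra row.
    return np.array_split(global_inputs, n_ranks, axis=0)

chunks = partition_axis0(np.arange(10).reshape(10, 1), 3)
sizes = [c.shape[0] for c in chunks]   # [4, 3, 3]
```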
Source code in `src/sc_neurocore/accel/mpi_driver.py`
##### `gather_results(local_results)`

Collects results from all nodes to Root.

Source code in `src/sc_neurocore/accel/mpi_driver.py`
##### `barrier()`

Synchronize all nodes.

Source code in `src/sc_neurocore/accel/mpi_driver.py`