Acceleration
Backend modules for high-performance stochastic-computing (SC) operations.
The shipped acceleration contract is the Python package plus maintained optional Rust engine paths and explicitly documented optional backends. The Julia, Go, Mojo, and WGSL sources in the repository form a research-only polyglot benchmark layer: they are used for parity checks, timing studies, and future backend evaluation, but they are not installed by default and are not required for deployment.
Do not treat the presence of a language mirror under src/sc_neurocore/accel/
as a shipped runtime guarantee. A backend becomes part of the supported user
surface only when a maintained Python entrypoint, install profile, tests, and
documentation all name that path explicitly.
| Module | Purpose |
|---|---|
| backend | Runtime selector for the shipped stochastic-inference backend |
| vector_ops | Packed uint64 bitwise AND, popcount, pack/unpack |
| gpu_backend | CuPy GPU dispatch (transparent NumPy fallback) |
| jax_backend | JAX JIT-compiled LIF step for TPU/GPU scaling |
| jit_kernels | Numba-accelerated inner loops |
| mpi_driver | MPI-based distributed simulation |
Rust Safety Mirrors
src/sc_neurocore/accel/rust/ is a nested Rust crate for safety and contract
mirrors of higher-level Python modules. It is separate from the PyO3 engine:
the mirror crate is tested directly with Cargo, while the Python modules keep
their NumPy/Python path importable when optional engine submodules are absent.
Current documented mirrors:
| Mirror | Python surface | Verification |
|---|---|---|
| safety/analysis.rs | studio.analysis | Rust unit tests plus tests/test_studio_analysis.py |
| safety/dna_mapper.rs | bridges.dna_mapper | Rust unit tests plus 139 DNA mapper tests |
| safety/l7_symbolic.rs | scpn.layers.l7_symbolic | Rust unit tests plus L7 and cross-layer contract tests |
| safety/predictive_model.rs | world_model.predictive_model | Rust unit tests plus 77 passed predictive-model tests, with 3 optional-path skips |
Cargo command:
cargo test --manifest-path src/sc_neurocore/accel/rust/Cargo.toml --lib --no-default-features
Backend Selector
sc_neurocore.accel.backend
Stable acceleration-backend selector for the public SC inference surface.
get_backend returns the fastest available backend handle (Rust accelerated
path first, NumPy fallback last). Each handle exposes sc_forward with the same
contract; the Rust path and the NumPy fallback are bit-identical for a fixed seed.
This replaces the removed sc_neurocore.accel.get_backend import relied on by
the SCPN-CONTROL Petri-net compiler fast path.
Backend
Bases: ABC
Acceleration backend handle exposing the stable SC inference contract.
Source code in src/sc_neurocore/accel/backend.py
sc_forward(weights_packed, input_probs, *, length, seed=44257)
abstractmethod
Run the stochastic forward pass; see sc_neurocore.accel.sc_forward.
Source code in src/sc_neurocore/accel/backend.py
NumpyBackend
Bases: Backend
NumPy fallback backend (always available, the bit-true floor).
Source code in src/sc_neurocore/accel/backend.py
sc_forward(weights_packed, input_probs, *, length, seed=44257)
Run the NumPy bit-true stochastic forward pass.
Source code in src/sc_neurocore/accel/backend.py
RustBackend
Bases: Backend
Rust-accelerated backend over the compiled engine.
Source code in src/sc_neurocore/accel/backend.py
sc_forward(weights_packed, input_probs, *, length, seed=44257)
Run the compiled Rust stochastic forward pass.
Source code in src/sc_neurocore/accel/backend.py
available_backends()
Report which SC inference backends resolve, in fastest-first order.
Returns
dict
Mapping of backend name to availability; numpy is always True.
Source code in src/sc_neurocore/accel/backend.py
get_backend(name='auto')
Return an SC inference backend handle.
Parameters
name : str, optional
"auto" (default) returns the fastest available backend in
PRIORITY order; a specific name ("rust", "numpy") forces
that backend.
Returns
Backend
A handle whose sc_forward matches the documented contract.
Raises
ValueError
If name is not "auto" or a known backend name.
RuntimeError
If an explicitly requested backend is unavailable.
Source code in src/sc_neurocore/accel/backend.py
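The selection rules documented above (priority order for "auto", ValueError for an unknown name, RuntimeError for a known but unavailable backend) can be sketched in plain Python. This is an illustration of the pattern only, not the code in backend.py: the `PROBES` table, the `available` and `select` helpers, and the `hypothetical_rust_engine` module name are all assumptions made for the example.

```python
from typing import Callable, Dict

# Assumed priority order, fastest first (mirrors "Rust first, NumPy last").
PRIORITY = ("rust", "numpy")


def _rust_available() -> bool:
    """Probe for a compiled engine; the module name is a placeholder."""
    try:
        import hypothetical_rust_engine  # noqa: F401  (assumed name)
    except ImportError:
        return False
    return True


# Hypothetical availability probes, one per backend name.
PROBES: Dict[str, Callable[[], bool]] = {
    "rust": _rust_available,
    "numpy": lambda: True,  # the fallback is always available
}


def available() -> Dict[str, bool]:
    """Report availability in fastest-first order."""
    return {name: PROBES[name]() for name in PRIORITY}


def select(name: str = "auto") -> str:
    """Return the backend name that would be used for this request."""
    if name == "auto":
        return next(n for n in PRIORITY if PROBES[n]())
    if name not in PROBES:
        raise ValueError(f"unknown backend {name!r}")
    if not PROBES[name]():
        raise RuntimeError(f"backend {name!r} is not available")
    return name
```

In this sketch, `select()` falls back to "numpy" on a machine without the placeholder engine, while `select("rust")` raises RuntimeError there instead of silently switching backends.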
Vector Operations
sc_neurocore.accel.vector_ops
Packed-bitstream vector operations for stochastic-computing kernels.
This module packs binary streams into uint64 words and provides deterministic
NumPy implementations of Boolean stochastic-computing primitives, unpacking,
and popcount accumulation for tests, CPU execution, and parity checks.
pack_bitstream(bitstream)
Pack a uint8 bitstream into uint64 words for 64-way parallel processing.
Parameters
bitstream : numpy.ndarray of shape (N,) or (Batch, N), uint8
Input bits valued in {0, 1}.
Returns
numpy.ndarray of shape (ceil(N / 64),) or (Batch, ceil(N / 64)), uint64
The packed 64-bit words.
Source code in src/sc_neurocore/accel/vector_ops.py
unpack_bitstream(packed, original_length, original_shape=None)
Unpack a uint64 array back into a uint8 bitstream.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| packed | ndarray[Any, Any] | Packed uint64 array (1D or 2D) | required |
| original_length | int | Total number of bits to extract | required |
| original_shape | Optional[tuple[Any, ...]] | Optional tuple for reshaping output (batch, length) | None |
Returns
Unpacked bitstream of shape (original_length,) or original_shape
Source code in src/sc_neurocore/accel/vector_ops.py
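A minimal pure-Python sketch of the pack/unpack round trip described above. It assumes bit i of the stream is stored in bit i % 64 of word i // 64 (least-significant bit first) and that the last word is zero-padded. The bit order actually used by vector_ops is not stated here, so treat that layout, and the helper names `pack_bits` and `unpack_bits`, as assumptions.

```python
import math
from typing import List

WORD = 64  # bits per packed word


def pack_bits(bits: List[int]) -> List[int]:
    """Pack 0/1 values into ceil(N / 64) integers, LSB-first (assumed order)."""
    words = [0] * math.ceil(len(bits) / WORD)
    for i, b in enumerate(bits):
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        words[i // WORD] |= b << (i % WORD)
    return words


def unpack_bits(words: List[int], original_length: int) -> List[int]:
    """Recover the first original_length bits, dropping the zero padding."""
    return [(words[i // WORD] >> (i % WORD)) & 1 for i in range(original_length)]


# 67 bits do not fill two words, so the second word carries 61 padding bits.
bits = [1, 0, 1, 1] + [0] * 61 + [1, 1]
packed = pack_bits(bits)
restored = unpack_bits(packed, original_length=len(bits))
```

Passing the original length back in is what lets the unpack step discard padding, which is why `unpack_bitstream` takes `original_length` as a required argument.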
vec_and(a_packed, b_packed)
Bitwise-AND two packed arrays, realising stochastic multiplication.
Source code in src/sc_neurocore/accel/vector_ops.py
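Why AND realises multiplication: for two independent unipolar streams, a bit of the output is 1 only when both inputs are 1, so P(A AND B) = P(A)P(B). The sketch below checks this empirically on unpacked Python lists. The `bernoulli_stream` helper, the stream length, and the seed are choices made for the example, and the packed kernel is assumed to compute the same thing 64 bits at a time.

```python
import random


def bernoulli_stream(p: float, length: int, rng: random.Random) -> list:
    """Generate a 0/1 stream whose bits are 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


rng = random.Random(0)   # fixed seed so the run is repeatable
length = 1 << 14         # longer streams give a tighter estimate

a = bernoulli_stream(0.6, length, rng)
b = bernoulli_stream(0.5, length, rng)

# Bitwise AND of the two streams, then decode by counting ones.
product = [x & y for x, y in zip(a, b)]
estimate = sum(product) / length   # should be close to 0.6 * 0.5 = 0.3
```

The estimate is statistical, not exact: its error shrinks roughly with the square root of the stream length, and it degrades if the two streams are correlated.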
vec_xnor(a_packed, b_packed)
Bitwise XNOR on packed arrays. SC bipolar multiplication: P(A XNOR B) = P(A)P(B) + (1-P(A))(1-P(B)).
Source code in src/sc_neurocore/accel/vector_ops.py
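The XNOR identity above is a multiplication once the streams are read in bipolar form. As a short worked derivation, assume independent streams and the usual bipolar encoding $x = 2P - 1$ (the symbols $p_A$, $p_B$, $x_A$, $x_B$ are notation introduced here):

$$
\begin{aligned}
P(\text{out}) &= p_A p_B + (1 - p_A)(1 - p_B) \\
2P(\text{out}) - 1 &= 4 p_A p_B - 2 p_A - 2 p_B + 1 \\
&= (2 p_A - 1)(2 p_B - 1) = x_A \, x_B
\end{aligned}
$$

So the bipolar value of the output stream is the product of the bipolar values of the inputs.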
vec_not(packed)
Bitwise NOT on packed arrays. SC complement: P(NOT A) = 1 - P(A).
Source code in src/sc_neurocore/accel/vector_ops.py
vec_mux(select_packed, a_packed, b_packed)
Bitwise MUX on packed arrays. SC scaled addition: P(out) = P(sel)P(A) + (1-P(sel))P(B).
When sel is a Bernoulli(0.5) stream, this computes the average (A+B)/2.
Source code in src/sc_neurocore/accel/vector_ops.py
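A word-level sketch of the MUX formula above, using plain Python integers as stand-ins for uint64 words. The helper name `mux_word` and the explicit 64-bit mask are assumptions for the example; Python integers are unbounded, so the mask is what keeps the bitwise NOT inside 64 bits.

```python
MASK64 = (1 << 64) - 1  # keep results within one 64-bit word


def mux_word(sel: int, a: int, b: int) -> int:
    """Pick bits of a where sel is 1 and bits of b where sel is 0."""
    return ((sel & a) | (~sel & b)) & MASK64


a = MASK64                  # every bit set: P(A) = 1.0
b = 0                       # no bit set:    P(B) = 0.0
sel = 0x00000000FFFFFFFF    # half the bits set: P(sel) = 0.5

out = mux_word(sel, a, b)
ones = bin(out).count("1")  # 32 of 64 bits, i.e. (1.0 + 0.0) / 2
```

This deterministic case makes the scaling visible: the output carries the average of the two inputs, which is why MUX is described as scaled addition and not plain addition.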
vec_popcount(packed)
Count the total set bits in a packed array, for integration/accumulation.
Source code in src/sc_neurocore/accel/vector_ops.py
GPU Backend
sc_neurocore.accel.gpu_backend
CuPy/NumPy dual-path backend for stochastic-computing array kernels.
The module exposes a runtime-switching array namespace plus helpers for moving arrays between host and device, packing bitstreams, popcounting packed words, and running stochastic vector operations with a deterministic NumPy fallback.
to_device(arr)
Move a NumPy array to the active backend (GPU copy or no-op).
Source code in src/sc_neurocore/accel/gpu_backend.py
to_host(arr)
Bring an array back to host RAM as a NumPy array.
Source code in src/sc_neurocore/accel/gpu_backend.py
gpu_pack_bitstream(bits)
Pack uint8 {0,1} array into uint64 words.
Works on both CuPy and NumPy arrays.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bits | ndarray | Shape | required |
Returns
Packed uint64 array, shape ``(ceil(N/64),)`` or ``(B, ceil(N/64))``.
Source code in src/sc_neurocore/accel/gpu_backend.py
gpu_vec_and(a, b)
Bitwise AND on packed uint64 arrays (SC multiplication).
Source code in src/sc_neurocore/accel/gpu_backend.py
gpu_popcount(packed)
Vectorised SWAR popcount on uint64 arrays — returns per-element counts.
On CuPy this runs as a fused GPU kernel; on NumPy it uses the same
SWAR bit-trick as vector_ops.vec_popcount but returns an array
instead of a scalar.
Source code in src/sc_neurocore/accel/gpu_backend.py
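For readers unfamiliar with the SWAR (SIMD within a register) trick mentioned above, here is the textbook 64-bit formulation on a single Python integer. It is a sketch of the general technique, and whether the module uses exactly these constants and steps is an assumption; the function name `swar_popcount64` is made up for the example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def swar_popcount64(x: int) -> int:
    """Count set bits in one 64-bit word using parallel partial sums."""
    x &= MASK64
    # Each 2-bit field now holds the count of its two original bits.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Merge neighbouring fields into 4-bit counts.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Merge into per-byte counts (each at most 8).
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # The multiply sums all byte counts into the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56


counts = [swar_popcount64(w) for w in (0, 1, 0xFF, MASK64)]
```

Applying the same arithmetic elementwise to an array yields one count per word, which matches the "per-element counts" behaviour described for `gpu_popcount`.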
gpu_vec_mac(packed_weights, packed_inputs)
GPU-accelerated multiply-accumulate for a dense SC layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| packed_weights | ndarray | | required |
| packed_inputs | ndarray | | required |
Returns
``(n_neurons,)`` total bit counts (= SC dot products).
Source code in src/sc_neurocore/accel/gpu_backend.py
JAX Backend
sc_neurocore.accel.jax_backend
JAX backend for SC-NeuroCore.
Provides JAX-accelerated primitives for stochastic computing, unlocking automatic differentiation, JIT compilation (XLA), and native TPU/GPU scaling.
Usage::

    from sc_neurocore.accel.jax_backend import jnp, HAS_JAX, to_jax, to_host
    from sc_neurocore.accel.jax_backend import jax_pack_bitstream, jax_vec_mac

    if HAS_JAX:
        bits = jnp.array([1, 0, 1, 1], dtype=jnp.uint8)
        packed = jax_pack_bitstream(bits)

to_jax(arr)
Move a NumPy array to the JAX device.
Source code in src/sc_neurocore/accel/jax_backend.py
to_host(arr)
Bring a JAX array back to host RAM as a NumPy array.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_pack_bitstream(bits)
Pack uint8 {0,1} array into uint64 words using JAX.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_vec_and(a, b)
Bitwise AND on matching non-empty uint64 packed arrays.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_popcount(packed)
Vectorised SWAR popcount on a non-empty uint64 array using JAX.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_vec_mac(packed_weights, packed_inputs)
JAX-accelerated multiply-accumulate for packed uint64 dense SC layers.
Source code in src/sc_neurocore/accel/jax_backend.py
jax_lif_step(v, I_t, v_rest, v_reset, v_threshold, alpha, resistance, noise)
Vectorized LIF step using JAX with fail-closed public input guards.
dv = (v_rest - v) * alpha + I_t * resistance + noise
Source code in src/sc_neurocore/accel/jax_backend.py
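The update rule above can be traced by hand with a scalar sketch. Only the dv expression comes from the docstring. The rest is assumed for illustration: that the new potential is v + dv, that a spike fires when it reaches v_threshold, and that a spiking neuron is set to v_reset. The real function is vectorised and guarded, which this plain-Python version is not.

```python
def lif_step(v, i_t, v_rest, v_reset, v_threshold, alpha, resistance, noise=0.0):
    """One scalar leaky integrate-and-fire step (illustrative semantics)."""
    dv = (v_rest - v) * alpha + i_t * resistance + noise  # documented formula
    v_new = v + dv                       # assumed: explicit additive update
    spiked = v_new >= v_threshold        # assumed: spike at or above threshold
    return (v_reset if spiked else v_new), spiked


v = 0.0
spikes = []
for _ in range(3):
    # Constant input current of 0.5 with a small leak towards v_rest.
    v, spiked = lif_step(
        v, 0.5, v_rest=0.0, v_reset=0.0, v_threshold=1.0,
        alpha=0.1, resistance=1.0,
    )
    spikes.append(spiked)
```

With these values the potential climbs to about 0.5, then 0.95, and crosses the threshold on the third step, after which it is reset.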
jax_forward_pass(weights, x, n_steps, v_rest=0.0, v_reset=0.0, v_threshold=1.0, alpha=0.9)
Multi-layer SNN forward pass with LIF neurons.
Returns (spike_trains_per_layer, final_membrane_potentials). Each layer: s = Heaviside(v - threshold), v = alpha * v * (1-s) + W @ s_prev
Source code in src/sc_neurocore/accel/jax_backend.py
jax_surrogate_loss(weights, x, targets, n_steps=25, beta=10.0, threshold=1.0, surrogate_path='custom_vjp')
Cross-entropy loss for JAX SNN training with explicit surrogate paths.
Available paths:
- custom_vjp: hard spikes forward, fast-sigmoid proxy backward via jax.custom_vjp
- legacy_stop_gradient: historical straight-through reset path using jax.lax.stop_gradient
Source code in src/sc_neurocore/accel/jax_backend.py
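To make "hard spikes forward, fast-sigmoid proxy backward" concrete, the sketch below separates the two halves as plain Python functions. The backward expression 1 / (1 + beta * |v - threshold|)^2 is one common fast-sigmoid surrogate; the exact expression used in jax_backend is not shown on this page, so both that formula and the function names are assumptions. In the real module the pairing is wired through jax.custom_vjp; no JAX is used here.

```python
def spike_forward(v: float, threshold: float = 1.0) -> float:
    """Forward pass: a hard Heaviside step, whose true gradient is zero."""
    return 1.0 if v >= threshold else 0.0


def spike_backward(v: float, threshold: float = 1.0, beta: float = 10.0) -> float:
    """Backward pass: smooth stand-in gradient (assumed fast-sigmoid form)."""
    return 1.0 / (1.0 + beta * abs(v - threshold)) ** 2


# The proxy gradient peaks at the threshold and decays on either side.
grads = [spike_backward(v) for v in (0.5, 0.9, 1.0, 1.1, 1.5)]
```

A larger beta narrows the peak, so only membrane potentials close to the threshold receive a meaningful gradient.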
jax_surrogate_gradient_step(weights, x, targets, n_steps=25, lr=0.001, beta=10.0, threshold=1.0, surrogate_path='custom_vjp')
One training step with surrogate gradient over an explicit JAX path.
custom_vjp is the modern path. legacy_stop_gradient keeps the
historical training route available for side-by-side verification.
Source code in src/sc_neurocore/accel/jax_backend.py
JIT Kernels
sc_neurocore.accel.jit_kernels
Numba-accelerated kernels for packed stochastic-computing hot loops.
The module exposes optional JIT implementations for bitstream packing and packed multiply-accumulate operations while preserving a pure-Python fallback when Numba is not installed.
jit_pack_bits(bitstream, packed_arr)
Pack a uint8 bitstream into a uint64 word array.
Parameters
bitstream : numpy.ndarray of shape (N,), uint8
Input bits valued in {0, 1}.
packed_arr : numpy.ndarray of shape (N // 64,), uint64
Output array receiving the packed 64-bit words.
Source code in src/sc_neurocore/accel/jit_kernels.py
jit_vec_mac(packed_weights, packed_inputs, outputs)
Accumulate a packed bitwise multiply-accumulate (MAC).
Computes outputs[i] = sum(popcount(packed_weights[i] AND packed_inputs)).
Parameters
packed_weights : numpy.ndarray of shape (n_neurons, n_inputs, n_words), uint64
Packed synaptic weight bitstreams.
packed_inputs : numpy.ndarray of shape (n_inputs, n_words), uint64
Packed input bitstreams.
outputs : numpy.ndarray of shape (n_neurons,)
Output array receiving the accumulated MAC results.
Source code in src/sc_neurocore/accel/jit_kernels.py
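The formula outputs[i] = sum(popcount(packed_weights[i] AND packed_inputs)) can be followed on a tiny case with nested lists standing in for the (n_neurons, n_inputs, n_words) and (n_inputs, n_words) arrays. This is a reference-style sketch in plain Python, not the Numba kernel; the function name `vec_mac_reference` and the choice to return a list (the real kernel fills an output array) are assumptions for the example.

```python
from typing import List


def vec_mac_reference(
    packed_weights: List[List[List[int]]],
    packed_inputs: List[List[int]],
) -> List[int]:
    """For each neuron, AND every weight word with its input word and count ones."""
    outputs = []
    for neuron_weights in packed_weights:          # shape (n_inputs, n_words)
        total = 0
        for w_words, x_words in zip(neuron_weights, packed_inputs):
            for w, x in zip(w_words, x_words):
                total += bin(w & x).count("1")     # popcount of the AND
        outputs.append(total)
    return outputs


# Two neurons, two inputs, one word per stream (only the low 4 bits are used).
weights = [
    [[0b1111], [0b1010]],
    [[0b0001], [0b0000]],
]
inputs = [[0b0110], [0b1111]]
result = vec_mac_reference(weights, inputs)
```

Neuron 0 accumulates two set bits from each input (four in total), while neuron 1 has no overlapping bits and accumulates zero.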
MPI Driver
sc_neurocore.accel.mpi_driver.MPIDriver is an optional mpi4py integration
for distributing stochastic-computing arrays across MPI ranks. When mpi4py is
unavailable, or when the communicator is absent, the driver stays in
single-rank mode (rank == 0, size == 1) and returns the caller's local
arrays unchanged. In multi-rank gathers, only root receives the concatenated
buffer; non-root ranks return an empty array that preserves the local result
dtype.
sc_neurocore.accel.mpi_driver
MPI scatter/gather driver for distributed stochastic-computing workloads.
The driver keeps single-process execution deterministic when mpi4py is not
available, while exposing the same workload partitioning, result collection, and
barrier interface used by multi-node stochastic-computing deployments.
MPIDriver
Distributed sc-neurocore driver built on MPI.
Handles partitioning and synchronisation of bitstreams across cluster nodes.
Source code in src/sc_neurocore/accel/mpi_driver.py
scatter_workload(global_inputs)
Distribute a large input array across nodes along axis 0.
Parameters
global_inputs : numpy.ndarray
Full input array held on the root rank, split along axis 0 (batch or neuron dimension).
Returns
numpy.ndarray
This rank's contiguous chunk of the input array.
Source code in src/sc_neurocore/accel/mpi_driver.py
gather_results(local_results)
Collect per-node result arrays back to the root rank.
Non-root MPI ranks return an empty array with the same dtype as their local result buffer, which keeps downstream dtype-sensitive callers deterministic even when they execute on ranks that do not receive the gathered payload.
Source code in src/sc_neurocore/accel/mpi_driver.py
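The scatter and gather semantics described above can be simulated in a single process without mpi4py. The sketch assumes contiguous chunks along axis 0 whose sizes differ by at most one row; how the real driver divides a remainder is not stated here, so that policy and the helper names `split_axis0` and `gather` are assumptions.

```python
from typing import List


def split_axis0(rows: List, size: int) -> List[List]:
    """Split rows into `size` contiguous chunks (assumed remainder policy)."""
    base, extra = divmod(len(rows), size)
    chunks, start = [], 0
    for rank in range(size):
        stop = start + base + (1 if rank < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


def gather(chunks: List[List], rank: int, root: int = 0) -> List:
    """Root receives the concatenation; every other rank gets an empty result."""
    if rank != root:
        return []
    return [row for chunk in chunks for row in chunk]


chunks = split_axis0(list(range(10)), size=3)
on_root = gather(chunks, rank=0)
on_worker = gather(chunks, rank=1)
```

Code that runs on every rank should therefore expect an empty result off-root and branch on the rank before using the gathered data.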
barrier()
Synchronize all nodes.
Source code in src/sc_neurocore/accel/mpi_driver.py