Acceleration¶

Backend modules for high-performance stochastic computing (SC) operations.

Module	Purpose
`vector_ops`	Packed uint64 bitwise AND, popcount, pack/unpack
`gpu_backend`	CuPy GPU dispatch (transparent NumPy fallback)
`jax_backend`	JAX JIT-compiled LIF step for TPU/GPU scaling
`jit_kernels`	Numba-accelerated inner loops
`mpi_driver`	MPI-based distributed simulation

Rust Safety Mirrors¶

src/sc_neurocore/accel/rust/ is a nested Rust crate for safety and contract mirrors of higher-level Python modules. It is separate from the PyO3 engine: the mirror crate is tested directly with Cargo, while the Python modules keep their NumPy/Python path importable when optional engine submodules are absent.

Current documented mirrors:

Mirror	Python surface	Verification
`safety/analysis.rs`	`studio.analysis`	Rust unit tests plus `tests/test_studio_analysis.py`
`safety/dna_mapper.rs`	`bridges.dna_mapper`	Rust unit tests plus 139 DNA mapper tests
`safety/l7_symbolic.rs`	`scpn.layers.l7_symbolic`	Rust unit tests plus L7 and cross-layer contract tests
`safety/predictive_model.rs`	`world_model.predictive_model`	Rust unit tests plus 77 passed predictive-model tests, with 3 optional-path skips

Cargo command:

Bash

cargo test --manifest-path src/sc_neurocore/accel/rust/Cargo.toml --lib --no-default-features

Vector Operations¶

`sc_neurocore.accel.vector_ops` ¶

`pack_bitstream(bitstream)` ¶

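Packs a uint8 bitstream (0s and 1s) into uint64 words, least-significant bit first, zero-padding the last word when N is not a multiple of 64. Each word then carries 64 time steps, so a single bitwise operation processes 64 steps at once.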

Parameters:

Name	Type	Description	Default
`bitstream`	`ndarray[Any, Any]`	Shape (N,) or (Batch, N) of uint8 {0,1}	required

Returns:

Name	Type	Description
`packed`	`ndarray[Any, Any]`	Shape (ceil(N/64),) or (Batch, ceil(N/64)) of uint64

Source code in src/sc_neurocore/accel/vector_ops.py

Python
def pack_bitstream(bitstream: np.ndarray[Any, Any]) -> np.ndarray[Any, Any]:
    """
    Packs a uint8 bitstream (0s and 1s) into uint64 integers.
    This allows processing 64 time steps in parallel.

    Args:
        bitstream: Shape (N,) or (Batch, N) of uint8 {0,1}

    Returns:
        packed: Shape (ceil(N/64),) or (Batch, ceil(N/64)) of uint64
    """
    bitstream = np.asarray(bitstream, dtype=np.uint8)

    if bitstream.ndim == 1:
        # 1D case: single bitstream
        length = bitstream.size
        pad_len = (64 - (length % 64)) % 64
        if pad_len > 0:
            bitstream = np.append(bitstream, np.zeros(pad_len, dtype=np.uint8))

        chunks = bitstream.reshape(-1, 64)
        powers = 1 << np.arange(64, dtype=np.uint64)
        packed: np.ndarray[Any, Any] = (chunks * powers).sum(axis=1, dtype=np.uint64)
        return packed

    elif bitstream.ndim == 2:
        # 2D case: batch of bitstreams
        batch_size, length = bitstream.shape
        pad_len = (64 - (length % 64)) % 64

        if pad_len > 0:
            padding = np.zeros((batch_size, pad_len), dtype=np.uint8)
            bitstream = np.concatenate([bitstream, padding], axis=1)

        # Reshape to (batch, num_chunks, 64)
        num_chunks = bitstream.shape[1] // 64
        chunks: np.ndarray[Any, Any] = bitstream.reshape(batch_size, num_chunks, 64)  # type: ignore[no-redef]

        powers = 1 << np.arange(64, dtype=np.uint64)
        packed_2d: np.ndarray[Any, Any] = (chunks * powers).sum(axis=2, dtype=np.uint64)
        return packed_2d

    else:
        raise ValueError(f"Expected 1D or 2D array, got {bitstream.ndim}D")
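
The bit layout implied by the listing above can be checked with a small standalone sketch. `pack_1d` is a local stand-in that copies the 1D branch of `pack_bitstream` so the example runs without `sc_neurocore` installed; it is not part of the library. Element `j` of each 64-element chunk lands in bit `j` of the word, and the final word is zero-padded.

```python
import numpy as np


def pack_1d(bitstream):
    """Local stand-in for the 1D branch of pack_bitstream (illustration only)."""
    bits = np.asarray(bitstream, dtype=np.uint8)
    pad_len = (64 - bits.size % 64) % 64
    if pad_len:
        bits = np.append(bits, np.zeros(pad_len, dtype=np.uint8))
    chunks = bits.reshape(-1, 64)
    # Bit j of each word carries element j of the chunk (LSB first).
    powers = np.uint64(1) << np.arange(64, dtype=np.uint64)
    return (chunks * powers).sum(axis=1, dtype=np.uint64)


one_word = pack_1d([1, 0, 1, 1])                   # 1*1 + 0*2 + 1*4 + 1*8 = 13
two_words = pack_1d(np.ones(65, dtype=np.uint8))   # 65 bits spill into a second word

print(one_word)
print(two_words)
```

The second call shows the padding: the first word is all ones and the second holds a single valid bit followed by 63 zero padding bits.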

`unpack_bitstream(packed, original_length, original_shape=None)` ¶

Unpacks uint64 array back to uint8 bitstream.

Parameters:

Name	Type	Description	Default
`packed`	`ndarray[Any, Any]`	Packed uint64 array (1D or 2D)	required
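`original_length`	`int`	Total number of bits to extract. For 2D input without `original_shape`, this is the total over all rows; each row keeps `original_length // batch_size` bits.	required
`original_shape`	`Optional[tuple[Any, ...]]`	Optional `(batch, length)` tuple. Only used for 2D input, where `original_shape[1]` sets the number of bits kept per row.	`None`

Returns:

Type	Description
`ndarray[Any, Any]`	Unpacked bitstream of shape `(original_length,)` for 1D input, or `(batch, length)` for 2D input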

Source code in src/sc_neurocore/accel/vector_ops.py

Python
def unpack_bitstream(
    packed: np.ndarray[Any, Any],
    original_length: int,
    original_shape: Optional[tuple[Any, ...]] = None,
) -> np.ndarray[Any, Any]:
    """
    Unpacks uint64 array back to uint8 bitstream.

    Args:
        packed: Packed uint64 array (1D or 2D)
        original_length: Total number of bits to extract
        original_shape: Optional tuple for reshaping output (batch, length)

    Returns:
        Unpacked bitstream of shape (original_length,) or original_shape
    """
    if packed.ndim == 1:
        # 1D packed array
        bits = ((packed[:, None] & (1 << np.arange(64, dtype=np.uint64))) > 0).astype(np.uint8)
        unpacked = bits.flatten()
        result: np.ndarray[Any, Any] = unpacked[:original_length]
        return result

    elif packed.ndim == 2:
        # 2D packed array: (batch, num_chunks)
        batch_size, num_chunks = packed.shape
        # Extract bits: (batch, num_chunks, 64)
        bits = ((packed[:, :, None] & (1 << np.arange(64, dtype=np.uint64))) > 0).astype(np.uint8)
        # Reshape to (batch, num_chunks * 64)
        unpacked = bits.reshape(batch_size, -1)

        if original_shape is not None:
            result_2d: np.ndarray[Any, Any] = unpacked[:, : original_shape[1]]
            return result_2d
        else:
            # original_length is the total over all rows; divide to get the per-row length
            per_batch_len = original_length // batch_size
            result_batch: np.ndarray[Any, Any] = unpacked[:, :per_batch_len]
            return result_batch

    else:
        raise ValueError(f"Expected 1D or 2D packed array, got {packed.ndim}D")
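
A batched round trip makes the 2D length handling concrete. The sketch below uses two local stand-ins, `pack_rows` and `unpack_rows`, that follow the 2D branches of the listings above (with `original_shape=None`); they are illustrative helpers, not library functions. The point to notice is that the length argument is the total bit count over the whole batch.

```python
import numpy as np

POWERS = np.uint64(1) << np.arange(64, dtype=np.uint64)


def pack_rows(bits):
    """Stand-in for the 2D branch of pack_bitstream (illustration only)."""
    bits = np.asarray(bits, dtype=np.uint8)
    batch, length = bits.shape
    pad = (64 - length % 64) % 64
    bits = np.pad(bits, ((0, 0), (0, pad)))  # zero-pad each row to a multiple of 64
    return (bits.reshape(batch, -1, 64) * POWERS).sum(axis=2, dtype=np.uint64)


def unpack_rows(packed, original_length):
    """Stand-in for the 2D branch of unpack_bitstream with original_shape=None."""
    batch = packed.shape[0]
    bits = ((packed[:, :, None] & POWERS) > 0).astype(np.uint8)
    # original_length is the total over all rows, so divide to get the row length.
    return bits.reshape(batch, -1)[:, : original_length // batch]


rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(3, 100), dtype=np.uint8)

packed = pack_rows(bits)                     # shape (3, 2): ceil(100 / 64) words per row
restored = unpack_rows(packed, bits.size)    # pass 300, the total bit count
truncated = unpack_rows(packed, 100)         # passing the row length keeps 100 // 3 bits per row

print(packed.shape, restored.shape, truncated.shape)
```

Passing `original_shape=(3, 100)` to the real function avoids the division entirely, which is probably the less error-prone call for batched data.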

`vec_and(a_packed, b_packed)` ¶

Bitwise AND on packed arrays. For independent unipolar streams this implements SC multiplication: P(A AND B) = P(A)P(B).

Source code in src/sc_neurocore/accel/vector_ops.py

Python
def vec_and(a_packed: np.ndarray[Any, Any], b_packed: np.ndarray[Any, Any]) -> np.ndarray[Any, Any]:
    """
    Bitwise AND on packed arrays. Simulates SC Multiplication.
    """
    result: np.ndarray[Any, Any] = np.bitwise_and(a_packed, b_packed)
    return result
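
The multiplication property can be checked numerically. This is a minimal sketch under stated assumptions: `np.packbits` stands in for `pack_bitstream` (both operands share the same layout, which is all AND needs), and the bit count is taken with `np.unpackbits` rather than the library's popcount.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits = 64 * 1024  # multiple of 64, so no padding bits are involved

# Two independent unipolar streams encoding 0.5 and 0.4.
a_bits = (rng.random(n_bits) < 0.5).astype(np.uint8)
b_bits = (rng.random(n_bits) < 0.4).astype(np.uint8)

# Stand-in packing: 8 bytes of packed bits are viewed as one uint64 word.
a_packed = np.packbits(a_bits).view(np.uint64)
b_packed = np.packbits(b_bits).view(np.uint64)

product = np.bitwise_and(a_packed, b_packed)  # the operation vec_and performs

# Count set bits by expanding the words back to individual bits.
ones = int(np.unpackbits(product.view(np.uint8)).sum())
estimate = ones / n_bits

print(f"estimate={estimate:.3f}, expected={0.5 * 0.4:.3f}")
```

The estimate converges on 0.2 only because the two streams are independent; ANDing a stream with itself returns the stream unchanged, not its square.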

`vec_xnor(a_packed, b_packed)` ¶

Bitwise XNOR on packed arrays. SC bipolar multiplication: P(A XNOR B) = P(A)P(B) + (1-P(A))(1-P(B)).

Source code in src/sc_neurocore/accel/vector_ops.py

Python
def vec_xnor(
    a_packed: np.ndarray[Any, Any], b_packed: np.ndarray[Any, Any]
) -> np.ndarray[Any, Any]:
    """Bitwise XNOR on packed arrays. SC bipolar multiplication: P(A XNOR B) = P(A)*P(B) + (1-P(A))*(1-P(B))."""
    result: np.ndarray[Any, Any] = ~np.bitwise_xor(a_packed, b_packed)
    return result
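
In bipolar encoding a stream with P(1) = p represents the value x = 2p - 1. Substituting into the formula above gives 2P(A XNOR B) - 1 = (2P(A) - 1)(2P(B) - 1), so the encoded values multiply. The sketch below checks this numerically for independent streams; `np.packbits` is a stand-in for `pack_bitstream`, and the stream length is a multiple of 64 so the inversion does not touch any padding bits.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bits = 64 * 1024

p_a, p_b = 0.8, 0.3  # bipolar values 2p - 1: 0.6 and -0.4
a_bits = (rng.random(n_bits) < p_a).astype(np.uint8)
b_bits = (rng.random(n_bits) < p_b).astype(np.uint8)

a_packed = np.packbits(a_bits).view(np.uint64)
b_packed = np.packbits(b_bits).view(np.uint64)

xnor = ~np.bitwise_xor(a_packed, b_packed)  # same expression as vec_xnor

# Fraction of ones in the output, then map back to the bipolar range [-1, 1].
p_out = int(np.unpackbits(xnor.view(np.uint8)).sum()) / n_bits
bipolar_out = 2 * p_out - 1
expected = (2 * p_a - 1) * (2 * p_b - 1)

print(f"bipolar_out={bipolar_out:.3f}, expected={expected:.3f}")
```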

`vec_not(packed)` ¶

Bitwise NOT on packed arrays. SC complement: P(NOT A) = 1 - P(A).

Source code in src/sc_neurocore/accel/vector_ops.py

Python
def vec_not(packed: np.ndarray[Any, Any]) -> np.ndarray[Any, Any]:
    """Bitwise NOT on packed arrays. SC complement: P(NOT A) = 1 - P(A)."""
    result: np.ndarray[Any, Any] = ~packed
    return result
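
One consequence of combining the listings above, offered as an observation rather than documented behaviour: `pack_bitstream` zero-pads the last word, and `~` flips those padding bits to 1. A popcount taken directly on the inverted words therefore overcounts unless the stream length is a multiple of 64. The sketch below shows the effect on a 4-bit stream and one possible fix, masking the last word; `count_ones` and the mask are hypothetical helpers, not library functions.

```python
import numpy as np

n_bits = 4
# Bits [1, 0, 1, 1] packed LSB first, followed by 60 zero padding bits.
packed = np.array([0b1101], dtype=np.uint64)

inverted = ~packed  # same expression as vec_not


def count_ones(words):
    """Total set bits, computed through Python ints (illustration only)."""
    return sum(bin(int(w)).count("1") for w in words)


naive = count_ones(inverted)  # includes the 60 flipped padding bits

# Keep only the n_bits valid positions of the single (last) word.
mask = np.uint64((1 << n_bits) - 1)
masked = count_ones(inverted & mask)

print(naive, masked)
```

The complement of `[1, 0, 1, 1]` contains a single 1, which is what the masked count reports. Unpacking with the original length before counting sidesteps the issue as well, and the same caveat applies to `vec_xnor`.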

`vec_mux(select_packed, a_packed, b_packed)` ¶

Bitwise MUX on packed arrays. SC scaled addition: P(out) = P(sel)P(A) + (1-P(sel))P(B).

When sel is a Bernoulli(0.5) stream, this computes the average (A+B)/2.

Source code in src/sc_neurocore/accel/vector_ops.py

Python
def vec_mux(
    select_packed: np.ndarray[Any, Any],
    a_packed: np.ndarray[Any, Any],
    b_packed: np.ndarray[Any, Any],
) -> np.ndarray[Any, Any]:
    """Bitwise MUX on packed arrays. SC scaled addition: P(out) = P(sel)*P(A) + (1-P(sel))*P(B).

    When sel is a Bernoulli(0.5) stream, this computes the average (A+B)/2.
    """
    result: np.ndarray[Any, Any] = (select_packed & a_packed) | (~select_packed & b_packed)
    return result
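
A quick numerical check of the scaled-addition formula, as a sketch under the assumption that the select stream is independent of both inputs. `packed_stream` is a hypothetical helper that uses `np.packbits` in place of `pack_bitstream`.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bits = 64 * 1024


def packed_stream(p):
    """Pack a Bernoulli(p) stream into uint64 words (stand-in for pack_bitstream)."""
    bits = (rng.random(n_bits) < p).astype(np.uint8)
    return np.packbits(bits).view(np.uint64)


sel, a, b = packed_stream(0.5), packed_stream(0.7), packed_stream(0.1)

out = (sel & a) | (~sel & b)  # same expression as vec_mux

p_out = int(np.unpackbits(out.view(np.uint8)).sum()) / n_bits
expected = 0.5 * 0.7 + (1 - 0.5) * 0.1  # (0.7 + 0.1) / 2

print(f"p_out={p_out:.3f}, expected={expected:.3f}")
```

The output encodes the average of the inputs rather than their sum, which keeps the result inside [0, 1] at the cost of a factor of two in scale.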

`vec_popcount(packed)` ¶

Count total set bits (1s) in the packed array. Used for integration/accumulation.

Source code in src/sc_neurocore/accel/vector_ops.py

Python
def vec_popcount(packed: np.ndarray[Any, Any]) -> int:
    """
    Count total set bits (1s) in the packed array.
    Used for integration/accumulation.
    """
    # Vectorized SWAR (SIMD-within-a-register) popcount: sum bits in 2-bit,
    # 4-bit, then 8-bit fields, and gather the eight byte counts with a
    # multiply. Assumes `packed` has dtype uint64.
    x = packed.copy()
    x -= (x >> 1) & 0x5555555555555555
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    x = (x * 0x0101010101010101) >> 56
    return int(np.sum(x))
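
The SWAR steps are easier to trust after comparing them with a plain bit count. `swar_popcount` below is a local stand-in that follows the same four steps as the listing but returns the per-word counts before they are summed; the explicit `np.uint64` constants are a choice made for this sketch, not something taken from the source.

```python
import numpy as np


def swar_popcount(words):
    """Per-word SWAR popcount following the same steps as vec_popcount."""
    x = np.asarray(words, dtype=np.uint64).copy()
    m1 = np.uint64(0x5555555555555555)
    m2 = np.uint64(0x3333333333333333)
    m4 = np.uint64(0x0F0F0F0F0F0F0F0F)
    h01 = np.uint64(0x0101010101010101)

    x -= (x >> np.uint64(1)) & m1                # each 2-bit field holds a count 0..2
    x = (x & m2) + ((x >> np.uint64(2)) & m2)    # each 4-bit field holds a count 0..4
    x = (x + (x >> np.uint64(4))) & m4           # each byte holds a count 0..8
    return (x * h01) >> np.uint64(56)            # top byte = sum of the eight bytes


words = np.array([0, 1, 0xFF, 0x8000000000000001, 2**64 - 1], dtype=np.uint64)
per_word = swar_popcount(words)
reference = [bin(int(w)).count("1") for w in words]
total = int(per_word.sum())  # the scalar that vec_popcount returns

print(per_word.tolist(), reference, total)
```

The multiply in the last step relies on uint64 wrap-around, which is why the input needs an unsigned 64-bit dtype.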

GPU Backend¶

`sc_neurocore.accel.gpu_backend` ¶

`to_device(arr)` ¶

Move a NumPy array to the active backend (GPU copy or no-op).

Source code in src/sc_neurocore/accel/gpu_backend.py

Python
def to_device(arr: np.ndarray[Any, Any]) -> xp.ndarray:  # type: ignore
    """Move a NumPy array to the active backend (GPU copy or no-op)."""
    if _gpu_enabled():  # pragma: no cover
        try:
            return cp.asarray(arr)
        except RuntimeError as exc:  # pragma: no cover
            _mark_gpu_runtime_broken(exc)
    return arr

`to_host(arr)` ¶

Bring an array back to host RAM as a NumPy array.

Source code in src/sc_neurocore/accel/gpu_backend.py

Python
def to_host(arr: Any) -> np.ndarray[Any, Any]:
    """Bring an array back to host RAM as a NumPy array."""
    if HAS_CUPY and isinstance(arr, cp.ndarray):  # pragma: no cover
        try:
            result: np.ndarray[Any, Any] = arr.get()
            return result
        except RuntimeError as exc:  # pragma: no cover
            _mark_gpu_runtime_broken(exc)
    out: np.ndarray[Any, Any] = np.asarray(arr)
    return out
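
The two helpers above depend on module-level names (`cp`, `HAS_CUPY`, `_gpu_enabled`) whose definitions are not shown on this page. The sketch below is one common way to arrange an optional CuPy import with a NumPy fallback; it is an assumption about the general pattern, not the module's actual code, and the `_sketch` function names are hypothetical.

```python
import numpy as np

try:  # CuPy is optional; fall back to NumPy when it is not installed.
    import cupy as cp
    HAS_CUPY = True
except ImportError:
    cp = None
    HAS_CUPY = False


def to_device_sketch(arr):
    """Copy to the GPU when CuPy works, otherwise return the array unchanged."""
    if HAS_CUPY:
        try:
            return cp.asarray(arr)
        except RuntimeError:
            pass  # e.g. CuPy is installed but no usable CUDA runtime is present
    return arr


def to_host_sketch(arr):
    """Return a NumPy array regardless of where the data currently lives."""
    if HAS_CUPY and isinstance(arr, cp.ndarray):
        return arr.get()
    return np.asarray(arr)


host = np.arange(4, dtype=np.uint64)
round_trip = to_host_sketch(to_device_sketch(host))

print(type(round_trip).__name__, round_trip.tolist())
```

Because both helpers degrade to no-ops on a CPU-only machine, calling code can be written once and run unchanged with or without a GPU.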

`gpu_pack_bitstream(bits)` ¶

Pack uint8 {0,1} array into uint64 words.

Works on both CuPy and NumPy arrays.

Parameters:

Name	Type	Description	Default
`bits`	`ndarray`	Shape `(N,)` or `(B, N)` of uint8.	required

Returns:

Type	Description
`ndarray`	Packed uint64 array, shape `(ceil(N/64),)` or `(B, ceil(N/64))`.

Source code in src/sc_neurocore/accel/gpu_backend.py

Python
def gpu_pack_bitstream(bits: xp.ndarray) -> xp.ndarray:  # type: ignore
    """
    Pack uint8 {0,1} array into uint64 words.

    Works on both CuPy and NumPy arrays.

    Args:
        bits: Shape ``(N,)`` or ``(B, N)`` of uint8.

    Returns:
        Packed uint64 array, shape ``(ceil(N/64),)`` or ``(B, ceil(N/64))``.
    """
    if _gpu_enabled():  # pragma: no cover
        try:
            bits = cp.asarray(bits, dtype=cp.uint8)
            if bits.ndim == 1:
                length = bits.size
                pad = (64 - length % 64) % 64
                if pad:
                    bits = cp.concatenate([bits, cp.zeros(pad, dtype=cp.uint8)])
                chunks = bits.reshape(-1, 64)
                powers = cp.uint64(1) << cp.arange(64, dtype=cp.uint64)
                return (chunks.astype(cp.uint64) * powers).sum(axis=1)

            if bits.ndim == 2:
                batch, length = bits.shape
                pad = (64 - length % 64) % 64
                if pad:
                    bits = cp.concatenate(
                        [bits, cp.zeros((batch, pad), dtype=cp.uint8)],
                        axis=1,
                    )
                n_words = bits.shape[1] // 64
                chunks = bits.reshape(batch, n_words, 64)
                powers = cp.uint64(1) << cp.arange(64, dtype=cp.uint64)
                return (chunks.astype(cp.uint64) * powers).sum(axis=2)
        except RuntimeError as exc:  # pragma: no cover
            _mark_gpu_runtime_broken(exc)

    _warn_cpu_fallback()
    return _numpy_pack_bitstream(to_host(bits))

`gpu_vec_and(a, b)` ¶

Bitwise AND on packed uint64 arrays (SC multiplication).

Source code in src/sc_neurocore/accel/gpu_backend.py

Python
def gpu_vec_and(a: xp.ndarray, b: xp.ndarray) -> xp.ndarray:  # type: ignore
    """Bitwise AND on packed uint64 arrays (SC multiplication)."""
    if _gpu_enabled():  # pragma: no cover
        try:
            return cp.bitwise_and(a, b)
        except RuntimeError as exc:  # pragma: no cover
            _mark_gpu_runtime_broken(exc)
    _warn_cpu_fallback()
    return np.bitwise_and(to_host(a), to_host(b))

`gpu_popcount(packed)` ¶

Vectorised SWAR popcount on uint64 arrays; returns per-element counts.

On CuPy the SWAR steps run as element-wise GPU kernels; on NumPy it uses the same SWAR bit-trick as vector_ops.vec_popcount but returns an array instead of a scalar.

Source code in src/sc_neurocore/accel/gpu_backend.py

Python
def gpu_popcount(packed: xp.ndarray) -> xp.ndarray:  # type: ignore
    """
    Vectorised SWAR popcount on uint64 arrays; returns per-element counts.

    On CuPy the SWAR steps run as element-wise GPU kernels; on NumPy it
    uses the same SWAR bit-trick as ``vector_ops.vec_popcount`` but returns
    an array instead of a scalar.
    """
    """
    if _gpu_enabled():  # pragma: no cover
        try:
            x = cp.asarray(packed, dtype=cp.uint64).copy()
            m1 = cp.uint64(0x5555555555555555)
            m2 = cp.uint64(0x3333333333333333)
            m4 = cp.uint64(0x0F0F0F0F0F0F0F0F)
            h01 = cp.uint64(0x0101010101010101)

            x -= (x >> cp.uint64(1)) & m1
            x = (x & m2) + ((x >> cp.uint64(2)) & m2)
            x = (x + (x >> cp.uint64(4))) & m4
            return (x * h01) >> cp.uint64(56)
        except RuntimeError as exc:  # pragma: no cover
            _mark_gpu_runtime_broken(exc)

    _warn_cpu_fallback()
    return _numpy_popcount(to_host(packed))

`gpu_vec_mac(packed_weights, packed_inputs)` ¶

GPU-accelerated multiply-accumulate for a dense SC layer.

Parameters:

Name	Type	Description	Default
`packed_weights`	`ndarray`	`(n_neurons, n_inputs, n_words)` uint64	required
`packed_inputs`	`ndarray`	`(n_inputs, n_words)` uint64	required

Returns:

Type	Description
`ndarray`	`(n_neurons,)` total bit counts (= SC dot products).

Source code in src/sc_neurocore/accel/gpu_backend.py

Python
def gpu_vec_mac(
    packed_weights: xp.ndarray,  # type: ignore
    packed_inputs: xp.ndarray,  # type: ignore
) -> xp.ndarray:  # type: ignore
    """
    GPU-accelerated multiply-accumulate for a dense SC layer.

    Args:
        packed_weights: ``(n_neurons, n_inputs, n_words)`` uint64
        packed_inputs:  ``(n_inputs, n_words)`` uint64

    Returns:
        ``(n_neurons,)`` total bit counts (= SC dot products).
    """
    if _gpu_enabled():  # pragma: no cover
        try:
            products = cp.bitwise_and(packed_weights, packed_inputs[None, :, :])
            counts = gpu_popcount(products)
            return counts.sum(axis=(1, 2))
        except RuntimeError as exc:  # pragma: no cover
            _mark_gpu_runtime_broken(exc)

    _warn_cpu_fallback()
    weights_np = to_host(packed_weights)
    inputs_np = to_host(packed_inputs)
    products = np.bitwise_and(weights_np, inputs_np[None, :, :])
    counts = _numpy_popcount(products)
    return counts.sum(axis=(1, 2))
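
The shapes involved are easiest to follow on a tiny CPU-only example. The sketch below reproduces the broadcast-AND-then-count structure of the fallback path with plain NumPy and compares it with the same sum computed on unpacked bits. `np.packbits` and `np.unpackbits` stand in for the library's packing and popcount helpers, so the word layout may differ from `gpu_pack_bitstream`, but the per-neuron totals do not depend on it.

```python
import numpy as np

rng = np.random.default_rng(3)
n_neurons, n_inputs, n_bits = 2, 3, 128  # 128 bits -> n_words = 2

w_bits = rng.integers(0, 2, size=(n_neurons, n_inputs, n_bits), dtype=np.uint8)
x_bits = rng.integers(0, 2, size=(n_inputs, n_bits), dtype=np.uint8)

# Stand-in packing along the time axis: every 8 packed bytes become one uint64.
packed_weights = np.packbits(w_bits, axis=-1).view(np.uint64)  # (2, 3, 2)
packed_inputs = np.packbits(x_bits, axis=-1).view(np.uint64)   # (3, 2)

# Same broadcast as the listing: the inputs gain a leading neuron axis.
products = np.bitwise_and(packed_weights, packed_inputs[None, :, :])

# Per-neuron bit totals, counted by expanding the words back to bits.
mac = np.unpackbits(products.view(np.uint8), axis=-1).sum(axis=(1, 2))

# Reference: AND the unpacked bits directly, then sum over inputs and time.
reference = (w_bits & x_bits[None, :, :]).sum(axis=(1, 2))

print(mac.tolist(), reference.tolist())
```

Dividing each total by `n_bits` gives the sum of the `n_inputs` unipolar products for that neuron, assuming independent weight and input streams.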

JAX Backend¶

`sc_neurocore.accel.jax_backend` ¶

JAX backend for SC-NeuroCore.

Provides JAX-accelerated primitives for stochastic computing, unlocking automatic differentiation, JIT compilation (XLA), and native TPU/GPU scaling.

Usage:

Python

from sc_neurocore.accel.jax_backend import jnp, HAS_JAX, to_jax, to_host
from sc_neurocore.accel.jax_backend import jax_pack_bitstream, jax_vec_mac

if HAS_JAX:
    bits = jnp.array([1, 0, 1, 1], dtype=jnp.uint8)
    packed = jax_pack_bitstream(bits)