
Edge — bare-metal SC runtime + AER mesh

Pure-Python port of the tinysc_riscv bare-metal crate: a complete SC inference runtime that runs on a RISC-V MCU without FPU, plus the AER UDP mesh router used for multi-FPGA deployments. The Python and Rust sides are bit-compatible — a weight blob produced in Python deserialises byte-for-byte on the MCU, and an LFSR bitstream produced on either side matches the other's output word-for-word.

Python
from sc_neurocore.edge import (
    # Packed bitstream primitives
    popcount32, popcount_slice, sc_and, sc_or, sc_xor, sc_sub, sc_mux,
    and_packed, mux_packed, probability, scc,
    # Pseudo-random / low-discrepancy encoders
    Lfsr16, SobolGenerator,
    # SC neurons
    LifNeuron, IzhikevichNeuron,
    # Network runner
    SCLayer, SCNetwork,
    # Telemetry + serialisation + board profiles
    TelemetryRing, LayerTelemetry, DeviceTelemetry,
    WeightHeader, LayerHeader, WEIGHT_MAGIC,
    serialize_weights, deserialize_weights,
    PowerProfile, Board, MemoryFootprint,
    PowerThermalConfig, build_power_thermal_model_from_vivado_reports,
    WebDeploymentConfig, build_web_deployment,
)
from sc_neurocore.edge.aer_router import AERRoutingDaemon

Browser Deployment Scaffold

sc_neurocore.edge.web_deploy emits a deterministic static web bundle for .nir, .pt, .pth, and JSON model artefacts. It is the first slice of the WASM/WebGPU deployment path: generation does not require a browser, WebGPU driver, Node.js, or a native WASM toolchain, and the emitted manifest.json records the runtime contract honestly.

Python
from sc_neurocore.edge import WebDeploymentConfig, build_web_deployment

manifest = build_web_deployment(
    "model.nir",
    "build/web",
    WebDeploymentConfig(dt=1.0, bitstream_length=256),
)
print(manifest.artefacts["html"])  # index.html

Generated layout:

Text Only
build/web/
  index.html
  manifest.json
  model/model.nir
  runtime/sc_neurocore_web.js
  runtime/sc_neurocore_webgpu.wgsl

The browser runtime loads the manifest, checks WebGPU availability, and exposes the SC probability contract used by later WASM kernels. The WGSL kernel clamps probabilities into [0, 1]; it is intentionally small so it can be used as a capability and packaging test before computational kernels are added.


1. Mathematical formalism

1.1 Packed bitstream word

A bitstream of length $L$ is stored as $\lceil L/32 \rceil$ unsigned 32-bit words. The popcount of one word is Wilkes–Wheeler–Gill:

$$ \mathrm{popcount32}(x) = \sum_{b=0}^{31} \bigl((x \gg b) \wedge 1\bigr), $$

computed in six SWAR steps (shift-and-mask) — the same algorithm as core_engine::bitstream::popcount32 in the Rust crate.

1.2 SC arithmetic via bitwise ops

Let $A,B \in \{0,1\}^{L}$ be independent unipolar streams with bit probabilities $p_{A},\,p_{B}$. The SC arithmetic operators are:

$$ \begin{aligned} \text{multiply} \quad & A \wedge B, & p_{A\wedge B} &= p_{A}\,p_{B} \\ \text{saturating add} \quad & A \vee B, & p_{A\vee B} &= p_{A} + p_{B} - p_{A}\,p_{B} \\ \text{abs.\ difference} \quad & A \oplus B, & p_{A\oplus B} &= p_{A}(1-p_{B}) + p_{B}(1-p_{A}) \\ \text{sat.\ subtract} \quad & A \wedge \neg B, & p_{A \setminus B} &= p_{A}(1 - p_{B}) \\ \text{scaled add (MUX)} \quad & (S \wedge A) \vee (\neg S \wedge B), & p_{\text{mux}} &= p_{S}\,p_{A} + (1-p_{S})\,p_{B} \end{aligned} $$

These identities are the foundation of stochastic computing (Gaines, 1967) and make all SC layers computable with only AND / OR / NOT / MUX cells.
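The identities above can be checked empirically on packed u32 words. The sketch below uses Python's random module as a stand-in for the hardware encoders (an assumption; the real streams come from Lfsr16 or SobolGenerator) and recovers probabilities by popcount:

```python
# Empirical check of the SC identities on packed u32 words.
import random

MASK32 = 0xFFFF_FFFF
L = 1 << 16          # stream length in bits
WORDS = L // 32

def make_stream(p, rng):
    """Pack L Bernoulli(p) bits into u32 words."""
    words = [0] * WORDS
    for i in range(L):
        if rng.random() < p:
            words[i // 32] |= 1 << (i % 32)
    return words

def prob(words):
    """Recover the bit probability via popcount."""
    return sum(bin(w).count("1") for w in words) / L

rng = random.Random(0xACE1)
A = make_stream(0.30, rng)
B = make_stream(0.60, rng)

p_and = prob([(a & b) & MASK32 for a, b in zip(A, B)])   # ~ p_A * p_B = 0.18
p_or  = prob([(a | b) & MASK32 for a, b in zip(A, B)])   # ~ 0.3 + 0.6 - 0.18 = 0.72
p_xor = prob([(a ^ b) & MASK32 for a, b in zip(A, B)])   # ~ 0.3*0.4 + 0.6*0.7 = 0.54
```

With $L = 2^{16}$ the sampling error is on the order of $10^{-3}$, so all three measured probabilities land within ~1 % of the closed-form values.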

1.3 Alaghi–Hayes SCC

$$ \mathrm{SCC}(A,B) = \begin{cases} \dfrac{p_{A \wedge B} - p_{A}p_{B}}{\min(p_{A},\,p_{B}) - p_{A}p_{B}}, & p_{A \wedge B} \geq p_{A}p_{B} \\[4pt] \dfrac{p_{A \wedge B} - p_{A}p_{B}}{p_{A}p_{B} - \max(0,\,p_{A}+p_{B}-1)}, & \text{otherwise} \end{cases} $$

:func:scc implements the case-split exactly; output is bounded in $[-1,\,+1]$.
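A worked example of the case split, with the SCC computation re-stated inline (the full listing appears in the reference section of this page): a stream is perfectly correlated with itself (SCC = +1) and maximally anti-correlated with its complement (SCC = -1).

```python
# SCC over packed u32 bitstreams, following the Alaghi-Hayes case split.
MASK32 = 0xFFFF_FFFF

def popcount_slice(ws):
    return sum(bin(w).count("1") for w in ws)

def scc(a, b, n):
    pa = popcount_slice(a) / n
    pb = popcount_slice(b) / n
    p_and = sum(bin(x & y).count("1") for x, y in zip(a, b)) / n
    num = p_and - pa * pb
    if abs(num) < 1e-7:
        return 0.0
    if num > 0.0:
        denom = min(pa, pb) - pa * pb          # positive-correlation branch
    else:
        denom = pa * pb - max(pa + pb - 1.0, 0.0)  # negative branch
    return max(-1.0, min(1.0, num / denom))

A = [0xAAAA_AAAA, 0xAAAA_AAAA]       # 64-bit stream, p = 0.5
notA = [(~w) & MASK32 for w in A]
print(scc(A, A, 64))                 # 1.0
print(scc(A, notA, 64))              # -1.0
```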

1.4 Galois LFSR-16 encoder

:class:Lfsr16 uses the polynomial $x^{16} + x^{14} + x^{13} + x^{11} + 1$ (Galois form, taps = 0xD008) with state $r_{t} \in \{1,\,\ldots,\,65535\}$:

$$ b_{t} = \bigoplus_{k \in \{0,\,2,\,3,\,5\}} \bigl((r_{t} \gg k) \wedge 1\bigr), \qquad r_{t+1} = (r_{t} \gg 1) \,\vee\, (b_{t} \ll 15). $$

Period is 65 535 (maximal for 16 bits). The probability encoder compares the 16-bit state to a threshold $\theta \in [0,\,65535]$:

$$ \mathrm{bit}_{t} = \mathbf{1}\bigl[r_{t} < \theta\bigr], \qquad p = \theta/65535. $$
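A condensed sketch of the update and the threshold encoder, taken from the reference listing later on this page; the fraction of ones converges to $\theta/65535$:

```python
# Lfsr16 step + threshold compare, condensed from the reference listing.
class Lfsr16:
    def __init__(self, seed=0xACE1):
        self.reg = (seed & 0xFFFF) or 0xACE1   # state must never be zero

    def step(self):
        # feedback bit = parity of state bits {0, 2, 3, 5}
        r = self.reg
        bit = ((r >> 0) ^ (r >> 2) ^ (r >> 3) ^ (r >> 5)) & 1
        self.reg = ((r >> 1) | (bit << 15)) & 0xFFFF
        return self.reg

lfsr = Lfsr16(0xACE1)
theta = 16384                                   # target p = theta / 65535 ~ 0.25
bits = sum(1 for _ in range(4096) if lfsr.step() < theta)
print(bits / 4096)                              # fraction of ones, ~ theta / 65535
```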

1.5 Sobol low-discrepancy sequence

:class:SobolGenerator implements 1-D Sobol with Joe–Kuo direction numbers (dimension 1: $V_{k} = 2^{16-k}$ for $k = 1..16$). Using Gray-code indexing, each step costs one XOR:

$$ x_{n+1} = x_{n} \oplus V_{c(n)}, \qquad c(n) = \text{position of lowest 1-bit in } n. $$

Bits generated with Sobol thresholding have discrepancy $O(\log^{d} N / N)$ vs LFSR's $O(1/\sqrt{N})$, so SC pipelines fed from Sobol streams hit target precision with $L$ up to 4× shorter.
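A minimal Gray-code Sobol sketch for dimension 1. With direction numbers $V_{k} = 2^{16-k}$ this reduces to the base-2 van der Corput sequence; each sample costs one XOR plus one trailing-zero count:

```python
# 1-D Gray-code Sobol sketch, 16-bit resolution.
V = [1 << (16 - k) for k in range(1, 17)]   # V_1 = 2^15 ... V_16 = 1

def sobol_points(n):
    x, out = 0, []
    for idx in range(1, n + 1):
        c = (idx & -idx).bit_length()       # 1-based position of lowest set bit
        x ^= V[c - 1]                       # one XOR per sample
        out.append(x / 65536.0)
    return out

pts = sobol_points(256)
mean = sum(pts) / len(pts)                  # low-discrepancy: very close to 0.5
```

Over any full power-of-two block the points tile [0, 1) almost exactly uniformly, which is where the $O(\log N / N)$ convergence comes from.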

1.6 LIF with popcount accumulator

:class:LifNeuron tracks membrane $V$ as a running popcount with right-shift leak:

$$ V_{t+1} = V_{t} + \mathrm{popcount}(I_{t}) - (V_{t} \gg s), \qquad V_{t+1} \geq \Theta \;\Rightarrow\; \text{spike},\;\; V_{t+1} \leftarrow 0, $$

where $s$ is the leak-shift (typical $s=3$ gives $\tau_{\text{leak}} \approx 2^{s} = 8$ ticks). No multiplier, no FPU needed.
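The update above in integer-only Python; the helper name and the threshold/shift values are illustrative, not the module's API:

```python
# Integer-only LIF tick: accumulate popcount, leak by right shift, reset on fire.
def lif_run(input_words, theta=40, shift=3, ticks=50):
    v, spikes = 0, 0
    for _ in range(ticks):
        v += sum(bin(w).count("1") for w in input_words)   # + popcount(I_t)
        v -= v >> shift                                    # - (V_t >> s) leak
        if v >= theta:                                     # threshold crossing
            spikes += 1
            v = 0                                          # hard reset
    return spikes

dense = [0xFFFF_FFFF] * 2       # 64 active input bits per tick
print(lif_run(dense))           # 50: fires every tick at this drive level
```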

1.7 Izhikevich in Q16.16

:class:IzhikevichNeuron runs Izhikevich's 2003 two-variable model entirely in 32-bit integer Q16.16:

$$ \dot{V} = 0.04 V^{2} + 5 V + 140 - U + I, \qquad \dot{U} = a (b V - U), $$

with reset $(V \geq 30 \Rightarrow V \leftarrow c,\; U \leftarrow U + d)$. Q16.16 gives $\Delta = 2^{-16} \approx 1.5 \cdot 10^{-5}$ resolution on $V$; the four presets (regular spiking, fast spiking, chattering, intrinsic burst) match the published $(a,b,c,d)$ tuples exactly.
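A Q16.16 Euler sketch of the two-variable model with the regular-spiking preset $(a,b,c,d) = (0.02, 0.2, -65, 8)$ and constant drive $I = 10$. This is an illustration of the fixed-point arithmetic, not the module's exact kernel (§9 notes the real implementation rounds the quadratic term differently):

```python
# Q16.16 Izhikevich sketch: regular-spiking preset, dt = 1 ms, constant I.
ONE = 1 << 16

def qmul(a, b):
    return (a * b) >> 16            # Q16.16 multiply (floor rounding)

A, B = int(0.02 * ONE), int(0.2 * ONE)
C, D = -65 * ONE, 8 * ONE
K004 = int(0.04 * ONE)
I = 10 * ONE

v, u = C, qmul(B, C)                # start at rest: v = -65, u = b*v
spikes = 0
for _ in range(1000):
    dv = qmul(qmul(K004, v), v) + 5 * v + 140 * ONE - u + I
    du = qmul(A, qmul(B, v) - u)
    v += dv
    u += du
    if v >= 30 * ONE:               # spike + reset
        v, u = C, u + D
        spikes += 1
print(spikes > 0)                   # True: tonic spiking under constant current
```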

1.8 Weight-blob wire format

Text Only
offset | size | field
-------+------+----------------------
  0    |  4   | magic  = 0x5343_574C ("SCWL")
  4    |  4   | version
  8    |  4   | n_layers
 12    |  4   | flags
 16    |  4   | layer[0].n_inputs
 20    |  4   | layer[0].n_outputs
 24    |  4   | layer[0].threshold
 28    |  4   | reserved
 32    |  *   | layer[0].weights (n_outputs × words_per_row × u32)
 ...

All multi-byte fields are little-endian. words_per_row = ⌈n_inputs/32⌉.
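The layout maps directly onto struct.pack with the little-endian prefix. The sketch below builds a single-layer blob; the threshold value and all-zero weights are illustrative:

```python
# Build a one-layer SCWL blob following the little-endian layout above.
import struct

WEIGHT_MAGIC = 0x5343_574C                          # reads "SCWL" as big-endian ASCII
header = struct.pack("<4I", WEIGHT_MAGIC, 1, 1, 0)  # magic, version, n_layers, flags
layer  = struct.pack("<4I", 32, 16, 512, 0)         # n_inputs, n_outputs, threshold, reserved

words_per_row = (32 + 31) // 32                     # ceil(n_inputs / 32) = 1
weights = struct.pack("<16I", *([0] * 16))          # n_outputs x words_per_row u32 words

blob = header + layer + weights
print(len(blob), blob[:4])          # 96 b'LWCS'  (magic byte-reversed on the wire)
```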


2. Theory (why these particular mechanics)

2.1 Why bit-exact with Rust

The Rust crate (tinysc_riscv) is what actually runs on the MCU; the Python module is the design-space exploration surface. Any divergence between the two introduces silent drift — a network simulated in Python that no longer matches its MCU deployment. We therefore keep the same integer arithmetic, the same LFSR polynomial, the same Q16.16 encoding, the same byte layout. tests/test_edge/test_parity.py asserts byte-identity on 1024-bit LFSR streams, Izhikevich spike trains, and weight blobs.

2.2 Why integer arithmetic everywhere

The target board family (Table in §4) includes GD32VF103 (32 kB RAM, no FPU) and ESP32-C3 (400 kB RAM, no FPU). Running an SC network there means:

  • No float anywhere — membrane, thresholds, weights, plasticity rates are all integers.
  • No heap allocation — weight blobs are loaded zero-copy from flash.
  • Popcount is the only inner kernel that must be fast; the RISC-V cpop extension is a single-cycle instruction, and the WWG fallback is ~6 cycles on cores without it.

2.3 Why both LFSR and Sobol

LFSR is the default because it is the smallest practical stream generator (16 flip-flops + 4 XOR gates) and exactly matches the Rust core_engine::bitstream::Lfsr16. Sobol is the precision option: for a target bit probability $p$, Sobol converges as $\hat{p} = p \pm O(\log N / N)$ vs LFSR's $\pm O(1/\sqrt{N})$. On a 1024-bit stream at $p=0.5$, LFSR's 1-σ error is ≈0.0156; Sobol's is ≈0.002. Sobol costs more per sample (one XOR + one trailing-zero count) but lets a bandwidth-constrained fabric cut $L$ by 4× for the same MSE.
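The precision gap is easy to demonstrate. The sketch below estimates $p = 0.3$ from 1024 samples using a seeded PRNG as a stand-in for the LFSR (an assumption; the real encoder is Lfsr16) against the base-2 van der Corput sequence, i.e. 1-D Sobol:

```python
# Pseudo-random vs low-discrepancy probability estimation at N = 1024.
import random

def vdc(n):
    """Radical inverse of n in base 2 (1-D Sobol / van der Corput point)."""
    x, denom = 0.0, 1.0
    while n:
        denom *= 2.0
        x += (n & 1) / denom
        n >>= 1
    return x

N, p = 1024, 0.3
rng = random.Random(0xACE1)
err_prng  = abs(sum(rng.random() < p for _ in range(N)) / N - p)   # O(1/sqrt(N))
err_sobol = abs(sum(vdc(i) < p for i in range(1, N + 1)) / N - p)  # O(1/N)
```

The low-discrepancy estimate lands within ~0.001 of the target; the pseudo-random one typically misses by an order of magnitude more.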

2.4 Why SCLayer does OR-combine, not concatenation

The most common pitfall when cascading SC layers is to simply concatenate the upstream spike-stream words. This destroys the probability interpretation (the combined stream becomes biased by word order). :meth:SCNetwork._flatten_bitstreams instead uses the SC saturating-add identity $A \vee B$, which preserves the probability semantics up to a correction term $p_{A}p_{B}$ that is small when individual inputs have low density.

2.5 Why cascading re-encodes through LFSR

Between layers, the boolean spike vector from layer $L_{k}$ is re-encoded to a new bitstream by :class:Lfsr16. This deliberate re-randomisation prevents correlation buildup: a spike from layer $k$ that fires twice in a row would otherwise produce a perfectly correlated stream that cannot be multiplied downstream (SCC(A, A) = 1 gives A ∧ A = A, so the product recovers $p_{A}$, not $p_{A}^{2}$). Re-encoding restores independence at the cost of one LFSR-cycle per spike — free on any core with a hardware XOR tree.

2.6 Why the AER router is Go, not Python

UDP mesh routing at AER rates (~Mpkts/s peak) needs concurrency and zero GC pauses. Go's goroutine model handles the fan-out per channel without the GIL bottleneck that Python would have. The Python :class:AERRoutingDaemon is a supervisor: go build on demand, spawn, SIGTERM on teardown.


3. Position in the pipeline

Text Only
        ┌──────────────────────────────────────────────┐
        │                Python design                  │
        │  (train / evolve a network, export weights)   │
        └──────────────────┬───────────────────────────┘
                           │
                    serialize_weights()
                           │
                           ▼
                    ┌─────────────┐
                    │ .bin blob   │
                    │ (SCWL magic)│
                    └──────┬──────┘
                           │  zero-copy load on boot
                           ▼
       ┌────────────────────────────────────┐
       │         RISC-V MCU target           │
       │  ┌──────────────────────────────┐  │
       │  │  sc_neurocore::edge (Rust)   │  │
       │  │  Lfsr16 → SCLayer.forward    │  │
       │  │    → LifNeuron / Izhikevich  │  │
       │  │    → TelemetryRing           │  │
       │  └──────────────────────────────┘  │
       └─────────────────┬──────────────────┘
                         │ AER events (UDP)
                         ▼
                ┌─────────────────────┐
                │  AERRoutingDaemon   │  (Go)
                │     UDP mesh        │
                └─────────┬───────────┘
                          │
                          ▼
                 Downstream FPGA tiles

4. Supported boards

Board     | RAM    | Flash  | Active µW @160 MHz | Sleep µW | Comment
ESP32-C3  | 400 kB | 4 MB   | 15 000             | 5        | WROOM-02-class
ESP32-C6  | 512 kB | 4 MB   | 18 000             | 7        | adds 802.15.4
ESP32-H2  | 320 kB | 4 MB   | 12 000             | 3        | low-power variant
GD32VF103 | 32 kB  | 128 kB | 8 000              | 10       | smallest, cheapest
CH32V307  | 64 kB  | 256 kB | 10 000             | 8        | high I/O count
K210      | 8 MB   | 16 MB  | 300 000            | 50       | dual-core + KPU
Generic   | 64 kB  | 256 kB | 10 000             | 10       | conservative fallback

:meth:PowerProfile.for_board linearly scales active_uw with clock frequency (reference 160 MHz). :meth:MemoryFootprint.estimate checks that a chosen network fits before deployment.
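The clock scaling described above can be sketched directly from the table. The board figures are copied from this section; the duty-cycle formula (active × duty + sleep × (1 − duty)) is an assumption about what duty_cycled_uw computes, not confirmed by the source:

```python
# Linear clock scaling of active power, referenced to 160 MHz.
REF_MHZ = 160.0
BOARDS = {"ESP32-C3": (15_000.0, 5.0), "GD32VF103": (8_000.0, 10.0)}  # (active, sleep) µW

def active_uw(board, clock_mhz):
    return BOARDS[board][0] * clock_mhz / REF_MHZ

def duty_cycled_uw(board, clock_mhz, duty):
    # Assumed model: time-weighted mix of active and sleep power.
    sleep = BOARDS[board][1]
    return active_uw(board, clock_mhz) * duty + sleep * (1.0 - duty)

print(active_uw("ESP32-C3", 80))            # 7500.0 µW at half the reference clock
print(duty_cycled_uw("ESP32-C3", 80, 0.2))  # 7500*0.2 + 5*0.8 = 1504.0 µW
```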


5. Features

  • 11 bitstream primitives (popcount + 5 SC ops + 3 packed variants + probability + SCC).
  • Bit-compatible LFSR-16 and Gray-code Sobol encoders.
  • Two SC neuron models (LIF, Izhikevich with 4 presets).
  • Multi-layer SCNetwork runner with per-layer cascading re-encode.
  • Zero-copy weight blob (SCWL magic, 16-byte headers, little-endian).
  • Runtime telemetry ring buffer (per-layer + per-device).
  • 7 pre-profiled RISC-V MCU targets + memory-footprint estimator.
  • Cargo / memory.x config generators (:func:generate_cargo_config, :func:generate_memory_x) for no_std RISC-V builds.
  • Go AER UDP mesh router with Python lifecycle supervisor.

6. Usage

6.1 Run a 2-layer SC network

Python
from sc_neurocore.edge import SCNetwork, SCLayer

net = SCNetwork(bit_length=1024)
net.add_layer(SCLayer(n_inputs=32, n_outputs=16))
net.add_layer(SCLayer(n_inputs=16, n_outputs=8))
spikes = net.run([0.5] * 32)    # bool[8]

6.2 Export + reload weights

Python
from sc_neurocore.edge import serialize_weights, deserialize_weights
blob = serialize_weights(net.export_weights())
open("weights.bin", "wb").write(blob)

# On the MCU side, Rust loads it zero-copy; in Python we round-trip:
layers = deserialize_weights(open("weights.bin", "rb").read())
net2 = SCNetwork.from_weights(layers, bit_length=1024)

Expected blob size for a 32→16→8 network: header (16 B) + 2 × layer header (16 B) + weight words ($16 \cdot 1 \cdot 4 + 8 \cdot 1 \cdot 4 = 96$ B) = 144 B.
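The size arithmetic generalises from the SCWL layout in §1.8: a 16-byte file header, a 16-byte header per layer, then n_outputs × ⌈n_inputs/32⌉ × 4 bytes of weight words per layer:

```python
# SCWL blob-size arithmetic from the wire format in section 1.8.
def blob_size(layer_shapes):
    size = 16                                    # file header
    for n_in, n_out in layer_shapes:
        words_per_row = (n_in + 31) // 32
        size += 16 + n_out * words_per_row * 4   # layer header + weight words
    return size

print(blob_size([(32, 16), (16, 8)]))   # 144 bytes for the 32->16->8 network
```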

6.3 Power budget

Python
from sc_neurocore.edge import Board, PowerProfile, MemoryFootprint

prof = PowerProfile.for_board(Board.ESP32_C3, clock_mhz=80)
print(prof.duty_cycled_uw(duty=0.2))  # µW at 20 % active

fp = MemoryFootprint.estimate(
    num_layers=2, neurons_per_layer=64, bs_words=32, board=Board.GD32VF103
)
print(fp.fits_in_ram, fp.stack_bytes)

6.4 FPGA report-derived power/thermal JSON

Deployment bundles can emit a pre-silicon model from architecture settings, or a report-derived model once Vivado has produced routed reports. The report-derived path records the Vivado headline power, static/dynamic split, effective TJA, junction temperature, and implementation resource counts while preserving the SC workload metadata used by the estimator.

Python
from sc_neurocore.edge import (
    PowerThermalConfig,
    build_power_thermal_model_from_vivado_reports,
)

build_power_thermal_model_from_vivado_reports(
    "sc_shd_pynq/sc_shd_pynq.runs/impl_1",
    "sc_shd_pynq/deployable_artifacts",
    PowerThermalConfig(
        target="zynq",
        layer_sizes=((700, 128), (128, 128), (128, 20)),
        bitstream_length=256,
        clock_mhz=100.0,
    ),
)

The emitted JSON uses source_mode = "vivado_report_derived". It is still not a substitute for physical PYNQ board measurement; it is the reproducible bridge between routed Vivado reports and the deployment artefact directory.

6.5 Launch the AER router

Python
from sc_neurocore.edge.aer_router import AERRoutingDaemon
router = AERRoutingDaemon(port=9000)
router.start(build=True)
# ... experiment drives UDP events to localhost:9000 ...
router.stop()

7. Verified benchmarks

Measured on Ubuntu 24.04 / CPython 3.12.3 / Intel i5-11600K @ 3.90 GHz, single-thread, 2026-04-20. Committed script: benchmarks/bench_edge.py. Raw JSON at benchmarks/results/bench_edge.json.

Operation                               | Throughput    | Latency
popcount32 (pure-Py)                    | 3.30 M ops/s  | 303.0 ns
popcount_slice (1024 words)             | 3 545 ops/s   | 282.1 µs
Lfsr16.encode (1024-bit)                | 3 680 ops/s   | 271.8 µs
SobolGenerator.encode (1024-bit)        | 916 ops/s     | 1 092.3 µs
LifNeuron.tick (32-word input)          | 99 909 ops/s  | 10.0 µs
IzhikevichNeuron.tick                   | 2.21 M ops/s  | 453.2 ns
SCNetwork.run (32→16→8 @ 1024 bits)     | 65 runs/s     | 15.35 ms
serialize_weights (2-layer, 144 B blob) | 199 047 ops/s | 5.02 µs
deserialize_weights (2-layer)           | 126 270 ops/s | 7.92 µs
scc (32 words = 1024 bits)              | 36 076 ops/s  | 27.7 µs

Figures above are time.perf_counter deltas from benchmarks/bench_edge.py.

Interpretation.

  • popcount32 at 303 ns is slow relative to the Rust cpop instruction (~1 ns at 1 GHz), but the Python version is only used for R&D — on the MCU the Rust path is what runs.
  • Lfsr16.encode vs SobolGenerator.encode on a 1024-bit stream: LFSR is ~4× faster per bitstream because the Sobol step needs (idx & -idx).bit_length() (trailing-zero count) per sample. Sobol pays that cost for better precision; the rule of thumb is to use Sobol only when $L \leq 256$ and precision is critical.
  • SCNetwork.run at 15.35 ms for 32→16→8 is dominated by per-input LFSR re-encoding; a NumPy-vectorised version is available in sc_neurocore.v3.engine but is not bit-compatible with the MCU target and so is excluded from this module.
  • IzhikevichNeuron.tick is ~22× faster than LifNeuron.tick because the LIF tick runs a full popcount over 32 words per step, whereas Izhikevich just does integer MACs on two 32-bit state variables.
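The trailing-zero count mentioned above is the bit trick used by the Sobol step: idx & -idx isolates the lowest set bit, and .bit_length() turns it into a 1-based position:

```python
# Trailing-zero count via the lowest-set-bit isolation trick.
def lowest_set_bit_pos(idx):
    return (idx & -idx).bit_length()   # 1-based position of the lowest 1-bit

print(lowest_set_bit_pos(1))    # 1  (bit 0 is set)
print(lowest_set_bit_pos(12))   # 3  (0b1100: lowest set bit is bit 2)
print(lowest_set_bit_pos(256))  # 9  (bit 8 is set)
```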

8. Citations

  1. Gaines B.R. (1967). Stochastic computing systems. In Advances in Information Systems Science, vol. 2, Plenum, 37–172.
  2. Alaghi A., Hayes J.P. (2013). Exploiting correlation in stochastic circuit design. ICCD-2013, 39–46. (SCC definition.)
  3. Wilkes M.V., Wheeler D.J., Gill S. (1951). The Preparation of Programs for an Electronic Digital Computer. Addison-Wesley. (WWG popcount.)
  4. Izhikevich E.M. (2003). Simple model of spiking neurons. IEEE TNN 14(6):1569–1572. (Two-variable Q16.16 model.)
  5. Joe S., Kuo F.Y. (2008). Constructing Sobol' sequences with better two-dimensional projections. SIAM J. Sci. Comput. 30:2635–2654.
  6. RISC-V Foundation (2021). RISC-V Bit-Manipulation Extension v1.0. (cpop / cpopw popcount instructions.)
  7. Šotek M. (2026). SC-NeuroCore: bare-metal SC runtime port. Internal report, ANULUM.

9. Known limitations

  • Python runner is not the hot path. SCNetwork.run in Python costs ~15 ms per forward; the MCU Rust runner completes the same network in <200 µs on a 160 MHz ESP32-C3. Use the Python path for verification and weight-blob authoring, not for inference timing.
  • Single-dimension Sobol. Only dimension 1 (V_k = 2^{16-k}) is wired; decorrelating N > 1 input streams still needs a phase-shifted LFSR bank (see sc_neurocore.v3.engine).
  • No 64-bit packing on the Python side. LFSR packs to u32; SobolGenerator packs to u64 — the two encoders are therefore not drop-in replacements in a u32-word pipeline. Pick one per layer.
  • SCLayer.forward has no bias term. Dense layers are weight-only; the bias must be folded into the threshold at quantisation time.
  • Integer Izhikevich is a best-effort Q16.16 discretisation. The quadratic term v*v >> 14 rounds differently from a floating-point reference; results match the published spike trains qualitatively (regular / fast / chattering / burst) but not bit-for-bit against a double-precision Izhikevich simulator.
  • AER router is IPv4 UDP only. IPv6 and unix-domain sockets are not supported; on a Linux host with net.ipv6.bindv6only=1 the router will fail to bind without explicit IPv4 address.
  • No RTT / loss bookkeeping in the Python supervisor. Dropped AER packets are visible on the Go side (in aer_router logs) but the Python daemon does not surface the metric.

10. Rust parity — what the MCU runs

The Rust crate tinysc_riscv under crates/tinysc_riscv/ is the physical MCU counterpart to every module on this page. The parity rule is byte-exact for the four externally visible surfaces:

Surface           | Python type / function                  | Rust counterpart
Packed bitstream  | popcount32, sc_and, …                   | bitstream::popcount32, bitstream::sc_and
LFSR stream       | Lfsr16                                  | bitstream::Lfsr16
Q16.16 Izhikevich | IzhikevichNeuron + 4 presets            | neuron::IzhikevichNeuron + 4 presets
Weight blob       | serialize_weights / deserialize_weights | weights::load_zero_copy

The parity tests live in tests/test_edge/test_parity.py and assert word-for-word equality on 1024-bit LFSR streams seeded at 0xACE1, and byte-for-byte equality on a 2-layer SCWL blob. Any change to the Python representation that would break parity must include a matching Rust change in the same PR.

The Python side additionally provides three surfaces with no Rust counterpart because they are R&D-only: :class:SobolGenerator (precision experiments), :class:PowerProfile / :class:Board (pre-deployment footprint estimation), and :func:generate_cargo_config / :func:generate_memory_x (cross-compilation aids). These never ship to the MCU.


11. Reproducibility

Every number in §7 is reproducible from a clean checkout by running

Bash
python benchmarks/bench_edge.py

which writes benchmarks/results/bench_edge.json alongside the stdout table. Randomness is deterministic: Lfsr16(0xACE1) and SobolGenerator(0) have no hidden entropy, and SCNetwork.run initialises a fresh LFSR per call.

Variance between runs on the same host is dominated by scheduler jitter and CPU-cache state; on the CEO workstation the inner hot paths (popcount32, Izhikevich tick, weights serialise) stay within ±5 % of the numbers in §7 across repeated runs. SCNetwork.run can drift up to ±15 % because its 15 ms per-forward budget is dominated by Python-level allocation inside _spikes_to_bitstreams; pin the benchmark to a single core with taskset -c 0 python benchmarks/bench_edge.py for lower variance if you need to track regressions.


12. Embedding hook — telemetry ring

:class:TelemetryRing is a fixed-capacity lock-protected ring buffer that stores u32 samples (Python uses an int list internally; the Rust counterpart uses [u32; N] on the MCU). The ring is threading.Lock-guarded on the Python side for multi-thread writers; the Rust side uses a single-producer assumption and needs no lock.

API:

  • :meth:TelemetryRing.push(value) — overwrite-on-full append.
  • :meth:TelemetryRing.mean() — arithmetic mean of the current window.
  • :meth:TelemetryRing.last() — most recent value.
  • :attr:TelemetryRing.count — number of live entries (<= capacity).
  • :attr:TelemetryRing.capacity — construction-time fixed capacity (default 256).
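A minimal sketch of the ring semantics listed above: fixed capacity, overwrite-on-full push, mean over the live window. The real Python class guards push with a threading.Lock; that is omitted here for brevity.

```python
# Fixed-capacity u32 telemetry ring: overwrite-on-full, windowed mean.
class TelemetryRing:
    def __init__(self, capacity=256):
        self.capacity = capacity
        self._buf = [0] * capacity
        self._head = 0                  # next write position
        self.count = 0                  # live entries, <= capacity

    def push(self, value):
        self._buf[self._head] = value & 0xFFFF_FFFF
        self._head = (self._head + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def mean(self):
        return sum(self._buf[:self.count]) / self.count if self.count else 0.0

    def last(self):
        return self._buf[(self._head - 1) % self.capacity]

ring = TelemetryRing(capacity=4)
for v in (10, 20, 30, 40, 50):          # fifth push overwrites the oldest entry
    ring.push(v)
print(ring.count, ring.last(), ring.mean())   # 4 50 35.0
```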

:class:LayerTelemetry wraps two rings — spike_rate_ring and utilization_ring (both capacity 64 by default) — and records per-tick activity via :meth:LayerTelemetry.record_tick(n_spikes, n_neurons). :class:DeviceTelemetry aggregates a dict of :class:LayerTelemetry and exposes :meth:DeviceTelemetry.summary returning a JSON-ready dict of per-layer (spike_count, tick_count, mean_spike_rate, mean_utilization) plus device-level (total_ticks, total_spikes, error_count).

On the MCU side the rings are the same shape and the same field semantics; the Rust telemetry::DeviceTelemetry::summary() returns the same dict (serialised as JSON to the UART/WebSocket transport), so the HIL debugger's frame schema is invariant across targets. A shared-memory fast path between Rust and Go is not yet wired — today the data travels as JSON over UART or WebSocket.


Reference

  • Sources (package root):
  • src/sc_neurocore/edge/__init__.py (74 LOC, re-export surface)
  • src/sc_neurocore/edge/bitstream.py (109 LOC)
  • src/sc_neurocore/edge/lfsr.py (66 LOC)
  • src/sc_neurocore/edge/sobol.py (91 LOC)
  • src/sc_neurocore/edge/neuron.py (107 LOC, LIF + Izhikevich)
  • src/sc_neurocore/edge/sc_network.py (154 LOC)
  • src/sc_neurocore/edge/weights.py (129 LOC, SCWL format)
  • src/sc_neurocore/edge/telemetry.py (133 LOC)
  • src/sc_neurocore/edge/power_estimator.py (113 LOC)
  • src/sc_neurocore/edge/deploy.py (63 LOC)
  • src/sc_neurocore/edge/aer_router.py (48 LOC, Go supervisor)
  • Go daemon: src/sc_neurocore/accel/go/services/aer_router/main.go + main_test.go.
  • Benchmark: benchmarks/bench_edge.py.
  • Parity tests vs Rust tinysc_riscv: tests/test_edge/*.py.

sc_neurocore.edge.aer_router

AERRoutingDaemon

Orchestrates the Go-based AER UDP mesh multi-FPGA router pipeline dynamically.

Source code in src/sc_neurocore/edge/aer_router.py
Python
class AERRoutingDaemon:
    """Orchestrates the Go-based AER UDP mesh multi-FPGA router pipeline dynamically."""

    def __init__(self, port: int = 9000):
        self._router_dir = (
            Path(__file__).resolve().parent.parent / "accel" / "go" / "services" / "aer_router"
        )
        self._port = port
        self._process: subprocess.Popen[bytes] | None = None

    def start(self, build: bool = True) -> None:
        if build:
            print("[AER Router] Natively compiling robust Go pipeline...")
            subprocess.run(
                ["go", "build", "-o", "aer_router", "main.go"],
                cwd=str(self._router_dir),
                check=True,
            )

        print(f"[AER Router] Spawning background listener on port {self._port}...")
        self._process = subprocess.Popen(
            ["./aer_router"],
            cwd=str(self._router_dir),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        time.sleep(0.5)

    def stop(self) -> None:
        """Tears down the active background UDP topology safely."""
        if self._process is not None:
            self._process.terminate()
            self._process.wait(timeout=2.0)
            self._process = None
            print("[AER Router] Daemon successfully shut down.")

stop()

Tears down the active background UDP topology safely.

Source code in src/sc_neurocore/edge/aer_router.py
Python
def stop(self) -> None:
    """Tears down the active background UDP topology safely."""
    if self._process is not None:
        self._process.terminate()
        self._process.wait(timeout=2.0)
        self._process = None
        print("[AER Router] Daemon successfully shut down.")

sc_neurocore.edge.bitstream

Packed u32-word bitstream operations for SC arithmetic.

All operations work on lists of unsigned 32-bit integers, mirroring the bare-metal Rust implementation for RISC-V targets. Provides popcount, SC AND/OR/XOR/MUX/SUB, SCC computation, and probability estimation.

popcount32(word)

Count set bits in a u32 word (Wilkes-Wheeler-Gill).

Source code in src/sc_neurocore/edge/bitstream.py
Python
def popcount32(word: int) -> int:
    """Count set bits in a u32 word (Wilkes-Wheeler-Gill)."""
    x = word & MASK32
    x = x - ((x >> 1) & 0x5555_5555)
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333)
    x = (x + (x >> 4)) & 0x0F0F_0F0F
    x = x + (x >> 8)
    x = x + (x >> 16)
    return x & 0x3F

popcount_slice(words)

Popcount over a packed u32 word slice.

Source code in src/sc_neurocore/edge/bitstream.py
Python
def popcount_slice(words: list[int]) -> int:
    """Popcount over a packed u32 word slice."""
    total = 0
    for w in words:
        total += popcount32(w)
    return total

sc_and(a, b)

SC multiply (bitwise AND).

Source code in src/sc_neurocore/edge/bitstream.py
Python
def sc_and(a: int, b: int) -> int:
    """SC multiply (bitwise AND)."""
    return (a & b) & MASK32

sc_or(a, b)

SC saturating addition (bitwise OR).

Source code in src/sc_neurocore/edge/bitstream.py
Python
def sc_or(a: int, b: int) -> int:
    """SC saturating addition (bitwise OR)."""
    return (a | b) & MASK32

sc_xor(a, b)

SC absolute difference / HDC bind (bitwise XOR).

Source code in src/sc_neurocore/edge/bitstream.py
Python
def sc_xor(a: int, b: int) -> int:
    """SC absolute difference / HDC bind (bitwise XOR)."""
    return (a ^ b) & MASK32

sc_sub(a, b)

SC saturating subtraction: a AND NOT b.

Source code in src/sc_neurocore/edge/bitstream.py
Python
def sc_sub(a: int, b: int) -> int:
    """SC saturating subtraction: a AND NOT b."""
    return (a & (~b & MASK32)) & MASK32

sc_mux(a, b, sel)

SC scaled addition (2:1 MUX): (a AND sel) OR (b AND NOT sel).

Source code in src/sc_neurocore/edge/bitstream.py
Python
def sc_mux(a: int, b: int, sel: int) -> int:
    """SC scaled addition (2:1 MUX): (a AND sel) OR (b AND NOT sel)."""
    return ((a & sel) | (b & (~sel & MASK32))) & MASK32

and_packed(a, b)

SC AND over two packed word slices.

Source code in src/sc_neurocore/edge/bitstream.py
Python
def and_packed(a: list[int], b: list[int]) -> list[int]:
    """SC AND over two packed word slices."""
    assert len(a) == len(b)
    return [(x & y) & MASK32 for x, y in zip(a, b)]

mux_packed(a, b, sel)

SC MUX over two packed word slices with a select bitstream.

Source code in src/sc_neurocore/edge/bitstream.py
Python
def mux_packed(a: list[int], b: list[int], sel: list[int]) -> list[int]:
    """SC MUX over two packed word slices with a select bitstream."""
    assert len(a) == len(b) == len(sel)
    return [((x & s) | (y & (~s & MASK32))) & MASK32 for x, y, s in zip(a, b, sel)]

probability(words, bit_length)

Estimated probability from a packed bitstream.

Source code in src/sc_neurocore/edge/bitstream.py
Python
def probability(words: list[int], bit_length: int) -> float:
    """Estimated probability from a packed bitstream."""
    if bit_length == 0:
        return 0.0
    return popcount_slice(words) / bit_length

scc(a, b, bit_length)

SCC between two packed u32 bitstreams (Alaghi & Hayes, 2013).

Returns a correlation coefficient in [-1, 1].

Source code in src/sc_neurocore/edge/bitstream.py
Python
def scc(a: list[int], b: list[int], bit_length: int) -> float:
    """SCC between two packed u32 bitstreams (Alaghi & Hayes, 2013).

    Returns a correlation coefficient in [-1, 1].
    """
    assert len(a) == len(b)
    if bit_length == 0:
        return 0.0
    n = float(bit_length)
    pa = popcount_slice(a) / n
    pb = popcount_slice(b) / n

    and_count = sum(popcount32(x & y) for x, y in zip(a, b))
    p_and = and_count / n

    num = p_and - (pa * pb)
    if abs(num) < 1e-7:
        return 0.0
    if num > 0.0:
        denom = min(pa, pb) - (pa * pb)
    else:
        denom = (pa * pb) - max(pa + pb - 1.0, 0.0)
    if abs(denom) < 1e-7:
        return 0.0
    return max(-1.0, min(1.0, num / denom))

sc_neurocore.edge.lfsr

Deterministic LFSR-16 encoder bit-compatible with core_engine::Lfsr16.

Polynomial: x^16 + x^14 + x^13 + x^11 + 1 (maximal length = 65535). Generates packed u32-word bitstreams from probability thresholds.

Lfsr16

16-bit Galois LFSR bitstream encoder.

Bit-compatible with the Rust core_engine::bitstream::Lfsr16. Uses u32-packed output for MCU word alignment.

Source code in src/sc_neurocore/edge/lfsr.py
Python
class Lfsr16:
    """16-bit Galois LFSR bitstream encoder.

    Bit-compatible with the Rust core_engine::bitstream::Lfsr16.
    Uses u32-packed output for MCU word alignment.
    """

    TAPS = 0xD008  # x^16+x^14+x^13+x^11+1

    def __init__(self, seed: int = 0xACE1):
        self.reg = seed & 0xFFFF
        if self.reg == 0:
            self.reg = 0xACE1

    def step(self) -> int:
        """Advance LFSR by one clock, return new state."""
        bit = ((self.reg >> 0) ^ (self.reg >> 2) ^ (self.reg >> 3) ^ (self.reg >> 5)) & 1
        self.reg = ((self.reg >> 1) | (bit << 15)) & 0xFFFF
        return self.reg

    def encode(self, threshold: int, bit_length: int) -> list[int]:
        """Encode probability (threshold/65535) into packed u32 words.

        Parameters
        ----------
        threshold : int
            Comparison threshold [0, 65535]. Higher = more 1-bits.
        bit_length : int
            Number of bits in the output bitstream.

        Returns
        -------
        list[int]
            Packed u32 words representing the bitstream.
        """
        n_words = (bit_length + 31) // 32
        out = [0] * n_words
        for i in range(bit_length):
            val = self.step()
            if val < threshold:
                out[i // 32] |= 1 << (i % 32)
        return [w & MASK32 for w in out]

    def encode_float(self, p: float, bit_length: int) -> list[int]:
        """Encode a probability [0.0, 1.0] into a packed bitstream."""
        threshold = int(p * 65535)
        return self.encode(threshold, bit_length)

step()

Advance LFSR by one clock, return new state.

Source code in src/sc_neurocore/edge/lfsr.py
Python
def step(self) -> int:
    """Advance LFSR by one clock, return new state."""
    bit = ((self.reg >> 0) ^ (self.reg >> 2) ^ (self.reg >> 3) ^ (self.reg >> 5)) & 1
    self.reg = ((self.reg >> 1) | (bit << 15)) & 0xFFFF
    return self.reg

encode(threshold, bit_length)

Encode probability (threshold/65535) into packed u32 words.

Parameters

threshold : int Comparison threshold [0, 65535]. Higher = more 1-bits. bit_length : int Number of bits in the output bitstream.

Returns

list[int]
    Packed u32 words representing the bitstream.

Source code in src/sc_neurocore/edge/lfsr.py
Python
def encode(self, threshold: int, bit_length: int) -> list[int]:
    """Encode probability (threshold/65535) into packed u32 words.

    Parameters
    ----------
    threshold : int
        Comparison threshold [0, 65535]. Higher = more 1-bits.
    bit_length : int
        Number of bits in the output bitstream.

    Returns
    -------
    list[int]
        Packed u32 words representing the bitstream.
    """
    n_words = (bit_length + 31) // 32
    out = [0] * n_words
    for i in range(bit_length):
        val = self.step()
        if val < threshold:
            out[i // 32] |= 1 << (i % 32)
    return [w & MASK32 for w in out]

encode_float(p, bit_length)

Encode a probability [0.0, 1.0] into a packed bitstream.

Source code in src/sc_neurocore/edge/lfsr.py
Python
def encode_float(self, p: float, bit_length: int) -> list[int]:
    """Encode a probability [0.0, 1.0] into a packed bitstream."""
    threshold = int(p * 65535)
    return self.encode(threshold, bit_length)
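As a sanity check, the popcount of an encoded bitstream recovers the input probability. The snippet below inlines the Lfsr16 logic from the listing above so it runs standalone, without sc_neurocore installed:

```python
# Round-trip check: encode p = 0.25 into a 1024-bit stream, then recover
# the probability from the popcount of the packed u32 words. Lfsr16 is
# inlined from the listing above so the snippet is self-contained.

class Lfsr16:
    def __init__(self, seed: int = 0xACE1):
        self.reg = (seed & 0xFFFF) or 0xACE1  # avoid the stuck all-zero state

    def step(self) -> int:
        bit = ((self.reg >> 0) ^ (self.reg >> 2) ^ (self.reg >> 3) ^ (self.reg >> 5)) & 1
        self.reg = ((self.reg >> 1) | (bit << 15)) & 0xFFFF
        return self.reg

    def encode_float(self, p: float, bit_length: int) -> list[int]:
        threshold = int(p * 65535)
        out = [0] * ((bit_length + 31) // 32)
        for i in range(bit_length):
            if self.step() < threshold:
                out[i // 32] |= 1 << (i % 32)
        return out

words = Lfsr16().encode_float(0.25, 1024)
ones = sum(bin(w).count("1") for w in words)
print(ones / 1024)  # close to 0.25, within a few percent of LFSR sampling noise
```

The recovered density is the SC probability contract mentioned in the deployment notes: every downstream kernel interprets bitstreams this way.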

sc_neurocore.edge.sobol

Sobol low-discrepancy sequence generator for SC bitstream decorrelation.

Provides better uniformity than LFSR-16 at the cost of slightly more compute per step. Uses Gray-code acceleration for O(1) per-sample generation (no matrix multiply needed).
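The Gray-code update can be seen in isolation: each sample XORs in the direction number indexed by the lowest set bit of a running counter. A minimal standalone sketch (not the installed class) also shows the low-discrepancy property that motivates Sobol over an LFSR:

```python
# Gray-code Sobol update: one XOR per sample, direction number selected by
# the index of the lowest set bit of the sample counter.

DIRECTION = [0x8000 >> i for i in range(16)]  # dimension-1 direction numbers

def sobol_samples(n: int, seed: int = 0) -> list[int]:
    reg, out = seed, []
    for idx in range(n):
        c = (idx & -idx).bit_length() - 1 if idx > 0 else 0
        if c < 16:
            reg ^= DIRECTION[c]
        out.append(reg)
    return out

# Low discrepancy in action: the first 16 samples land in 16 distinct
# 1/16-width strata of [0, 65536), which an LFSR does not guarantee.
strata = sorted(s >> 12 for s in sobol_samples(16))
print(strata)  # [0, 1, 2, ..., 15]
```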

SobolGenerator

1D Sobol sequence generator with 16-bit resolution.

Uses Joe-Kuo direction numbers (dimension 1) and Gray-code indexing so only one XOR per step.

Source code in src/sc_neurocore/edge/sobol.py
Python
class SobolGenerator:
    """1D Sobol sequence generator with 16-bit resolution.

    Uses Joe-Kuo direction numbers (dimension 1) and Gray-code indexing
    so only one XOR per step.
    """

    DIRECTION_NUMBERS = np.array(
        [
            0x8000,
            0x4000,
            0x2000,
            0x1000,
            0x0800,
            0x0400,
            0x0200,
            0x0100,
            0x0080,
            0x0040,
            0x0020,
            0x0010,
            0x0008,
            0x0004,
            0x0002,
            0x0001,
        ],
        dtype=np.uint16,
    )

    def __init__(self, seed: int = 0):
        self._reg = np.uint16(seed)
        self._index = np.uint32(0)

    def step(self) -> int:
        """Advance by one step, return the next Sobol value in [0, 65535]."""
        c = 0
        idx = int(self._index)
        if idx > 0:
            c = (idx & -idx).bit_length() - 1
        if c < 16:
            self._reg ^= self.DIRECTION_NUMBERS[c]
        self._index += np.uint32(1)
        return int(self._reg)

    def encode(self, threshold: int, length: int) -> np.ndarray:
        """Encode a probability into packed u64 words using Sobol sequence.

        Parameters
        ----------
        threshold : int
            Value in [0, 65535]. Each Sobol sample < threshold becomes a 1-bit.
        length : int
            Number of bits in the bitstream.

        Returns
        -------
        np.ndarray
            Packed u64 bitstream array.
        """
        n_words = (length + 63) // 64
        out = np.zeros(n_words, dtype=np.uint64)
        for i in range(length):
            val = self.step()
            if val < threshold:
                out[i // 64] |= np.uint64(1) << np.uint64(i % 64)
        return out

    def reset(self, seed: int = 0) -> None:
        """Reset to initial state."""
        self._reg = np.uint16(seed)
        self._index = np.uint32(0)

step()

Advance by one step, return the next Sobol value in [0, 65535].

Source code in src/sc_neurocore/edge/sobol.py
Python
def step(self) -> int:
    """Advance by one step, return the next Sobol value in [0, 65535]."""
    c = 0
    idx = int(self._index)
    if idx > 0:
        c = (idx & -idx).bit_length() - 1
    if c < 16:
        self._reg ^= self.DIRECTION_NUMBERS[c]
    self._index += np.uint32(1)
    return int(self._reg)

encode(threshold, length)

Encode a probability into packed u64 words using Sobol sequence.

Parameters

threshold : int
    Value in [0, 65535]. Each Sobol sample < threshold becomes a 1-bit.
length : int
    Number of bits in the bitstream.

Returns

np.ndarray
    Packed u64 bitstream array.

Source code in src/sc_neurocore/edge/sobol.py
Python
def encode(self, threshold: int, length: int) -> np.ndarray:
    """Encode a probability into packed u64 words using Sobol sequence.

    Parameters
    ----------
    threshold : int
        Value in [0, 65535]. Each Sobol sample < threshold becomes a 1-bit.
    length : int
        Number of bits in the bitstream.

    Returns
    -------
    np.ndarray
        Packed u64 bitstream array.
    """
    n_words = (length + 63) // 64
    out = np.zeros(n_words, dtype=np.uint64)
    for i in range(length):
        val = self.step()
        if val < threshold:
            out[i // 64] |= np.uint64(1) << np.uint64(i % 64)
    return out

reset(seed=0)

Reset to initial state.

Source code in src/sc_neurocore/edge/sobol.py
Python
def reset(self, seed: int = 0) -> None:
    """Reset to initial state."""
    self._reg = np.uint16(seed)
    self._index = np.uint32(0)

sc_neurocore.edge.neuron

LIF and Izhikevich spiking neurons operating in the SC domain.

Membrane potential is tracked as a popcount accumulator (integer, no FPU). This mirrors the bare-metal implementation for RISC-V targets where floating-point is unavailable or expensive.
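The integer-only dynamics can be exercised directly. The sketch below inlines the LifNeuron update (popcount drive, shift-based leak) so it runs without the package:

```python
# LIF update with integers only: membrane += popcount(input words),
# membrane -= membrane >> leak_shift, fire and reset at threshold.

class LifNeuron:
    def __init__(self, threshold: int = 512, leak_shift: int = 3):
        self.threshold, self.leak_shift = threshold, leak_shift
        self.membrane = 0
        self.spike_count = 0

    def tick(self, input_words: list[int]) -> bool:
        self.membrane += sum(bin(w).count("1") for w in input_words)
        self.membrane -= self.membrane >> self.leak_shift  # exponential leak
        if self.membrane >= self.threshold:
            self.membrane = 0
            self.spike_count += 1
            return True
        return False

# Drive with four all-ones words (128 excitation bits per tick): the leak
# bounds the accumulator, so the neuron settles into a regular firing cycle.
n = LifNeuron()
fired = [n.tick([0xFFFF_FFFF] * 4) for _ in range(20)]
print(fired.index(True) + 1, n.spike_count)  # first spike at tick 7, 2 spikes in 20 ticks
```

Because the leak is a right shift, the steady-state membrane under constant drive is roughly `excitation << leak_shift`, which is why the default threshold of 512 pairs naturally with 128-bit drive and `leak_shift=3`.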

LifNeuron dataclass

Leaky Integrate-and-Fire neuron (SC domain, integer arithmetic).

Membrane potential = running popcount of input bitstream. Leak = right-shift per tick (exponential decay). Fires when potential exceeds threshold.

Source code in src/sc_neurocore/edge/neuron.py
Python
@dataclass
class LifNeuron:
    """Leaky Integrate-and-Fire neuron (SC domain, integer arithmetic).

    Membrane potential = running popcount of input bitstream.
    Leak = right-shift per tick (exponential decay).
    Fires when potential exceeds threshold.
    """

    threshold: int = 512
    leak_shift: int = 3
    membrane: int = 0
    spike_count: int = 0

    def tick(self, input_words: list[int]) -> bool:
        """Process one timestep, return True if spike fired."""
        excitation = popcount_slice(input_words)
        self.membrane += excitation
        self.membrane -= self.membrane >> self.leak_shift
        if self.membrane >= self.threshold:
            self.membrane = 0
            self.spike_count += 1
            return True
        return False

    def reset(self) -> None:
        self.membrane = 0
        self.spike_count = 0

tick(input_words)

Process one timestep, return True if spike fired.

Source code in src/sc_neurocore/edge/neuron.py
Python
def tick(self, input_words: list[int]) -> bool:
    """Process one timestep, return True if spike fired."""
    excitation = popcount_slice(input_words)
    self.membrane += excitation
    self.membrane -= self.membrane >> self.leak_shift
    if self.membrane >= self.threshold:
        self.membrane = 0
        self.spike_count += 1
        return True
    return False

IzhikevichNeuron dataclass

Izhikevich neuron with integer SC-domain dynamics.

Uses fixed-point arithmetic (Q16.16) to avoid floating-point. Supports regular spiking, fast spiking, chattering, and intrinsic burst.
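The Q16.16 constants are just the classic Izhikevich parameters scaled by 2^16, so the conversion is a single multiply:

```python
# Q16.16 conversion: multiply by 2**16 and round. The dataclass defaults
# are exactly these scaled values.

def to_q16(x: float) -> int:
    return int(round(x * 65536))

def from_q16(q: int) -> float:
    return q / 65536

print(to_q16(0.02))      # 1311      -> a_q16 (regular spiking)
print(to_q16(-65.0))     # -4259840  -> c_q16 (reset potential)
print(from_q16(524288))  # 8.0       -> d_q16 (recovery bump)
```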

Source code in src/sc_neurocore/edge/neuron.py
Python
@dataclass
class IzhikevichNeuron:
    """Izhikevich neuron with integer SC-domain dynamics.

    Uses fixed-point arithmetic (Q16.16) to avoid floating-point.
    Supports regular spiking, fast spiking, chattering, and intrinsic burst.
    """

    a_q16: int = 1311  # 0.02 in Q16.16
    b_q16: int = 13107  # 0.2 in Q16.16
    c_q16: int = -4259840  # -65.0 in Q16.16
    d_q16: int = 524288  # 8.0 in Q16.16
    v_q16: int = -4259840  # -65.0
    u_q16: int = -917504  # -14.0

    spike_count: int = 0
    _q16_one: int = field(default=65536, repr=False)

    def tick(self, input_current_q16: int) -> bool:
        """Process one timestep. Returns True on spike."""
        v = self.v_q16
        u = self.u_q16

        dv = ((v * v) >> 14) + ((5 * v) >> 0) + (140 << 16) - u + input_current_q16
        du = (self.a_q16 * ((self.b_q16 * v >> 16) - u)) >> 16
        self.v_q16 = v + (dv >> 8)
        self.u_q16 = u + (du >> 8)

        if self.v_q16 >= (30 << 16):
            self.v_q16 = self.c_q16
            self.u_q16 += self.d_q16
            self.spike_count += 1
            return True
        return False

    def reset(self) -> None:
        self.v_q16 = self.c_q16
        self.u_q16 = -917504
        self.spike_count = 0

    @classmethod
    def regular_spiking(cls) -> IzhikevichNeuron:
        return cls(a_q16=1311, b_q16=13107, c_q16=-4259840, d_q16=524288)

    @classmethod
    def fast_spiking(cls) -> IzhikevichNeuron:
        return cls(a_q16=6554, b_q16=13107, c_q16=-4259840, d_q16=131072)

    @classmethod
    def chattering(cls) -> IzhikevichNeuron:
        return cls(a_q16=1311, b_q16=13107, c_q16=-3276800, d_q16=131072)

    @classmethod
    def intrinsic_burst(cls) -> IzhikevichNeuron:
        return cls(a_q16=1311, b_q16=13107, c_q16=-3604480, d_q16=262144)

tick(input_current_q16)

Process one timestep. Returns True on spike.

Source code in src/sc_neurocore/edge/neuron.py
Python
def tick(self, input_current_q16: int) -> bool:
    """Process one timestep. Returns True on spike."""
    v = self.v_q16
    u = self.u_q16

    dv = ((v * v) >> 14) + ((5 * v) >> 0) + (140 << 16) - u + input_current_q16
    du = (self.a_q16 * ((self.b_q16 * v >> 16) - u)) >> 16
    self.v_q16 = v + (dv >> 8)
    self.u_q16 = u + (du >> 8)

    if self.v_q16 >= (30 << 16):
        self.v_q16 = self.c_q16
        self.u_q16 += self.d_q16
        self.spike_count += 1
        return True
    return False

sc_neurocore.edge.sc_network

Fixed-capacity feed-forward SC network runner.

Mirrors the bare-metal Rust implementation with a stack-allocated, heap-free execution model: encode inputs → layer-by-layer SC inference → decode outputs.

SCLayer dataclass

Single dense SC layer: weights × inputs via AND + popcount threshold.
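The per-output computation is literally one AND and one popcount per word. A hand-worked example over a 64-bit input (the weight and input patterns are arbitrary illustrations):

```python
# One SC "dot product": AND the packed weight row against the packed input,
# popcount the overlap, compare with the firing threshold.

weight_row = [0xF0F0_F0F0, 0x0000_FFFF]   # 32 of 64 weight bits set
input_words = [0xFFFF_0000, 0xFFFF_FFFF]  # 48 of 64 input bits set

acc = sum(bin(w & x).count("1") for w, x in zip(weight_row, input_words))
print(acc)        # 8 + 16 = 24 overlapping bits
print(acc >= 20)  # spike decision at threshold 20 -> True
```

In the unipolar SC interpretation the AND gate multiplies probabilities, so the popcount estimates the weighted input activity without a single multiply instruction.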

Source code in src/sc_neurocore/edge/sc_network.py
Python
@dataclass
class SCLayer:
    """Single dense SC layer: weights × inputs via AND + popcount threshold."""

    n_inputs: int
    n_outputs: int
    threshold: int = 512
    weights: list[list[int]] = field(default_factory=list)
    sc_mode: SCMode = "unipolar"

    def __post_init__(self) -> None:
        self._validate_configuration()
        if not self.weights:
            self.weights = [
                [0x5555_5555] * ((self.n_inputs + 31) // 32) for _ in range(self.n_outputs)
            ]
        self._validate_weights()

    def _validate_configuration(self) -> None:
        if self.sc_mode != "unipolar":
            raise ValueError("SCLayer currently supports only sc_mode='unipolar'")
        if self.n_inputs <= 0:
            raise ValueError("n_inputs must be positive")
        if self.n_outputs <= 0:
            raise ValueError("n_outputs must be positive")
        if self.threshold < 0:
            raise ValueError("threshold must be non-negative")
        if self.n_inputs > MAX_NEURONS_PER_LAYER:
            raise ValueError(f"n_inputs must be <= {MAX_NEURONS_PER_LAYER}")
        if self.n_outputs > MAX_NEURONS_PER_LAYER:
            raise ValueError(f"n_outputs must be <= {MAX_NEURONS_PER_LAYER}")

    def _validate_weights(self) -> None:
        if len(self.weights) != self.n_outputs:
            raise ValueError("weights must contain one row per output")
        words_per_input = self.words_per_input
        for row in self.weights:
            if len(row) != words_per_input:
                raise ValueError("each weight row must match words_per_input")
            if any((word < 0 or word > MASK32) for word in row):
                raise ValueError("weight words must be unsigned 32-bit values")

    @property
    def words_per_input(self) -> int:
        return (self.n_inputs + 31) // 32

    def forward(self, input_words: list[int], bit_length: int) -> list[bool]:
        """Run SC inference: AND each weight row with input, threshold popcount."""
        if bit_length <= 0:
            raise ValueError("bit_length must be positive")
        if len(input_words) < self.words_per_input:
            raise ValueError("input_words length must be at least words_per_input")
        if any((word < 0 or word > MASK32) for word in input_words):
            raise ValueError("input words must be unsigned 32-bit values")
        spikes = []
        for row in self.weights:
            acc = 0
            for w, inp in zip(row, input_words):
                acc += popcount_slice([w & inp])
            spikes.append(acc >= self.threshold)
        return spikes

forward(input_words, bit_length)

Run SC inference: AND each weight row with input, threshold popcount.

Source code in src/sc_neurocore/edge/sc_network.py
Python
def forward(self, input_words: list[int], bit_length: int) -> list[bool]:
    """Run SC inference: AND each weight row with input, threshold popcount."""
    if bit_length <= 0:
        raise ValueError("bit_length must be positive")
    if len(input_words) < self.words_per_input:
        raise ValueError("input_words length must be at least words_per_input")
    if any((word < 0 or word > MASK32) for word in input_words):
        raise ValueError("input words must be unsigned 32-bit values")
    spikes = []
    for row in self.weights:
        acc = 0
        for w, inp in zip(row, input_words):
            acc += popcount_slice([w & inp])
        spikes.append(acc >= self.threshold)
    return spikes

SCNetwork dataclass

Multi-layer feed-forward SC network runner.

Usage::

Text Only
net = SCNetwork(bit_length=1024)
net.add_layer(SCLayer(n_inputs=32, n_outputs=16))
net.add_layer(SCLayer(n_inputs=16, n_outputs=8))
output = net.run([0.5] * 32)
Source code in src/sc_neurocore/edge/sc_network.py
Python
@dataclass
class SCNetwork:
    """Multi-layer feed-forward SC network runner.

    Usage::

        net = SCNetwork(bit_length=1024)
        net.add_layer(SCLayer(n_inputs=32, n_outputs=16))
        net.add_layer(SCLayer(n_inputs=16, n_outputs=8))
        output = net.run([0.5] * 32)
    """

    bit_length: int = 1024
    layers: list[SCLayer] = field(default_factory=list)
    lfsr_seed: int = 0xACE1
    sc_mode: SCMode = "unipolar"

    def __post_init__(self) -> None:
        if self.sc_mode != "unipolar":
            raise ValueError("SCNetwork currently supports only sc_mode='unipolar'")
        if self.bit_length <= 0:
            raise ValueError("bit_length must be positive")

    def add_layer(self, layer: SCLayer) -> None:
        if layer.sc_mode != self.sc_mode:
            raise ValueError("layer sc_mode must match network sc_mode")
        self.layers.append(layer)

    def encode_inputs(self, probabilities: list[float]) -> list[list[int]]:
        """Encode float probabilities into per-input packed bitstreams."""
        if any((p < 0.0 or p > 1.0) for p in probabilities):
            raise ValueError("input probabilities must be in [0, 1]")
        lfsr = Lfsr16(self.lfsr_seed)
        return [lfsr.encode_float(p, self.bit_length) for p in probabilities]

    def _spikes_to_bitstreams(self, spikes: list[bool], lfsr: Lfsr16) -> list[list[int]]:
        """Re-encode spike booleans as bitstreams for the next layer."""
        return [lfsr.encode_float(1.0 if s else 0.0, self.bit_length) for s in spikes]

    def _flatten_bitstreams(self, streams: list[list[int]]) -> list[int]:
        """Interleave per-input bitstreams into a flat word array.

        For a layer expecting N inputs, the flat array has N×words_per_input
        entries: [input0_word0, input1_word0, ..., inputN_word0, input0_word1, ...]
        However for simplicity we concatenate: [input0_words..., input1_words..., ...].
        The layer forward reads the first words_per_input words as the combined input.
        To combine inputs, we OR them together (SC saturating addition).
        """
        if not streams:
            return []
        wpi = len(streams[0])
        combined = [0] * wpi
        for stream in streams:
            for j in range(wpi):
                combined[j] = (combined[j] | stream[j]) & MASK32
        return combined

    def run(self, input_probabilities: list[float]) -> list[bool]:
        """Full inference: encode → cascaded layer inference → spike output.

        Each layer's spike output is re-encoded as bitstreams and fed
        to the next layer. This is the correct SC cascade semantics.
        """
        if not self.layers:
            return []
        if len(input_probabilities) != self.layers[0].n_inputs:
            raise ValueError("input_probabilities length must match first layer n_inputs")

        lfsr = Lfsr16(self.lfsr_seed)
        input_streams = self.encode_inputs(input_probabilities)
        current_words = self._flatten_bitstreams(input_streams)

        current_spikes: list[bool] = []
        for layer in self.layers:
            current_spikes = layer.forward(current_words, self.bit_length)
            current_words = self._flatten_bitstreams(
                self._spikes_to_bitstreams(current_spikes, lfsr)
            )

        return current_spikes

    def export_weights(self) -> list[tuple[int, int, int, list[list[int]]]]:
        """Export all layer weights in serialization-ready format."""
        return [
            (layer.n_inputs, layer.n_outputs, layer.threshold, layer.weights)
            for layer in self.layers
        ]

    @classmethod
    def from_weights(
        cls,
        layers_data: list[tuple[Any, list[list[int]]]],
        bit_length: int = 1024,
        lfsr_seed: int = 0xACE1,
    ) -> SCNetwork:
        """Construct network from deserialized weight data."""
        net = cls(bit_length=bit_length, lfsr_seed=lfsr_seed)
        for lh, rows in layers_data:
            net.add_layer(
                SCLayer(
                    n_inputs=lh.n_inputs,
                    n_outputs=lh.n_outputs,
                    threshold=lh.threshold,
                    weights=rows,
                )
            )
        return net

    @property
    def layer_count(self) -> int:
        return len(self.layers)

    @property
    def total_neurons(self) -> int:
        return sum(layer.n_outputs for layer in self.layers)

encode_inputs(probabilities)

Encode float probabilities into per-input packed bitstreams.

Source code in src/sc_neurocore/edge/sc_network.py
Python
def encode_inputs(self, probabilities: list[float]) -> list[list[int]]:
    """Encode float probabilities into per-input packed bitstreams."""
    if any((p < 0.0 or p > 1.0) for p in probabilities):
        raise ValueError("input probabilities must be in [0, 1]")
    lfsr = Lfsr16(self.lfsr_seed)
    return [lfsr.encode_float(p, self.bit_length) for p in probabilities]

run(input_probabilities)

Full inference: encode → cascaded layer inference → spike output.

Each layer's spike output is re-encoded as bitstreams and fed to the next layer. This is the correct SC cascade semantics.

Source code in src/sc_neurocore/edge/sc_network.py
Python
def run(self, input_probabilities: list[float]) -> list[bool]:
    """Full inference: encode → cascaded layer inference → spike output.

    Each layer's spike output is re-encoded as bitstreams and fed
    to the next layer. This is the correct SC cascade semantics.
    """
    if not self.layers:
        return []
    if len(input_probabilities) != self.layers[0].n_inputs:
        raise ValueError("input_probabilities length must match first layer n_inputs")

    lfsr = Lfsr16(self.lfsr_seed)
    input_streams = self.encode_inputs(input_probabilities)
    current_words = self._flatten_bitstreams(input_streams)

    current_spikes: list[bool] = []
    for layer in self.layers:
        current_spikes = layer.forward(current_words, self.bit_length)
        current_words = self._flatten_bitstreams(
            self._spikes_to_bitstreams(current_spikes, lfsr)
        )

    return current_spikes

export_weights()

Export all layer weights in serialization-ready format.

Source code in src/sc_neurocore/edge/sc_network.py
Python
def export_weights(self) -> list[tuple[int, int, int, list[list[int]]]]:
    """Export all layer weights in serialization-ready format."""
    return [
        (layer.n_inputs, layer.n_outputs, layer.threshold, layer.weights)
        for layer in self.layers
    ]

from_weights(layers_data, bit_length=1024, lfsr_seed=44257) classmethod

Construct network from deserialized weight data.

Source code in src/sc_neurocore/edge/sc_network.py
Python
@classmethod
def from_weights(
    cls,
    layers_data: list[tuple[Any, list[list[int]]]],
    bit_length: int = 1024,
    lfsr_seed: int = 0xACE1,
) -> SCNetwork:
    """Construct network from deserialized weight data."""
    net = cls(bit_length=bit_length, lfsr_seed=lfsr_seed)
    for lh, rows in layers_data:
        net.add_layer(
            SCLayer(
                n_inputs=lh.n_inputs,
                n_outputs=lh.n_outputs,
                threshold=lh.threshold,
                weights=rows,
            )
        )
    return net

sc_neurocore.edge.weights

Zero-copy weight loading for SC networks.

Binary format for pre-trained SC network weights that can be loaded from flash/disk without heap allocation. Compatible with the Rust bare-metal implementation.

Wire format (little-endian):

Header fields: 4B magic 0x5343574C, 4B version, 4B n_layers, 4B flags.

Per-layer fields: 4B n_inputs, 4B n_outputs, 4B threshold, 4B reserved, followed by n_outputs × n_words × 4B weight words.
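This layout makes blob sizes fully predictable, which matters when the MCU maps the blob straight from flash. A standalone sketch of the size arithmetic and the header packing (the version value 1 used below is an assumption for illustration):

```python
import struct

def blob_size(layers: list[tuple[int, int]]) -> int:
    """Predict the serialized size from (n_inputs, n_outputs) pairs."""
    size = 16  # WeightHeader
    for n_in, n_out in layers:
        size += 16 + n_out * ((n_in + 31) // 32) * 4  # LayerHeader + packed rows
    return size

# One 32-in / 16-out layer: 16 + 16 + 16 rows * 1 word * 4 B = 96 bytes.
print(blob_size([(32, 16)]))

# Headers are plain little-endian u32 quads; round-trip the magic.
hdr = struct.pack("<IIII", 0x5343574C, 1, 1, 0)
magic, _version, _n_layers, _flags = struct.unpack("<IIII", hdr)
print(hex(magic))  # 0x5343574c
```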

WeightHeader dataclass

Weight blob header (16 bytes).

Source code in src/sc_neurocore/edge/weights.py
Python
@dataclass
class WeightHeader:
    """Weight blob header (16 bytes)."""

    magic: int = WEIGHT_MAGIC
    version: int = WEIGHT_VERSION
    n_layers: int = 0
    flags: int = 0

    def to_bytes(self) -> bytes:
        return struct.pack("<IIII", self.magic, self.version, self.n_layers, self.flags)

    @classmethod
    def from_bytes(cls, data: bytes) -> WeightHeader:
        m, v, nl, f = struct.unpack("<IIII", data[:16])
        return cls(magic=m, version=v, n_layers=nl, flags=f)

    def validate(self) -> bool:
        return self.magic == WEIGHT_MAGIC and self.version <= WEIGHT_VERSION

LayerHeader dataclass

Per-layer header (16 bytes).

Source code in src/sc_neurocore/edge/weights.py
Python
@dataclass
class LayerHeader:
    """Per-layer header (16 bytes)."""

    n_inputs: int = 0
    n_outputs: int = 0
    threshold: int = 512
    reserved: int = 0

    def to_bytes(self) -> bytes:
        return struct.pack("<IIII", self.n_inputs, self.n_outputs, self.threshold, self.reserved)

    @classmethod
    def from_bytes(cls, data: bytes) -> LayerHeader:
        ni, no, th, r = struct.unpack("<IIII", data[:16])
        return cls(n_inputs=ni, n_outputs=no, threshold=th, reserved=r)

    @property
    def words_per_row(self) -> int:
        return (self.n_inputs + 31) // 32

serialize_weights(layers)

Serialize network weights to binary blob.

Parameters

layers : list
    Each entry is (n_inputs, n_outputs, threshold, weight_rows).
    weight_rows is list[list[int]] (n_outputs × words_per_row u32 values).

Returns

bytes
    Complete weight blob with headers.

Source code in src/sc_neurocore/edge/weights.py
Python
def serialize_weights(layers: list[tuple[int, int, int, list[list[int]]]]) -> bytes:
    """Serialize network weights to binary blob.

    Parameters
    ----------
    layers : list
        Each entry is (n_inputs, n_outputs, threshold, weight_rows).
        weight_rows is list[list[int]] (n_outputs × words_per_row u32 values).

    Returns
    -------
    bytes
        Complete weight blob with headers.
    """
    header = WeightHeader(n_layers=len(layers))
    buf = bytearray(header.to_bytes())

    for n_inputs, n_outputs, threshold, rows in layers:
        lh = LayerHeader(n_inputs=n_inputs, n_outputs=n_outputs, threshold=threshold)
        buf.extend(lh.to_bytes())
        for row in rows:
            for word in row:
                buf.extend(struct.pack("<I", word & 0xFFFF_FFFF))

    return bytes(buf)

deserialize_weights(data)

Deserialize a weight blob into layer headers + weight matrices.

Returns

list[tuple[LayerHeader, list[list[int]]]]
    Each entry is (header, weight_rows).

Source code in src/sc_neurocore/edge/weights.py
Python
def deserialize_weights(data: bytes) -> list[tuple[LayerHeader, list[list[int]]]]:
    """Deserialize a weight blob into layer headers + weight matrices.

    Returns
    -------
    list[tuple[LayerHeader, list[list[int]]]]
        Each entry is (header, weight_rows).
    """
    header = WeightHeader.from_bytes(data[:16])
    if not header.validate():
        raise ValueError(f"Invalid weight blob: magic=0x{header.magic:08X}")

    offset = 16
    layers = []
    for _ in range(header.n_layers):
        lh = LayerHeader.from_bytes(data[offset : offset + 16])
        offset += 16
        rows = []
        wpr = lh.words_per_row
        for _ in range(lh.n_outputs):
            row = []
            for _ in range(wpr):
                (word,) = struct.unpack("<I", data[offset : offset + 4])
                row.append(word)
                offset += 4
            rows.append(row)
        layers.append((lh, rows))

    return layers

sc_neurocore.edge.power_estimator

Power consumption and memory footprint estimation for RISC-V MCU targets.

Enables pre-deployment validation that a network fits in target board RAM/flash and provides µW power estimates at given clock frequencies.
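Duty-cycled draw is a linear mix of active and sleep power. The numbers below reuse the ESP32-C3 row from the Board table (15_000 µW active at the 160 MHz reference clock, 5 µW sleep):

```python
# uW = active * duty + sleep * (1 - duty), mirroring
# PowerProfile.duty_cycled_uw from the listing below.

def duty_cycled_uw(active_uw: int, sleep_uw: int, duty: float) -> int:
    return int(active_uw * duty + sleep_uw * (1.0 - duty))

print(duty_cycled_uw(15_000, 5, 1.0))   # 15000: always-on inference
print(duty_cycled_uw(15_000, 5, 0.01))  # 154: wake for inference 1% of the time
```

A 1% duty cycle brings the average draw down by roughly two orders of magnitude, which is the usual operating point for battery-powered sense-and-sleep deployments.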

Board

Bases: Enum

Supported RISC-V MCU targets.

Source code in src/sc_neurocore/edge/power_estimator.py
Python
class Board(Enum):
    """Supported RISC-V MCU targets."""

    ESP32_C3 = ("ESP32-C3", 400, 4096, 15_000, 5)
    ESP32_C6 = ("ESP32-C6", 512, 4096, 18_000, 7)
    ESP32_H2 = ("ESP32-H2", 320, 4096, 12_000, 3)
    GD32VF103 = ("GD32VF103", 32, 128, 8_000, 10)
    CH32V307 = ("CH32V307", 64, 256, 10_000, 8)
    K210 = ("K210", 8192, 16384, 300_000, 50)
    GENERIC = ("Generic", 64, 256, 10_000, 10)

    def __init__(self, label: str, ram_kb: int, flash_kb: int, active_uw: int, sleep_uw: int):
        self.label = label
        self.ram_kb = ram_kb
        self.flash_kb = flash_kb
        self._active_uw_ref = active_uw
        self._sleep_uw = sleep_uw

PowerProfile dataclass

Estimated power profile for a target board at a given clock.

Source code in src/sc_neurocore/edge/power_estimator.py
Python
@dataclass
class PowerProfile:
    """Estimated power profile for a target board at a given clock."""

    board: Board
    clock_mhz: int
    active_uw: int
    sleep_uw: int

    @classmethod
    def for_board(cls, board: Board, clock_mhz: int = 160) -> PowerProfile:
        scaled = board._active_uw_ref * clock_mhz // 160
        return cls(board=board, clock_mhz=clock_mhz, active_uw=scaled, sleep_uw=board._sleep_uw)

    def duty_cycled_uw(self, duty: float) -> int:
        """Estimate µW for a given duty cycle (0.0=sleep, 1.0=active)."""
        return int(self.active_uw * duty + self.sleep_uw * (1.0 - duty))

duty_cycled_uw(duty)

Estimate µW for a given duty cycle (0.0=sleep, 1.0=active).

Source code in src/sc_neurocore/edge/power_estimator.py
Python
def duty_cycled_uw(self, duty: float) -> int:
    """Estimate µW for a given duty cycle (0.0=sleep, 1.0=active)."""
    return int(self.active_uw * duty + self.sleep_uw * (1.0 - duty))

MemoryFootprint dataclass

Memory footprint estimate for a tinySC network.

Source code in src/sc_neurocore/edge/power_estimator.py
Python
@dataclass
class MemoryFootprint:
    """Memory footprint estimate for a tinySC network."""

    stack_bytes: int
    static_bytes: int
    total_bytes: int
    fits_in_ram: bool
    fits_in_flash: bool

    @classmethod
    def estimate(
        cls, num_layers: int, neurons_per_layer: int, bs_words: int, board: Board
    ) -> MemoryFootprint:
        """Estimate memory for a network configuration.

        Parameters
        ----------
        num_layers : int
            Number of layers.
        neurons_per_layer : int
            Max neurons in any layer.
        bs_words : int
            Bitstream words per neuron.
        board : Board
            Target board.
        """
        neuron_size = 12
        layer_size = neuron_size * neurons_per_layer + 32
        net_size = layer_size * num_layers + 16
        bs_stack = bs_words * 4

        stack = net_size + bs_stack + 256
        static_code = 8192

        total = stack + static_code
        ram_bytes = board.ram_kb * 1024
        flash_bytes = board.flash_kb * 1024

        return cls(
            stack_bytes=stack,
            static_bytes=static_code,
            total_bytes=total,
            fits_in_ram=stack <= ram_bytes,
            fits_in_flash=static_code <= flash_bytes,
        )

    @staticmethod
    def max_neurons(board: Board) -> int:
        """Maximum neurons that fit in a board's RAM (single layer)."""
        ram = board.ram_kb * 1024
        overhead = 512
        if ram <= overhead:
            return 0
        return (ram - overhead) // 12

estimate(num_layers, neurons_per_layer, bs_words, board) classmethod

Estimate memory for a network configuration.

Parameters

num_layers : int
    Number of layers.
neurons_per_layer : int
    Max neurons in any layer.
bs_words : int
    Bitstream words per neuron.
board : Board
    Target board.

Source code in src/sc_neurocore/edge/power_estimator.py
Python
@classmethod
def estimate(
    cls, num_layers: int, neurons_per_layer: int, bs_words: int, board: Board
) -> MemoryFootprint:
    """Estimate memory for a network configuration.

    Parameters
    ----------
    num_layers : int
        Number of layers.
    neurons_per_layer : int
        Max neurons in any layer.
    bs_words : int
        Bitstream words per neuron.
    board : Board
        Target board.
    """
    neuron_size = 12
    layer_size = neuron_size * neurons_per_layer + 32
    net_size = layer_size * num_layers + 16
    bs_stack = bs_words * 4

    stack = net_size + bs_stack + 256
    static_code = 8192

    total = stack + static_code
    ram_bytes = board.ram_kb * 1024
    flash_bytes = board.flash_kb * 1024

    return cls(
        stack_bytes=stack,
        static_bytes=static_code,
        total_bytes=total,
        fits_in_ram=stack <= ram_bytes,
        fits_in_flash=static_code <= flash_bytes,
    )

max_neurons(board) staticmethod

Maximum neurons that fit in a board's RAM (single layer).

Source code in src/sc_neurocore/edge/power_estimator.py
Python
@staticmethod
def max_neurons(board: Board) -> int:
    """Maximum neurons that fit in a board's RAM (single layer)."""
    ram = board.ram_kb * 1024
    overhead = 512
    if ram <= overhead:
        return 0
    return (ram - overhead) // 12
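The same back-of-envelope check can be done by hand: subtract the fixed overhead from board RAM and divide by the 12-byte neuron record (constants taken from the listing above):

```python
# Feasibility check mirroring MemoryFootprint.max_neurons.

def max_neurons(ram_kb: int) -> int:
    ram = ram_kb * 1024
    overhead = 512  # fixed slack reserved by the estimate
    return 0 if ram <= overhead else (ram - overhead) // 12

print(max_neurons(32))    # GD32VF103 (32 KiB RAM): 2688 neurons
print(max_neurons(8192))  # K210 (8 MiB RAM): 699008 neurons
```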