Edge — bare-metal SC runtime + AER mesh¶
Pure-Python port of the tinysc_riscv bare-metal crate: a complete
SC inference runtime that runs on a RISC-V MCU without FPU, plus
the AER UDP mesh router used for multi-FPGA deployments. The Python
and Rust sides are bit-compatible — a weight blob produced in Python
deserialises byte-for-byte on the MCU, and an LFSR bitstream produced
on either side matches the other's output word-for-word.
```python
from sc_neurocore.edge import (
    # Packed bitstream primitives
    popcount32, popcount_slice, sc_and, sc_or, sc_xor, sc_sub, sc_mux,
    and_packed, mux_packed, probability, scc,
    # Pseudo-random / low-discrepancy encoders
    Lfsr16, SobolGenerator,
    # SC neurons
    LifNeuron, IzhikevichNeuron,
    # Network runner
    SCLayer, SCNetwork,
    # Telemetry + serialisation + board profiles
    TelemetryRing, LayerTelemetry, DeviceTelemetry,
    WeightHeader, LayerHeader, WEIGHT_MAGIC,
    serialize_weights, deserialize_weights,
    PowerProfile, Board,
    PowerThermalConfig, build_power_thermal_model_from_vivado_reports,
    WebDeploymentConfig, build_web_deployment,
)
from sc_neurocore.edge.aer_router import AERRoutingDaemon
```
Browser Deployment Scaffold¶
sc_neurocore.edge.web_deploy emits a deterministic static web bundle for
.nir, .pt, .pth, and JSON model artefacts. It is the first slice of the
WASM/WebGPU deployment path: generation does not require a browser, WebGPU
driver, Node.js, or a native WASM toolchain, and the emitted manifest.json
records the runtime contract honestly.
```python
from sc_neurocore.edge import WebDeploymentConfig, build_web_deployment

manifest = build_web_deployment(
    "model.nir",
    "build/web",
    WebDeploymentConfig(dt=1.0, bitstream_length=256),
)
print(manifest.artefacts["html"])  # index.html
```
Generated layout:

```text
build/web/
  index.html
  manifest.json
  model/model.nir
  runtime/sc_neurocore_web.js
  runtime/sc_neurocore_webgpu.wgsl
```
The browser runtime loads the manifest, checks WebGPU availability, and exposes
the SC probability contract used by later WASM kernels. The WGSL kernel clamps
probabilities into [0, 1]; it is intentionally small so it can be used as a
capability and packaging test before computational kernels are added.
1. Mathematical formalism¶
1.1 Packed bitstream word¶
A bitstream of length $L$ is stored as $\lceil L/32 \rceil$ unsigned 32-bit words. The popcount of one word is Wilkes–Wheeler–Gill:
$$ \mathrm{popcount32}(x) = \sum_{b=0}^{31} \bigl((x \gg b) \wedge 1\bigr), $$
computed in six SWAR steps (shift-and-mask) — the same algorithm as
core_engine::bitstream::popcount32 in the Rust crate.
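As a concrete sketch (assuming the common multiply-fold variant of the WWG ladder; the shipped `popcount32` may spell out the shift-and-mask steps differently):

```python
def popcount32(x: int) -> int:
    # Wilkes–Wheeler–Gill SWAR popcount over one u32 word
    x = x - ((x >> 1) & 0x55555555)                  # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # 8-bit partial sums
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24     # fold into top byte
```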
1.2 SC arithmetic via bitwise ops¶
Let $A,B \in \{0,1\}^{L}$ be independent unipolar streams with bit probabilities $p_{A},\,p_{B}$. The SC arithmetic operators are:
$$ \begin{aligned} \text{multiply} \quad & A \wedge B, & p_{A\wedge B} &= p_{A}\,p_{B} \\ \text{saturating add} \quad & A \vee B, & p_{A\vee B} &= p_{A} + p_{B} - p_{A}\,p_{B} \\ \text{abs.\ difference} \quad & A \oplus B, & p_{A\oplus B} &= p_{A}(1-p_{B}) + p_{B}(1-p_{A}) \\ \text{sat.\ subtract} \quad & A \wedge \neg B, & p_{A \setminus B} &= p_{A}(1 - p_{B}) \\ \text{scaled add (MUX)} \quad & (S \wedge A) \vee (\neg S \wedge B), & p_{\text{mux}} &= p_{S}\,p_{A} + (1-p_{S})\,p_{B} \end{aligned} $$
These identities are the foundation of stochastic computing (Gaines, 1967) and make all SC layers computable with only AND / OR / NOT / MUX cells.
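These identities are easy to check numerically. The following sketch uses plain Python ints as bitstreams (illustrative only — the module itself works on packed u32 word lists):

```python
import random

L = 1 << 16                      # stream length in bits
rng = random.Random(1)

def bernoulli_stream(p):
    # pack L Bernoulli(p) bits into one Python int
    bits = 0
    for i in range(L):
        if rng.random() < p:
            bits |= 1 << i
    return bits

def prob(s):
    return bin(s).count("1") / L

A, B = bernoulli_stream(0.6), bernoulli_stream(0.3)
full = (1 << L) - 1              # mask for NOT on a finite stream

assert abs(prob(A & B) - 0.6 * 0.3) < 0.02               # multiply
assert abs(prob(A | B) - (0.6 + 0.3 - 0.18)) < 0.02      # saturating add
assert abs(prob(A ^ B) - (0.6*0.7 + 0.3*0.4)) < 0.02     # abs. difference
assert abs(prob(A & ~B & full) - 0.6 * 0.7) < 0.02       # sat. subtract
```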
1.3 Alaghi–Hayes SCC¶
$$ \mathrm{SCC}(A,B) = \begin{cases} \dfrac{p_{A \wedge B} - p_{A}p_{B}}{\min(p_{A},\,p_{B}) - p_{A}p_{B}}, & p_{A \wedge B} \geq p_{A}p_{B} \\[4pt] \dfrac{p_{A \wedge B} - p_{A}p_{B}}{p_{A}p_{B} - \max(0,\,p_{A}+p_{B}-1)}, & \text{otherwise} \end{cases} $$
:func:scc implements the case-split exactly; output is bounded in
$[-1,\,+1]$.
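A probability-level sketch of the case split (the shipped :func:scc takes packed word slices and a bit length; here the inputs are assumed to be already-measured probabilities):

```python
def scc_from_probs(p_a, p_b, p_ab):
    # Alaghi–Hayes SCC from measured bit probabilities
    indep = p_a * p_b
    if p_ab >= indep:
        denom = min(p_a, p_b) - indep              # positive-correlation branch
    else:
        denom = indep - max(0.0, p_a + p_b - 1.0)  # negative-correlation branch
    return 0.0 if denom == 0 else (p_ab - indep) / denom

# maximally overlapped streams: p_ab = min(p_a, p_b)      -> SCC = +1
# maximally anti-overlapped:    p_ab = max(0, p_a+p_b-1)  -> SCC = -1
```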
1.4 Galois LFSR-16 encoder¶
:class:Lfsr16 uses the polynomial
$x^{16} + x^{14} + x^{13} + x^{11} + 1$ (Galois form, taps = 0xD008)
with state $r_{t} \in \{1,\,\ldots,\,65535\}$:
$$ b_{t} = \bigoplus_{i \in \{0,2,3,5\}} \bigl((r_{t} \gg i) \wedge 1\bigr), \qquad r_{t+1} = (r_{t} \gg 1) \,\vee\, (b_{t} \ll 15). $$
Period is 65 535 (maximal for 16 bits). The probability encoder compares the 16-bit state to a threshold $\theta \in [0,\,65535]$:
$$ \mathrm{bit}_{t} = \mathbf{1}\bigl[r_{t} < \theta\bigr], \qquad p = \theta/65535. $$
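A minimal model of this update — a shift with the feedback bit taken as the parity of bits 0, 2, 3, 5 (mask `0x002D`), matching the equation above; the crate's Lfsr16 remains the authoritative bit-exact implementation:

```python
def lfsr16_step(r: int) -> int:
    # x^16 + x^14 + x^13 + x^11 + 1: feedback = parity of bits 0, 2, 3, 5
    b = bin(r & 0x002D).count("1") & 1
    return (r >> 1) | (b << 15)

def lfsr16_encode(seed: int, theta: int, length: int) -> float:
    # threshold the 16-bit state: bit = 1 iff state < theta, so p ~ theta/65535
    r, ones = seed, 0
    for _ in range(length):
        ones += r < theta
        r = lfsr16_step(r)
    return ones / length
```

Starting from the parity seed `0xACE1`, the state sequence returns to the seed after exactly 65 535 steps, confirming the maximal period.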
1.5 Sobol low-discrepancy sequence¶
:class:SobolGenerator implements 1-D Sobol with Joe–Kuo direction
numbers (dimension 1: $V_{k} = 2^{16-k}$ for $k = 1..16$). Using
Gray-code indexing, each step costs one XOR:
$$ x_{n+1} = x_{n} \oplus V_{c(n)}, \qquad c(n) = \text{position of lowest 1-bit in } n. $$
Bits generated with Sobol thresholding have discrepancy $O(\log^{d} N / N)$ vs LFSR's $O(1/\sqrt{N})$, so SC pipelines fed from Sobol streams hit target precision with $L$ up to 4× shorter.
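The Gray-code recurrence can be sketched as follows, taking the dimension-1 direction numbers as the powers of two $V_{k} = 2^{16-k}$ (for dimension 1 this reduces to the van der Corput sequence in base 2):

```python
def sobol16(n_samples: int) -> list[int]:
    # 1-D Sobol via Gray code: x_{n+1} = x_n XOR V_c(n), where c(n) is the
    # (1-based) position of the lowest set bit of n+1
    V = [1 << (16 - k) for k in range(1, 17)]  # V_1 = 32768 ... V_16 = 1
    x, out = 0, []
    for n in range(1, n_samples + 1):
        c = (n & -n).bit_length()              # lowest-set-bit position
        x ^= V[c - 1]                          # one XOR per sample
        out.append(x)
    return out

# first points / 65536: 0.5, 0.75, 0.25, 0.375 — the expected stratified fill
```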
1.6 LIF with popcount accumulator¶
:class:LifNeuron tracks membrane $V$ as a running popcount with
right-shift leak:
$$ V_{t+1} = V_{t} + \mathrm{popcount}(I_{t}) - (V_{t} \gg s), \qquad V_{t+1} \geq \Theta \;\Rightarrow\; \text{spike},\;\; V_{t+1} \leftarrow 0, $$
where $s$ is the leak-shift (typical $s=3$ gives $\tau_{\text{leak}} \approx 2^{s} = 8$ ticks). No multiplier, no FPU needed.
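A sketch of one tick under these definitions (the threshold and leak-shift values below are illustrative, not the shipped defaults):

```python
def lif_tick(v: int, input_words: list[int], threshold: int = 512,
             leak_shift: int = 3) -> tuple[int, bool]:
    # integrate the popcount of the packed input slice
    v += sum(bin(w & 0xFFFFFFFF).count("1") for w in input_words)
    v -= v >> leak_shift          # right-shift leak (exponential decay)
    if v >= threshold:
        return 0, True            # fire and reset
    return v, False
```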
1.7 Izhikevich in Q16.16¶
:class:IzhikevichNeuron runs Izhikevich's 2003 two-variable model
entirely in 32-bit integer Q16.16:
$$ \dot{V} = 0.04 V^{2} + 5 V + 140 - U + I, \qquad \dot{U} = a (b V - U), $$
with reset $(V \geq 30 \Rightarrow V \leftarrow c,\; U \leftarrow U + d)$. Q16.16 gives $\Delta = 2^{-16} \approx 1.5 \cdot 10^{-5}$ resolution on $V$; the four presets (regular spiking, fast spiking, chattering, intrinsic burst) match the published $(a,b,c,d)$ tuples exactly.
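An illustrative Q16.16 Euler step (a sketch with straightforward `>> 16` rounding; the shipped neuron's exact rounding, e.g. its `v*v >> 14` quadratic term, differs):

```python
ONE = 1 << 16                          # Q16.16 scale

def q16(x: float) -> int:
    return int(round(x * ONE))

def izh_tick(v: int, u: int, i: int, a: int, b: int, c: int, d: int):
    # One Euler step (dt = 1 ms) of dv = 0.04 v^2 + 5 v + 140 - u + I,
    # du = a (b v - u); all state and parameters in Q16.16.
    v2 = (v * v) >> 16                                     # v^2, Q16.16
    dv = ((q16(0.04) * v2) >> 16) + 5 * v + q16(140.0) - u + i
    du = (a * (((b * v) >> 16) - u)) >> 16
    v, u = v + dv, u + du
    if v >= q16(30.0):                                     # spike + reset
        return c, u + d, True
    return v, u, False

# regular-spiking preset: (a, b, c, d) = (0.02, 0.2, -65, 8)
a, b, c, d = q16(0.02), q16(0.2), q16(-65.0), q16(8.0)
v, u = q16(-65.0), q16(-13.0)          # u0 = b * v0
spikes = 0
for _ in range(500):
    v, u, fired = izh_tick(v, u, q16(10.0), a, b, c, d)
    spikes += fired
```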
1.8 Weight-blob wire format¶
```text
offset | size | field
-------+------+----------------------
   0   |  4   | magic = 0x5343_574C ("SCWL")
   4   |  4   | version
   8   |  4   | n_layers
  12   |  4   | flags
  16   |  4   | layer[0].n_inputs
  20   |  4   | layer[0].n_outputs
  24   |  4   | layer[0].threshold
  28   |  4   | reserved
  32   |  *   | layer[0].weights (n_outputs × words_per_row × u32)
  ...
```

All multi-byte fields are little-endian. words_per_row = ⌈n_inputs/32⌉.
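The layout can be sketched with struct (the version field value here is an assumption; serialize_weights is the authoritative writer):

```python
import struct

WEIGHT_MAGIC = 0x5343574C  # "SCWL"

def serialize(layers):
    # layers: iterable of (n_inputs, n_outputs, threshold, rows),
    # where rows is n_outputs lists of ceil(n_inputs/32) u32 words
    blob = struct.pack("<IIII", WEIGHT_MAGIC, 1, len(layers), 0)  # version 1 assumed
    for n_in, n_out, threshold, rows in layers:
        blob += struct.pack("<IIII", n_in, n_out, threshold, 0)   # per-layer header
        for row in rows:
            blob += struct.pack("<%dI" % len(row), *row)          # weight words
    return blob

# a 32→16→8 network: 16 B header + 2 × 16 B layer headers + 96 B of weights
rows16 = [[0xDEADBEEF] for _ in range(16)]
rows8 = [[0x0000FFFF] for _ in range(8)]
blob = serialize([(32, 16, 300, rows16), (16, 8, 150, rows8)])
```

This reproduces the 144-byte total worked out in §6.2 for the same topology.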
2. Theory (why these particular mechanics)¶
2.1 Why bit-exact with Rust¶
The Rust crate (tinysc_riscv) is what actually runs on the MCU; the
Python module is the design-space exploration surface. Any divergence
between the two introduces silent drift — a network simulated in
Python that no longer matches its MCU deployment. We therefore keep
the same integer arithmetic, the same LFSR polynomial, the same Q16.16
encoding, the same byte layout. tests/test_edge/test_parity.py
asserts byte-identity on 1024-bit LFSR streams, Izhikevich spike
trains, and weight blobs.
2.2 Why integer arithmetic everywhere¶
The target board family (Table in §4) includes GD32VF103 (32 kB RAM, no FPU) and ESP32-C3 (400 kB RAM, no FPU). Running an SC network there means:
- No float anywhere — membrane, thresholds, weights, plasticity rates are all integers.
- No heap allocation — weight blobs are loaded zero-copy from flash.
- Popcount is the only inner kernel that must be fast; the RISC-V bit-manipulation extension's cpop is a single-cycle instruction, and the WWG fallback is ~6 cycles on cores without it.
2.3 Why both LFSR and Sobol¶
LFSR is the default because it is the smallest stream generator that
exists (16 FFs + 4 XOR gates) and perfectly matches the Rust
core_engine::bitstream::Lfsr16. Sobol is the precision option: for
a target bit probability $p$, Sobol converges to $\hat{p} = p \pm
O(\log N / N)$ vs LFSR's $\pm O(1/\sqrt{N})$. On a 1024-bit stream at
$p=0.5$, LFSR's 1-σ error is ≈0.0156; Sobol's is ≈0.002. Sobol costs
more per sample (one XOR + one trailing-zero count) but lets a
bandwidth-constrained fabric cut $L$ by 4× for the same MSE.
2.4 Why SCLayer does OR-combine, not concatenation¶
The most common pitfall when cascading SC layers is to simply
concatenate the upstream spike-stream words. This destroys the
probability interpretation (the combined stream becomes biased by
word order). :meth:SCNetwork._flatten_bitstreams instead uses the
SC saturating-add identity $A \vee B$, which preserves the
probability semantics up to a correction term $p_{A}p_{B}$ that is
small when individual inputs have low density.
2.5 Why cascading re-encodes through LFSR¶
Between layers, the boolean spike vector from layer $L_{k}$ is
re-encoded to a new bitstream by :class:Lfsr16. This deliberate
re-randomisation prevents correlation buildup: a spike from layer
$k$ that fires twice in a row would otherwise produce a perfectly
correlated stream that cannot be multiplied downstream
(SCC(A, A) = 1 gives A ∧ A = A, not p²). Re-encoding restores
independence at the cost of one LFSR-cycle per spike — free on any
core with a hardware XOR tree.
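A small numerical illustration of why re-encoding matters (plain-int bitstreams, illustrative only):

```python
import random

L = 1 << 14
rng = random.Random(7)

def stream(p):
    # pack L Bernoulli(p) bits into one Python int
    s = 0
    for i in range(L):
        if rng.random() < p:
            s |= 1 << i
    return s

def prob(s):
    return bin(s).count("1") / L

A = stream(0.5)
assert prob(A & A) == prob(A)           # SCC(A, A) = 1: AND returns p, not p^2

B = stream(0.5)                         # fresh, independent re-encoding
assert abs(prob(A & B) - 0.25) < 0.03   # decorrelated multiply: ~p^2
```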
2.6 Why the AER router is Go, not Python¶
UDP mesh routing at AER rates (~Mpkts/s peak) needs concurrency and
zero GC pauses. Go's goroutine model handles the fan-out per channel
without the GIL bottleneck that Python would have. The Python
:class:AERRoutingDaemon is a supervisor: go build on demand,
spawn, SIGTERM on teardown.
3. Position in the pipeline¶
```text
┌──────────────────────────────────────────────┐
│ Python design                                │
│ (train / evolve a network, export weights)   │
└──────────────────┬───────────────────────────┘
                   │
          serialize_weights()
                   │
                   ▼
            ┌─────────────┐
            │  .bin blob  │
            │ (SCWL magic)│
            └──────┬──────┘
                   │ zero-copy load on boot
                   ▼
 ┌────────────────────────────────────┐
 │          RISC-V MCU target         │
 │ ┌──────────────────────────────┐   │
 │ │ sc_neurocore::edge (Rust)    │   │
 │ │ Lfsr16 → SCLayer.forward     │   │
 │ │  → LifNeuron / Izhikevich    │   │
 │ │  → TelemetryRing             │   │
 │ └──────────────────────────────┘   │
 └─────────────────┬──────────────────┘
                   │ AER events (UDP)
                   ▼
         ┌─────────────────────┐
         │  AERRoutingDaemon   │ (Go)
         │      UDP mesh       │
         └─────────┬───────────┘
                   │
                   ▼
          Downstream FPGA tiles
```
4. Supported boards¶
| Board | RAM | Flash | Active µW @160 MHz | Sleep µW | Comment |
|---|---|---|---|---|---|
| ESP32-C3 | 400 kB | 4 MB | 15 000 | 5 | WROOM-02-class |
| ESP32-C6 | 512 kB | 4 MB | 18 000 | 7 | adds 802.15.4 |
| ESP32-H2 | 320 kB | 4 MB | 12 000 | 3 | low-power variant |
| GD32VF103 | 32 kB | 128 kB | 8 000 | 10 | smallest, cheapest |
| CH32V307 | 64 kB | 256 kB | 10 000 | 8 | high I/O count |
| K210 | 8 MB | 16 MB | 300 000 | 50 | dual-core + KPU |
| Generic | 64 kB | 256 kB | 10 000 | 10 | conservative fallback |
:meth:PowerProfile.for_board linearly scales active_uw with clock
frequency (reference 160 MHz). :meth:MemoryFootprint.estimate checks
that a chosen network fits before deployment.
5. Features¶
- 11 bitstream primitives (popcount + 5 SC ops + 3 packed variants + probability + SCC).
- Bit-compatible LFSR-16 and Gray-code Sobol encoders.
- Two SC neuron models (LIF, Izhikevich with 4 presets).
- Multi-layer SCNetwork runner with per-layer cascading re-encode.
- Zero-copy weight blob (SCWL magic, 16-byte headers, little-endian).
- Runtime telemetry ring buffer (per-layer + per-device).
- 7 pre-profiled RISC-V MCU targets + memory-footprint estimator.
- Cargo / memory.x config generators (:func:generate_cargo_config, :func:generate_memory_x) for no_std RISC-V builds.
- Go AER UDP mesh router with Python lifecycle supervisor.
6. Usage¶
6.1 Run a 2-layer SC network¶
```python
from sc_neurocore.edge import SCNetwork, SCLayer

net = SCNetwork(bit_length=1024)
net.add_layer(SCLayer(n_inputs=32, n_outputs=16))
net.add_layer(SCLayer(n_inputs=16, n_outputs=8))
spikes = net.run([0.5] * 32)  # bool[8]
```
6.2 Export + reload weights¶
```python
from sc_neurocore.edge import serialize_weights, deserialize_weights

blob = serialize_weights(net.export_weights())
open("weights.bin", "wb").write(blob)

# On the MCU side, Rust loads it zero-copy; in Python we round-trip:
layers = deserialize_weights(open("weights.bin", "rb").read())
net2 = SCNetwork.from_weights(layers, bit_length=1024)
```
Expected blob size for a 32→16→8 network:
header (16 B) + 2 × layer header (16 B) + weight words
($16 \cdot 1 \cdot 4 + 8 \cdot 1 \cdot 4 = 96$ B) = 144 B.
6.3 Power budget¶
```python
from sc_neurocore.edge import Board, PowerProfile, MemoryFootprint

prof = PowerProfile.for_board(Board.ESP32_C3, clock_mhz=80)
print(prof.duty_cycled_uw(duty=0.2))  # µW at 20 % active

fp = MemoryFootprint.estimate(
    num_layers=2, neurons_per_layer=64, bs_words=32, board=Board.GD32VF103
)
print(fp.fits_in_ram, fp.stack_bytes)
```
6.4 FPGA report-derived power/thermal JSON¶
Deployment bundles can emit a pre-silicon model from architecture settings, or a report-derived model once Vivado has produced routed reports. The report-derived path records the Vivado headline power, static/dynamic split, effective TJA, junction temperature, and implementation resource counts while preserving the SC workload metadata used by the estimator.
```python
from sc_neurocore.edge import (
    PowerThermalConfig,
    write_power_thermal_model_from_vivado_reports,
)

write_power_thermal_model_from_vivado_reports(
    "sc_shd_pynq/sc_shd_pynq.runs/impl_1",
    "sc_shd_pynq/deployable_artifacts",
    PowerThermalConfig(
        target="zynq",
        layer_sizes=((700, 128), (128, 128), (128, 20)),
        bitstream_length=256,
        clock_mhz=100.0,
    ),
)
```
The emitted JSON uses source_mode = "vivado_report_derived". It is still not
a substitute for physical PYNQ board measurement; it is the reproducible bridge
between routed Vivado reports and the deployment artefact directory.
6.5 Launch the AER router¶
```python
from sc_neurocore.edge.aer_router import AERRoutingDaemon

router = AERRoutingDaemon(port=9000)
router.start(build=True)
# ... experiment drives UDP events to localhost:9000 ...
router.stop()
```
7. Verified benchmarks¶
Measured on Ubuntu 24.04 / CPython 3.12.3 / Intel i5-11600K @ 3.90 GHz,
single-thread, 2026-04-20. Committed script:
benchmarks/bench_edge.py. Raw JSON at
benchmarks/results/bench_edge.json.
| Operation | Throughput | Latency |
|---|---|---|
| popcount32 (pure-Py) | 3.30 M ops/s | 303.0 ns |
| popcount_slice (1024 words) | 3 545 ops/s | 282.1 µs |
| Lfsr16.encode (1024-bit) | 3 680 ops/s | 271.8 µs |
| SobolGenerator.encode (1024-bit) | 916 ops/s | 1 092.3 µs |
| LifNeuron.tick (32-word input) | 99 909 ops/s | 10.0 µs |
| IzhikevichNeuron.tick | 2.21 M ops/s | 453.2 ns |
| SCNetwork.run (32→16→8 @ 1024 bits) | 65 runs/s | 15.35 ms |
| serialize_weights (2-layer, 144 B blob) | 199 047 ops/s | 5.02 µs |
| deserialize_weights (2-layer) | 126 270 ops/s | 7.92 µs |
| scc (32 words = 1024 bits) | 36 076 ops/s | 27.7 µs |
Figures above are time.perf_counter deltas from
benchmarks/bench_edge.py.
Interpretation.

- popcount32 at 303 ns is slow relative to the Rust cpop instruction (~1 ns at 1 GHz), but the Python version is only used for R&D — on the MCU the Rust path is what runs.
- Lfsr16.encode vs SobolGenerator.encode on a 1024-bit stream: LFSR is ~4× faster per bitstream because the Sobol step needs (idx & -idx).bit_length() (a trailing-zero count) per sample. Sobol pays that cost for better precision; the rule of thumb is to use Sobol only when $L \leq 256$ and precision is critical.
- SCNetwork.run at 15.35 ms for 32→16→8 is dominated by per-input LFSR re-encoding; a NumPy-vectorised version is available in sc_neurocore.v3.engine but is not bit-compatible with the MCU target and so is excluded from this module.
- IzhikevichNeuron.tick is ~22× faster than LifNeuron.tick because the LIF tick runs a full popcount over 32 words per step, whereas Izhikevich just does integer MACs on two 32-bit state variables.
8. Citations¶
- Gaines B.R. (1967). Stochastic computing systems. In Advances in Information Systems Science, vol. 2, Plenum, 37–172.
- Alaghi A., Hayes J.P. (2013). Exploiting correlation in stochastic circuit design. ICCD-2013, 39–46. (SCC definition.)
- Wilkes M.V., Wheeler D.J., Gill S. (1951). The Preparation of Programs for an Electronic Digital Computer. Addison-Wesley. (WWG popcount.)
- Izhikevich E.M. (2003). Simple model of spiking neurons. IEEE TNN 14(6):1569–1572. (Two-variable Q16.16 model.)
- Joe S., Kuo F.Y. (2008). Constructing Sobol' sequences with better two-dimensional projections. SIAM J. Sci. Comput. 30:2635–2654.
- RISC-V Foundation (2021). RISC-V Bit-Manipulation Extension v1.0. (cpop / cpopw popcount instructions.)
- Šotek M. (2026). SC-NeuroCore: bare-metal SC runtime port. Internal report, ANULUM.
9. Known limitations¶
- Python runner is not the hot path. SCNetwork.run in Python costs ~15 ms per forward; the MCU Rust runner completes the same network in <200 µs on a 160 MHz ESP32-C3. Use the Python path for verification and weight-blob authoring, not for inference timing.
- Single-dimension Sobol. Only dimension 1 ($V_{k} = 2^{16-k}$) is wired; decorrelating N > 1 input streams still needs a phase-shifted LFSR bank (see sc_neurocore.v3.engine).
- No 64-bit packing on the Python side. Lfsr16 packs to u32; SobolGenerator packs to u64 — the two encoders are therefore not drop-in replacements in a u32-word pipeline. Pick one per layer.
- SCLayer.forward has no bias term. Dense layers are weight-only; the bias must be folded into the threshold at quantisation time.
- Integer Izhikevich is a best-effort Q16.16 discretisation. The quadratic term v*v >> 14 rounds differently from a floating-point reference; results match the published spike trains qualitatively (regular / fast / chattering / burst) but not bit-for-bit against a double-precision Izhikevich simulator.
- AER router is IPv4 UDP only. IPv6 and unix-domain sockets are not supported; on a Linux host with net.ipv6.bindv6only=1 the router will fail to bind without an explicit IPv4 address.
- No RTT / loss bookkeeping in the Python supervisor. Dropped AER packets are visible on the Go side (in aer_router logs) but the Python daemon does not surface the metric.
10. Rust parity — what the MCU runs¶
The Rust crate tinysc_riscv under crates/tinysc_riscv/ is the
physical MCU counterpart to every module on this page. The parity rule
is byte-exact for the four externally visible surfaces:
| Surface | Python type / function | Rust counterpart |
|---|---|---|
| Packed bitstream | popcount32, sc_and, … |
bitstream::popcount32, bitstream::sc_and |
| LFSR stream | Lfsr16 |
bitstream::Lfsr16 |
| Q16.16 Izhikevich | IzhikevichNeuron + 4 presets |
neuron::IzhikevichNeuron + 4 presets |
| Weight blob | serialize_weights / deserialize_weights |
weights::load_zero_copy |
The parity tests live in tests/test_edge/test_parity.py and assert
word-for-word equality on 1024-bit LFSR streams seeded at 0xACE1,
and byte-for-byte equality on a 2-layer SCWL blob. Any change to
the Python representation that would break parity must include a
matching Rust change in the same PR.
The Python side additionally provides three surfaces with no Rust
counterpart because they are R&D-only: :class:SobolGenerator
(precision experiments), :class:PowerProfile / :class:Board
(pre-deployment footprint estimation), and :func:generate_cargo_config
/ :func:generate_memory_x (cross-compilation aids). These never ship
to the MCU.
11. Reproducibility¶
Every number in §7 is reproducible from a clean checkout by running
python benchmarks/bench_edge.py
which writes benchmarks/results/bench_edge.json alongside the stdout
table. Randomness is deterministic: Lfsr16(0xACE1) and
SobolGenerator(0) have no hidden entropy, and SCNetwork.run
initialises a fresh LFSR per call.
Variance between runs on the same host is dominated by scheduler
jitter and CPU-cache state; on the CEO workstation the inner hot
paths (popcount32, Izhikevich tick, weights serialise) stay within
±5 % of the numbers in §7 across repeated runs. SCNetwork.run can
drift up to ±15 % because its 15 ms per-forward budget is dominated by
Python-level allocation inside _spikes_to_bitstreams; pin the
benchmark to a single core with taskset -c 0 python
benchmarks/bench_edge.py for lower variance if you need to track
regressions.
12. Embedding hook — telemetry ring¶
:class:TelemetryRing is a fixed-capacity lock-protected ring buffer
that stores u32 samples (Python uses an int list internally; the
Rust counterpart uses [u32; N] on the MCU). The ring is
threading.Lock-guarded on the Python side for multi-thread writers;
the Rust side uses a single-producer assumption and needs no lock.
API:
- :meth:TelemetryRing.push(value) — overwrite-on-full append.
- :meth:TelemetryRing.mean() — arithmetic mean of the current window.
- :meth:TelemetryRing.last() — most recent value.
- :attr:TelemetryRing.count — number of live entries (≤ capacity).
- :attr:TelemetryRing.capacity — construction-time fixed capacity (default 256).
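The overwrite-on-full semantics can be sketched as follows (unsynchronised; the shipped class additionally guards its methods with a threading.Lock):

```python
class TelemetryRing:
    # fixed-capacity ring of u32 samples with overwrite-on-full append
    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self._buf: list[int] = []
        self._next = 0                   # slot the next push will write

    def push(self, value: int) -> None:
        value &= 0xFFFFFFFF              # clamp to u32, as on the MCU
        if len(self._buf) < self.capacity:
            self._buf.append(value)
        else:
            self._buf[self._next] = value
        self._next = (self._next + 1) % self.capacity

    @property
    def count(self) -> int:
        return len(self._buf)

    def last(self) -> int:
        return self._buf[(self._next - 1) % len(self._buf)]

    def mean(self) -> float:
        return sum(self._buf) / len(self._buf)
```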
:class:LayerTelemetry wraps two rings — spike_rate_ring and
utilization_ring (both capacity 64 by default) — and records per-tick
activity via :meth:LayerTelemetry.record_tick(n_spikes, n_neurons).
:class:DeviceTelemetry aggregates a dict of :class:LayerTelemetry
and exposes :meth:DeviceTelemetry.summary returning a JSON-ready
dict of per-layer (spike_count, tick_count, mean_spike_rate,
mean_utilization) plus device-level (total_ticks, total_spikes,
error_count).
On the MCU side the rings are the same shape and the same field
semantics; the Rust telemetry::DeviceTelemetry::summary() returns the
same dict (serialised as JSON to the UART/WebSocket transport), so the
HIL debugger's frame schema is invariant across targets. A shared-memory
fast path between Rust and Go is not yet wired — today the data travels
as JSON over UART or WebSocket.
Reference¶
- Sources (package root):
  - src/sc_neurocore/edge/__init__.py (74 LOC, re-export surface)
  - src/sc_neurocore/edge/bitstream.py (109 LOC)
  - src/sc_neurocore/edge/lfsr.py (66 LOC)
  - src/sc_neurocore/edge/sobol.py (91 LOC)
  - src/sc_neurocore/edge/neuron.py (107 LOC, LIF + Izhikevich)
  - src/sc_neurocore/edge/sc_network.py (154 LOC)
  - src/sc_neurocore/edge/weights.py (129 LOC, SCWL format)
  - src/sc_neurocore/edge/telemetry.py (133 LOC)
  - src/sc_neurocore/edge/power_estimator.py (113 LOC)
  - src/sc_neurocore/edge/deploy.py (63 LOC)
  - src/sc_neurocore/edge/aer_router.py (48 LOC, Go supervisor)
- Go daemon: src/sc_neurocore/accel/go/services/aer_router/main.go + main_test.go.
- Benchmark: benchmarks/bench_edge.py.
- Parity tests vs Rust tinysc_riscv: tests/test_edge/*.py.
sc_neurocore.edge.aer_router
¶
AERRoutingDaemon
¶
Orchestrates the Go-based AER UDP mesh multi-FPGA router pipeline dynamically.
Source code in src/sc_neurocore/edge/aer_router.py
stop()
¶
Tears down the active background UDP topology safely.
Source code in src/sc_neurocore/edge/aer_router.py
sc_neurocore.edge.bitstream
¶
Packed u32-word bitstream operations for SC arithmetic.
All operations work on lists of unsigned 32-bit integers, mirroring the bare-metal Rust implementation for RISC-V targets. Provides popcount, SC AND/OR/XOR/MUX/SUB, SCC computation, and probability estimation.
popcount32(word)
¶
Count set bits in a u32 word (Wilkes-Wheeler-Gill).
Source code in src/sc_neurocore/edge/bitstream.py
popcount_slice(words)
¶
Popcount over a packed u32 word slice.
Source code in src/sc_neurocore/edge/bitstream.py
sc_and(a, b)
¶
SC multiply (bitwise AND).
Source code in src/sc_neurocore/edge/bitstream.py
sc_or(a, b)
¶
SC saturating addition (bitwise OR).
Source code in src/sc_neurocore/edge/bitstream.py
sc_xor(a, b)
¶
SC absolute difference / HDC bind (bitwise XOR).
Source code in src/sc_neurocore/edge/bitstream.py
sc_sub(a, b)
¶
SC saturating subtraction: a AND NOT b.
Source code in src/sc_neurocore/edge/bitstream.py
sc_mux(a, b, sel)
¶
SC scaled addition (2:1 MUX): (a AND sel) OR (b AND NOT sel).
Source code in src/sc_neurocore/edge/bitstream.py
and_packed(a, b)
¶
SC AND over two packed word slices.
Source code in src/sc_neurocore/edge/bitstream.py
mux_packed(a, b, sel)
¶
SC MUX over two packed word slices with a select bitstream.
Source code in src/sc_neurocore/edge/bitstream.py
probability(words, bit_length)
¶
Estimated probability from a packed bitstream.
Source code in src/sc_neurocore/edge/bitstream.py
scc(a, b, bit_length)
¶
SCC between two packed u32 bitstreams (Alaghi & Hayes, 2013).
Returns a correlation coefficient in [-1, 1].
Source code in src/sc_neurocore/edge/bitstream.py
sc_neurocore.edge.lfsr
¶
Deterministic LFSR-16 encoder bit-compatible with core_engine::Lfsr16.
Polynomial: x^16 + x^14 + x^13 + x^11 + 1 (maximal length = 65535). Generates packed u32-word bitstreams from probability thresholds.
Lfsr16
¶
16-bit Galois LFSR bitstream encoder.
Bit-compatible with the Rust core_engine::bitstream::Lfsr16. Uses u32-packed output for MCU word alignment.
Source code in src/sc_neurocore/edge/lfsr.py
step()
¶
Advance LFSR by one clock, return new state.
Source code in src/sc_neurocore/edge/lfsr.py
encode(threshold, bit_length)
¶
Encode probability (threshold/65535) into packed u32 words.
Parameters¶
threshold : int Comparison threshold [0, 65535]. Higher = more 1-bits. bit_length : int Number of bits in the output bitstream.
Returns¶
list[int] Packed u32 words representing the bitstream.
Source code in src/sc_neurocore/edge/lfsr.py
encode_float(p, bit_length)
¶
Encode a probability [0.0, 1.0] into a packed bitstream.
Source code in src/sc_neurocore/edge/lfsr.py
sc_neurocore.edge.sobol
¶
Sobol low-discrepancy sequence generator for SC bitstream decorrelation.
Provides better uniformity than LFSR-16 at the cost of slightly more compute per step. Uses Gray-code acceleration for O(1) per-sample generation (no matrix multiply needed).
SobolGenerator
¶
1D Sobol sequence generator with 16-bit resolution.
Uses Joe-Kuo direction numbers (dimension 1) and Gray-code indexing so only one XOR per step.
Source code in src/sc_neurocore/edge/sobol.py
step()
¶
Advance by one step, return the next Sobol value in [0, 65535].
Source code in src/sc_neurocore/edge/sobol.py
encode(threshold, length)
¶
Encode a probability into packed u64 words using Sobol sequence.
Parameters¶
threshold : int Value in [0, 65535]. Each Sobol sample < threshold becomes a 1-bit. length : int Number of bits in the bitstream.
Returns¶
np.ndarray Packed u64 bitstream array.
Source code in src/sc_neurocore/edge/sobol.py
reset(seed=0)
¶
Reset to initial state.
Source code in src/sc_neurocore/edge/sobol.py
sc_neurocore.edge.neuron
¶
LIF and Izhikevich spiking neurons operating in the SC domain.
Membrane potential is tracked as a popcount accumulator (integer, no FPU). This mirrors the bare-metal implementation for RISC-V targets where floating-point is unavailable or expensive.
LifNeuron
dataclass
¶
Leaky Integrate-and-Fire neuron (SC domain, integer arithmetic).
Membrane potential = running popcount of input bitstream. Leak = right-shift per tick (exponential decay). Fires when potential exceeds threshold.
Source code in src/sc_neurocore/edge/neuron.py
tick(input_words)
¶
Process one timestep, return True if spike fired.
Source code in src/sc_neurocore/edge/neuron.py
IzhikevichNeuron
dataclass
¶
Izhikevich neuron with integer SC-domain dynamics.
Uses fixed-point arithmetic (Q16.16) to avoid floating-point. Supports regular spiking, fast spiking, chattering, and intrinsic burst.
Source code in src/sc_neurocore/edge/neuron.py
tick(input_current_q16)
¶
Process one timestep. Returns True on spike.
Source code in src/sc_neurocore/edge/neuron.py
sc_neurocore.edge.sc_network
¶
Fixed-capacity feed-forward SC network runner.
Mirrors the bare-metal Rust implementation, providing a stack-like execution model: encode inputs → layer-by-layer SC inference → decode outputs.
SCLayer
dataclass
¶
Single dense SC layer: weights × inputs via AND + popcount threshold.
Source code in src/sc_neurocore/edge/sc_network.py
forward(input_words, bit_length)
¶
Run SC inference: AND each weight row with input, threshold popcount.
Source code in src/sc_neurocore/edge/sc_network.py
SCNetwork
dataclass
¶
Multi-layer feed-forward SC network runner.
Usage::
net = SCNetwork(bit_length=1024)
net.add_layer(SCLayer(n_inputs=32, n_outputs=16))
net.add_layer(SCLayer(n_inputs=16, n_outputs=8))
output = net.run([0.5] * 32)
Source code in src/sc_neurocore/edge/sc_network.py
encode_inputs(probabilities)
¶
Encode float probabilities into per-input packed bitstreams.
Source code in src/sc_neurocore/edge/sc_network.py
run(input_probabilities)
¶
Full inference: encode → cascaded layer inference → spike output.
Each layer's spike output is re-encoded as bitstreams and fed to the next layer. This is the correct SC cascade semantics.
Source code in src/sc_neurocore/edge/sc_network.py
export_weights()
¶
Export all layer weights in serialization-ready format.
Source code in src/sc_neurocore/edge/sc_network.py
from_weights(layers_data, bit_length=1024, lfsr_seed=44257)
classmethod
¶
Construct network from deserialized weight data.
Source code in src/sc_neurocore/edge/sc_network.py
sc_neurocore.edge.weights
¶
Zero-copy weight loading for SC networks.
Binary format for pre-trained SC network weights that can be loaded from flash/disk without heap allocation. Compatible with the Rust bare-metal implementation.
Wire format (little-endian):
Header fields: 4B magic 0x5343574C, 4B version,
4B n_layers, 4B flags.
Per-layer fields: 4B n_inputs, 4B n_outputs,
4B threshold, 4B reserved, followed by
n_outputs × n_words × 4B weight words.
WeightHeader
dataclass
¶
Weight blob header (16 bytes).
Source code in src/sc_neurocore/edge/weights.py
LayerHeader
dataclass
¶
Per-layer header (16 bytes).
Source code in src/sc_neurocore/edge/weights.py
serialize_weights(layers)
¶
Serialize network weights to binary blob.
Parameters¶
layers : list Each entry is (n_inputs, n_outputs, threshold, weight_rows). weight_rows is list[list[int]] (n_outputs × words_per_row u32 values).
Returns¶
bytes Complete weight blob with headers.
Source code in src/sc_neurocore/edge/weights.py
deserialize_weights(data)
¶
Deserialize a weight blob into layer headers + weight matrices.
Returns¶
list[tuple[LayerHeader, list[list[int]]]] Each entry is (header, weight_rows).
Source code in src/sc_neurocore/edge/weights.py
sc_neurocore.edge.power_estimator
¶
Power consumption and memory footprint estimation for RISC-V MCU targets.
Enables pre-deployment validation that a network fits in target board RAM/flash and provides µW power estimates at given clock frequencies.
Board
¶
Bases: Enum
Supported RISC-V MCU targets.
Source code in src/sc_neurocore/edge/power_estimator.py
PowerProfile
dataclass
¶
Estimated power profile for a target board at a given clock.
Source code in src/sc_neurocore/edge/power_estimator.py
duty_cycled_uw(duty)
¶
Estimate µW for a given duty cycle (0.0=sleep, 1.0=active).
Source code in src/sc_neurocore/edge/power_estimator.py
MemoryFootprint
dataclass
¶
Memory footprint estimate for a tinySC network.
Source code in src/sc_neurocore/edge/power_estimator.py
estimate(num_layers, neurons_per_layer, bs_words, board)
classmethod
¶
Estimate memory for a network configuration.
Parameters¶
num_layers : int Number of layers. neurons_per_layer : int Max neurons in any layer. bs_words : int Bitstream words per neuron. board : Board Target board.
Source code in src/sc_neurocore/edge/power_estimator.py
max_neurons(board)
staticmethod
¶
Maximum neurons that fit in a board's RAM (single layer).
Source code in src/sc_neurocore/edge/power_estimator.py