Network-Level Compilation¶
Compile multi-neuron spiking networks to FPGA using automatic storage selection,
time-multiplexed neuron arrays, and weight ROM generation. This guide
covers scaling from registers (≤64 neurons) through BRAM (65–16K) to
URAM (>16K on UltraScale+), with complete weight matrix encoding in
Verilog, Xilinx .coe, and Intel .mif formats.
1. Mathematical Formalism¶
1.1 Storage Strategy Decision¶
The optimal storage strategy minimises resource cost $C(N, W)$ for $N$ neurons with $W$ state bits each:
$$ \text{strategy}(N, W) = \begin{cases} \text{registers} & \text{if } N \leq 64 \\ \text{BRAM} & \text{if } 64 < N \leq 16{,}384 \\ \text{URAM} & \text{if } N > 16{,}384 \text{ and URAM available} \end{cases} $$
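The decision rule can be sketched as a plain Python function. This is illustrative only (the function name here is hypothetical); the library's `storage_recommendation()` additionally reports tile counts and a reason string. Note that when URAM is unavailable, networks above 16K neurons fall back to BRAM, matching the test suite in §8.1.

```python
def choose_storage(neuron_count: int, has_uram: bool = False) -> str:
    """Mirror of the piecewise strategy rule above (illustrative sketch)."""
    if neuron_count <= 64:
        return "registers"   # small enough for a flat register file
    if neuron_count <= 16_384 or not has_uram:
        return "bram"        # 18K/36K block RAM tiles
    return "uram"            # 288Kb UltraRAM on UltraScale+

print(choose_storage(32))                      # registers
print(choose_storage(1024))                    # bram
print(choose_storage(32_000, has_uram=True))   # uram
print(choose_storage(32_000, has_uram=False))  # bram (URAM unavailable)
```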
1.2 BRAM Tile Estimation¶
For BRAM-based storage, the number of 18Kb or 36Kb tiles required:
$$ T_{18\text{K}} = \begin{cases} 1 & \text{if } N \cdot W \leq 18{,}432 \\ 0 & \text{otherwise} \end{cases} \qquad T_{36\text{K}} = \left\lceil \frac{N \cdot W}{36{,}864} \right\rceil $$
1.3 URAM Tile Estimation¶
UltraRAM tiles are 288Kb (72 bits × 4096 depth):
$$ T_{\text{URAM}} = \left\lceil \frac{N \cdot W}{294{,}912} \right\rceil $$
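The tile formulas above translate directly into integer arithmetic. A minimal sketch (hypothetical helper names, not library API):

```python
import math

def bram_18k_tiles(n: int, w: int) -> int:
    """One 18Kb tile if the whole state fits; otherwise 36Kb tiles are used."""
    return 1 if n * w <= 18_432 else 0

def bram_36k_tiles(n: int, w: int) -> int:
    """ceil(N*W / 36,864) 36Kb tiles."""
    return math.ceil(n * w / 36_864)

def uram_tiles(n: int, w: int) -> int:
    """288Kb UltraRAM tiles (72 bits x 4096 deep = 294,912 bits)."""
    return math.ceil(n * w / 294_912)

print(bram_18k_tiles(1024, 16))    # 1 (16,384 bits fits one 18K tile)
print(bram_36k_tiles(16_384, 16))  # 8 (262,144 / 36,864 rounded up)
print(uram_tiles(65_536, 16))      # 4 (1,048,576 / 294,912 rounded up)
```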
1.4 Time-Multiplexed Processing¶
A single compute pipeline processes $N$ neurons sequentially, completing one network tick in $N$ clock cycles:
$$ T_{\text{tick}} = N \cdot T_{\text{clk}} = \frac{N}{f_{\text{clk}}} $$
At 200 MHz with 1024 neurons: $T_{\text{tick}} = 5.12\ \mu\text{s}$.
Maximum simulation speed:
$$ f_{\text{sim}} = \frac{f_{\text{clk}}}{N} = \frac{200 \times 10^6}{1024} = 195{,}312\ \text{ticks/s} $$
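The two timing quantities can be checked numerically (hypothetical helper names; plain float arithmetic under the N-cycles-per-tick model above):

```python
def tick_latency_s(neuron_count: int, f_clk_hz: float) -> float:
    """One network tick = N sequential neuron updates at f_clk."""
    return neuron_count / f_clk_hz

def sim_rate_ticks_per_s(neuron_count: int, f_clk_hz: float) -> float:
    """Maximum simulation speed: f_clk / N ticks per second."""
    return f_clk_hz / neuron_count

print(tick_latency_s(1024, 200e6) * 1e6)   # 5.12 (microseconds)
print(sim_rate_ticks_per_s(1024, 200e6))   # 195312.5
```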
1.5 Weight ROM Addressing¶
For a fully connected layer with $N_{\text{src}}$ source and $N_{\text{dst}}$ destination neurons:
$$ \text{addr} = i_{\text{src}} \cdot N_{\text{dst}} + i_{\text{dst}} $$
Total ROM size: $N_{\text{src}} \times N_{\text{dst}} \times W_{\text{weight}}$ bits.
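The row-major addressing scheme can be exercised in a few lines (hypothetical helper name):

```python
def rom_addr(i_src: int, i_dst: int, n_dst: int) -> int:
    """Row-major address of weight w[i_src][i_dst] in the flattened ROM."""
    return i_src * n_dst + i_dst

# 10x10 fully connected layer: weight (3, 7) lives at address 37
print(rom_addr(3, 7, n_dst=10))   # 37
# Total ROM bits for 100x100 @ 16-bit weights
print(100 * 100 * 16)             # 160000 bits = 20,000 bytes
```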
2. Architecture¶
2.1 Network Compilation Pipeline¶
flowchart TB
A["NIR / ONNX Model"] --> B["from_scnetwork()"]
B --> C["NeuronGraph"]
C --> D["storage_recommendation()"]
D --> E{"Strategy"}
E -->|"≤64"| F["Register Array"]
E -->|"65–16K"| G["BRAM Array"]
E -->|">16K"| H["URAM Array"]
F --> I["generate_weight_rom()"]
G --> I
H --> I
I --> J["Top-Level Interconnect"]
J --> K["Verilog Output"]
2.2 Time-Multiplexed Neuron Array¶
┌────────────────────────────────────────────────────┐
│ sc_neuron_array │
│ │
│ ┌──────────┐ ┌────────────┐ ┌───────────┐ │
│ │ BRAM │───►│ Compute │───►│ Write-Back│ │
│ │ State │ │ Pipeline │ │ + Spike │ │
│ │ [0:N-1] │◄───│ (1 neuron/ │◄───│ Detection │ │
│ └──────────┘ │ cycle) │ └───────────┘ │
│ └─────┬──────┘ │
│ │ │
│ ┌──────────┐ │ ┌───────────┐ │
│ │ Weight │─────────┘ │ Spike Out │ │
│ │ ROM │ │ + Neuron ID│ │
│ └──────────┘ └───────────┘ │
└────────────────────────────────────────────────────┘
3. Supported Configurations¶
3.1 Storage Strategy Comparison¶
| Strategy | Neurons | LUTs | BRAM | URAM | Latency |
|---|---|---|---|---|---|
| Registers | 1–64 | N×W | 0 | 0 | 1 cycle |
| BRAM | 65–16K | ~200 | 1–18 | 0 | N cycles |
| URAM | 16K+ | ~200 | 0 | 1–8 | N cycles |
3.2 Weight ROM Format Support¶
| Format | Extension | Vendor | Use Case |
|---|---|---|---|
| Verilog $readmemh | .v | Generic | Simulation + synthesis |
| Xilinx COE | .coe | Xilinx/AMD | Vivado Block Memory Generator |
| Intel MIF | .mif | Intel/Altera | Quartus IP Catalog |
3.3 Interconnect Auto-Selection¶
| Neuron Count | Interconnect | Topology |
|---|---|---|
| ≤64 | Direct wiring | Point-to-point |
| >64 | AER bus | Address-event arbitrated |
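The auto-selection rule in the table reduces to a one-line threshold check (hypothetical helper name, mirroring the 64-neuron cutoff above):

```python
def choose_interconnect(neuron_count: int) -> str:
    """Direct point-to-point wiring scales to ~64 neurons; beyond that an
    address-event (AER) bus arbitrates spikes onto a shared channel."""
    return "direct" if neuron_count <= 64 else "aer"

print(choose_interconnect(64))    # direct
print(choose_interconnect(1024))  # aer
```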
4. Python API¶
4.1 Storage Recommendation¶
from sc_neurocore.compiler.intelligence import storage_recommendation
rec = storage_recommendation(
neuron_count=1024,
state_bits_per_neuron=16,
has_uram=False,
)
print(rec)
# StorageRecommendation(
# strategy='bram',
# neuron_count=1024,
# total_bits=16384,
# bram_18k_used=1,
# bram_36k_used=0,
# reason='1024 neurons × 16b = 16Kb — using BRAM.'
# )
4.2 Generate BRAM Neuron Array¶
from sc_neurocore.compiler.intelligence import generate_bram_array
verilog = generate_bram_array(
module_name="sc_neuron_array",
neuron_count=1024,
data_width=16,
state_vars=1, # 1 for LIF, 2 for Izhikevich
)
with open("sc_neuron_array.v", "w") as f:
f.write(verilog)
The generated array implements a concrete current-based LIF update in the time-multiplexed datapath:
assign v_next = v_curr + (I_global >>> 4) - (v_curr >>> 3);
It is therefore suitable as a compact BRAM-backed LIF array. Models with more detailed biophysics should instead use the equation compiler path, which emits the requested equation-specific datapath in place of this fixed LIF recurrence.
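The fixed recurrence can be checked bit-accurately in Python, since Verilog's arithmetic shift `>>>` on signed values corresponds to Python's floor-division `>>` on ints. A quick sketch showing the leak/drive equilibrium:

```python
def lif_step(v: int, i_global: int) -> int:
    """Mirror of: v_next = v_curr + (I_global >>> 4) - (v_curr >>> 3)."""
    return v + (i_global >> 4) - (v >> 3)

# Drive with a constant input current; the leak term (v >> 3) balances
# the input term (I >> 4) at v* = 8 * (I >> 4).
v = 0
for _ in range(100):
    v = lif_step(v, 4096)
print(v)   # 2048 = 8 * (4096 >> 4)
```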
4.3 Generate Weight ROM¶
from sc_neurocore.compiler.intelligence.core import generate_weight_rom
# Random 10×10 weight matrix in Q8.8
import random
weights = [[random.randint(-128, 127) for _ in range(10)] for _ in range(10)]
# Verilog ROM
verilog_rom = generate_weight_rom(weights, "sc_weight_rom", data_width=16)
# Xilinx COE file
coe_rom = generate_weight_rom(
weights, "sc_weight_rom",
data_width=16, output_format="coe",
)
# Intel MIF file
mif_rom = generate_weight_rom(
weights, "sc_weight_rom",
data_width=16, output_format="mif",
)
with open("sc_weight_rom.v", "w") as f:
f.write(verilog_rom)
with open("sc_weight_rom.coe", "w") as f:
f.write(coe_rom)
with open("sc_weight_rom.mif", "w") as f:
f.write(mif_rom)
4.4 Full Network Compilation¶
from sc_neurocore.compiler.intelligence import (
storage_recommendation,
generate_bram_array,
)
from sc_neurocore.compiler.intelligence.core import generate_weight_rom
from sc_neurocore.neurons.equation_builder import from_equations
from sc_neurocore.compiler.equation_compiler import compile_to_verilog
# 1. Compile neuron type
neuron = from_equations(
"dv/dt = -(v - E_L)/tau_m + I/C",
threshold="v > -50", reset="v = -65",
params=dict(E_L=-65, tau_m=10, C=1),
init=dict(v=-65),
)
neuron_v = compile_to_verilog(neuron, module_name="sc_lif")
# 2. Determine storage
rec = storage_recommendation(512, 16, has_uram=False)
print(f"Strategy: {rec.strategy} ({rec.reason})")
# 3. Generate array
array_v = generate_bram_array(
neuron_count=512,
data_width=16,
state_vars=1,
)
# 4. Generate weight ROM
import random
W = [[random.randint(-50, 50) for _ in range(512)] for _ in range(512)]
rom_v = generate_weight_rom(W, "sc_weights", data_width=16)
# Write all
for name, content in [
("sc_lif.v", neuron_v),
("sc_neuron_array.v", array_v),
("sc_weights.v", rom_v),
]:
with open(name, "w") as f:
f.write(content)
5. CLI Usage¶
5.1 Network Compilation via NIR¶
sc-neurocore compile-nir model.nir --target artix7 -o build/
This auto-detects neuron count and selects the appropriate storage strategy, generating the neuron array, weight ROM, and top-level interconnect.
5.2 Generate Weight ROM Standalone¶
python -c "
from sc_neurocore.compiler.intelligence.core import generate_weight_rom
import random
W = [[random.randint(-100, 100) for _ in range(64)] for _ in range(64)]
for fmt in ['verilog', 'coe', 'mif']:
rom = generate_weight_rom(W, 'sc_rom', data_width=16, output_format=fmt)
ext = {'verilog': 'v', 'coe': 'coe', 'mif': 'mif'}[fmt]
open(f'sc_rom.{ext}', 'w').write(rom)
print(f'{fmt}: {len(rom)} bytes')
"
6. Generated Verilog Structure¶
6.1 BRAM Array Module¶
// Auto-generated time-multiplexed neuron array: sc_neuron_array
// SC-NeuroCore network-level compilation
// Neurons: 1024, State width: 16b, Pipeline: 1 neuron/cycle
module sc_neuron_array (
input wire clk,
input wire rst,
input wire en,
input wire signed [15:0] I_global,
output wire spike_out,
output wire [9:0] spike_neuron_id,
output wire tick_done
);
(* ram_style = "block" *)
reg [15:0] state_bram [0:1023];
reg [9:0] neuron_idx;
reg tick_active;
reg signed [15:0] v_curr;
wire signed [15:0] v_next;
wire spike_w;
// Compute datapath (plugged from compiled neuron)
assign v_next = v_curr + (I_global >>> 4) - (v_curr >>> 3);
assign spike_w = (v_next > 16'sd16383);
// ...
endmodule
6.2 Weight ROM (Verilog Format)¶
// Auto-generated weight ROM: sc_weight_rom
// 100 entries × 16-bit
module sc_weight_rom (
input wire [6:0] addr,
output reg signed [15:0] data
);
always @(*) begin
case (addr)
7'd0: data = 16'sh001A;
7'd1: data = 16'shFFE6;
// ...
endcase
end
endmodule
6.3 Xilinx COE Format¶
; Auto-generated by SC-NeuroCore
memory_initialization_radix = 16;
memory_initialization_vector =
001A,
FFE6,
0032,
...;
6.4 Intel MIF Format¶
-- Auto-generated by SC-NeuroCore
DEPTH = 100;
WIDTH = 16;
ADDRESS_RADIX = DEC;
DATA_RADIX = HEX;
CONTENT
BEGIN
0 : 001A;
1 : FFE6;
2 : 0032;
END;
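All three formats carry the same two's-complement hex words (001A = 26, FFE6 = -26, 0032 = 50). The mapping from a signed integer to the emitted word can be reproduced with a small sketch (hypothetical helper, not library API):

```python
def to_hex_word(value: int, data_width: int = 16) -> str:
    """Two's-complement hex word, as emitted in the ROM bodies above."""
    mask = (1 << data_width) - 1          # e.g. 0xFFFF for 16-bit
    return format(value & mask, f"0{data_width // 4}X")

print(to_hex_word(26))    # 001A
print(to_hex_word(-26))   # FFE6
print(to_hex_word(50))    # 0032
```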
7. Performance Characteristics¶
7.1 Scaling Analysis¶
| Neurons | State Bits | RAM Tiles | Tick Latency (200 MHz) |
|---|---|---|---|
| 64 | 1,024 | 0 (regs) | 0.32 µs |
| 256 | 4,096 | 1 | 1.28 µs |
| 1,024 | 16,384 | 1 | 5.12 µs |
| 4,096 | 65,536 | 2 | 20.5 µs |
| 16,384 | 262,144 | 8 | 81.9 µs |
| 65,536 | 1,048,576 | 4 URAM | 327.7 µs |
7.2 Maximum Network Size by Target¶
| Target | BRAM | URAM | Max Neurons (Q8.8) | Max Neurons (Q16.16) |
|---|---|---|---|---|
| Artix-7 100T | 135 × 36Kb | — | ~300K | ~150K |
| Kintex UltraScale+ | 600 × 36Kb | 80 × 288Kb | ~1.3M | ~650K |
| Versal Premium | 967 × 36Kb | 463 × 288Kb | ~8.4M | ~4.2M |
7.3 Weight ROM Size Limits¶
| Network | Weights | ROM Size (Q8.8) | ROM Size (Q16.16) |
|---|---|---|---|
| 100×100 fully connected | 10K | 20 KB | 40 KB |
| 1K×1K fully connected | 1M | 2 MB | 4 MB |
| 1K×1K sparse (10%) | 100K | 200 KB | 400 KB |
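The table entries follow from the dense ROM formula in §1.5 (hypothetical helper name; KB/MB here are decimal, matching the table):

```python
def rom_size_bytes(n_src: int, n_dst: int, weight_bits: int) -> int:
    """Dense weight ROM footprint: one word per src->dst pair."""
    return n_src * n_dst * weight_bits // 8

print(rom_size_bytes(100, 100, 16))     # 20000   -> 20 KB (Q8.8)
print(rom_size_bytes(1000, 1000, 16))   # 2000000 -> 2 MB (Q8.8)
print(rom_size_bytes(1000, 1000, 32))   # 4000000 -> 4 MB (Q16.16)
```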
8. Test Suite and Verification¶
8.1 Storage Recommendation Test¶
python -c "
from sc_neurocore.compiler.intelligence import storage_recommendation
assert storage_recommendation(32, 16).strategy == 'registers'
assert storage_recommendation(128, 16).strategy == 'bram'
assert storage_recommendation(32000, 16, has_uram=True).strategy == 'uram'
assert storage_recommendation(32000, 16, has_uram=False).strategy == 'bram'
print('Storage recommendation: PASS')
"
8.2 BRAM Array Generation Test¶
python -c "
from sc_neurocore.compiler.intelligence import generate_bram_array
v = generate_bram_array(neuron_count=256, data_width=16)
assert 'state_bram' in v
assert 'ram_style' in v
assert '[7:0]' in v # 8-bit neuron index for 256 neurons
print(f'BRAM array: PASS ({len(v)} bytes)')
"
8.3 Weight ROM Cross-Format Consistency¶
python -c "
from sc_neurocore.compiler.intelligence.core import generate_weight_rom
W = [[100, -50], [25, -75]]
v = generate_weight_rom(W, 'test', data_width=16, output_format='verilog')
c = generate_weight_rom(W, 'test', data_width=16, output_format='coe')
m = generate_weight_rom(W, 'test', data_width=16, output_format='mif')
# All formats should contain the same hex values
assert '0064' in v.lower() or '64' in v # 100 decimal
assert '0064' in c.lower() or '64' in c
assert '0064' in m.lower() or '64' in m
print('Cross-format consistency: PASS')
"
8.5 Large Network Scaling Test¶
python -c "
from sc_neurocore.compiler.intelligence import storage_recommendation
for n in [16, 64, 128, 1024, 8192, 32768]:
rec = storage_recommendation(n, 16)
print(f'{n:>6} neurons: {rec.strategy:<10} {rec.reason}')
"
8.6 Izhikevich Multi-State Array¶
For neurons with multiple state variables (e.g. Izhikevich with
v and u), the BRAM width doubles:
from sc_neurocore.compiler.intelligence import generate_bram_array
# 2 state variables × 16 bits = 32 bits per neuron
v = generate_bram_array(
neuron_count=512,
data_width=16,
state_vars=2, # v and u
)
print(f"Array size: {len(v)} bytes")
8.7 Sparse Weight Matrix¶
For networks with sparse connectivity (e.g. cortical microcircuits), most weights are zero. The generated ROM is still dense, so zero entries are stored alongside the non-zero weights:
from sc_neurocore.compiler.intelligence.core import generate_weight_rom
# Sparse 100×100 matrix with ~10% connectivity
import random
W = [[0] * 100 for _ in range(100)]
for _ in range(1000): # 10% fill
i, j = random.randint(0, 99), random.randint(0, 99)
W[i][j] = random.randint(-50, 50)
rom = generate_weight_rom(W, "sc_sparse_rom", data_width=16)
print(f"ROM: {len(rom)} bytes (includes zero entries)")
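For genuinely sparse storage, a compressed format that keeps only the non-zero weights is needed. The sketch below shows a CSR-style packing; this is not a library API, only an illustration of how the dense matrix above could be compacted (a sparse design would pair it with an indexed fetch in the datapath):

```python
def to_csr(w):
    """Pack a dense matrix into CSR-style (row_ptr, col_idx, values) lists."""
    row_ptr, col_idx, values = [0], [], []
    for row in w:
        for j, x in enumerate(row):
            if x != 0:
                col_idx.append(j)   # column of each non-zero weight
                values.append(x)
        row_ptr.append(len(values)) # row i spans values[row_ptr[i]:row_ptr[i+1]]
    return row_ptr, col_idx, values

W = [[0, 5, 0], [0, 0, 0], [-3, 0, 7]]
row_ptr, col_idx, values = to_csr(W)
print(row_ptr)   # [0, 1, 1, 3]
print(col_idx)   # [1, 0, 2]
print(values)    # [5, -3, 7]
```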
8.8 Multi-Layer Network Pattern¶
For feed-forward networks with multiple layers:
from sc_neurocore.compiler.intelligence.core import generate_weight_rom
layers = [
{"src": 64, "dst": 128, "name": "layer_0"},
{"src": 128, "dst": 64, "name": "layer_1"},
{"src": 64, "dst": 10, "name": "layer_2"},
]
for layer in layers:
import random
W = [[random.randint(-30, 30)
for _ in range(layer["dst"])]
for _ in range(layer["src"])]
rom = generate_weight_rom(W, f"sc_{layer['name']}_rom", data_width=16)
with open(f"sc_{layer['name']}_rom.v", "w") as f:
f.write(rom)
print(f"{layer['name']}: {layer['src']}×{layer['dst']} = "
f"{layer['src']*layer['dst']} weights")
8.9 Troubleshooting¶
| Symptom | Cause | Fix |
|---|---|---|
| ram_style attribute ignored | Wrong vendor attribute | Use (* ram_style *) for Xilinx |
| State corruption | BRAM read-during-write hazard | Check NO_CHANGE attribute |
| Tick never completes | neuron_idx overflow | Verify index width matches neuron count |
| Zero spikes | Threshold too high | Check Q-format threshold encoding |
| Weight sign error | Unsigned vs signed ROM data | Declare the data port signed in Verilog |
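The "index width matches count" check from the table can be automated: the neuron index register needs ceil(log2(N)) bits. A small sketch (hypothetical helper name):

```python
def index_width(neuron_count: int) -> int:
    """Bits needed to index neurons 0..N-1 (e.g. 10 bits for 1024)."""
    return max(1, (neuron_count - 1).bit_length())

print(index_width(256))    # 8  -> matches the [7:0] index in the 8.2 test
print(index_width(1024))   # 10 -> matches spike_neuron_id [9:0] in 6.1
```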
8.10 E2E Pipeline Test¶
python -m pytest tests/e2e/test_e2e_pipeline.py -v -k "network"
References¶
- BRAM inference in Xilinx FPGAs: AMD/Xilinx. "Vivado Design Suite User Guide: Synthesis." UG901, 2024.
- UltraRAM user guide: AMD/Xilinx. "UltraScale Architecture Memory Resources." UG573, 2024.
- Time-multiplexed neural networks on FPGA: Pani, D. et al. "An FPGA platform for real-time simulation of spiking neuronal networks." Front. Neurosci., 11:90, 2017.
- SpiNNaker time-multiplexed neurons: Furber, S.B. et al. "The SpiNNaker Project." Proceedings of the IEEE, 102(5):652–665, 2014.
Further Reading¶
- NIR FPGA Compilation Guide — Full NIR→FPGA pipeline
- DVS Pipeline Guide — Event camera integration
- Hardware Profiles Guide — BRAM/URAM per target
- Deployment Guide — Constraints, resource estimation
- MXFP Encoding Guide — Weight compression