Fixed-Point Arithmetic for Hardware Deployment¶
Design and verify fixed-point neural networks that map directly to FPGA logic. This tutorial covers the Q8.8 number format used by SC-NeuroCore's hardware path, quantisation-aware training, and systematic verification against the floating-point reference.
Prerequisites: pip install sc-neurocore numpy
1. Why fixed-point?¶
FPGAs either lack dedicated floating-point units or implement them at high resource cost. Fixed-point arithmetic uses integer ALUs with an implicit binary point:
Q8.8 format (16-bit signed):
bit 15: sign
bits 14-8: integer part (together with the sign bit: -128 to 127)
bits 7-0: fractional part (resolution = 1/256 ≈ 0.0039)
Example: 0x0180 = 1.5 (1 × 256 + 128 = 384 → 384/256 = 1.5)
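The worked example above can be reproduced in a few lines of plain Python (a standalone sketch, no library dependencies; `q8_decode` is a helper defined here, not part of SC-NeuroCore):

```python
def q8_decode(word):
    """Interpret a 16-bit word as a signed Q8.8 value."""
    # Two's-complement: reinterpret values with bit 15 set as negative
    signed = word - 0x10000 if word & 0x8000 else word
    # Divide by 2**8 to place the implicit binary point
    return signed / 256.0

print(q8_decode(0x0180))  # 384 / 256 = 1.5
print(q8_decode(0xFF80))  # -128 / 256 = -0.5
```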
SC-NeuroCore's FixedPointLIFNeuron and Verilog sc_lif_neuron.v
both use Q8.8 — the Python model is bit-true, meaning every
intermediate value matches the hardware exactly.
2. The FixedPointLIFNeuron¶
from sc_neurocore import FixedPointLIFNeuron
neuron = FixedPointLIFNeuron()
# Step with explicit Q8.8 parameters
# leak_k=240 ≈ 0.9375, gain_k=16 ≈ 0.0625, threshold=256 = 1.0
spike, v = neuron.step(leak_k=240, gain_k=16, I_t=100)
print(f"Spike: {spike}, V: {v} (Q8 raw)")
print(f"V as float: {v / 256:.4f}")
The internal computation:
V_next = clamp_s16(((leak_k * V_mem) >> 8) + ((gain_k * I_t) >> 8))
spike = V_next >= THRESHOLD
V_out = 0 if spike else V_next
Every multiply is 16×16→32, then right-shift by 8 (the fractional
bits). clamp_s16 saturates to [-32768, 32767].
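The update rule above can be sketched in pure Python (names `clamp_s16` and `lif_step_q8` are taken from or invented around the pseudocode, not the library's internals; the threshold of 256 is the Q8.8 encoding of 1.0 from the parameter comment earlier):

```python
def clamp_s16(x):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def lif_step_q8(v_mem, leak_k, gain_k, i_t):
    """One Q8.8 LIF update matching the pseudocode above.
    All arguments are Q8.8 integers; THRESHOLD = 256 encodes 1.0."""
    THRESHOLD = 256
    # 16x16 -> 32-bit products, each shifted back by the 8 fractional bits
    v_next = clamp_s16(((leak_k * v_mem) >> 8) + ((gain_k * i_t) >> 8))
    spike = v_next >= THRESHOLD
    return spike, 0 if spike else v_next

# Starting from a resting membrane, only the gain term contributes:
spike, v = lif_step_q8(v_mem=0, leak_k=240, gain_k=16, i_t=100)
print(spike, v)  # (16 * 100) >> 8 = 6, below threshold
```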
3. Float → Q8.8 conversion¶
import numpy as np
def float_to_q8(x):
"""Convert float to Q8.8 signed integer."""
return int(np.clip(np.round(x * 256), -32768, 32767))
def q8_to_float(q):
"""Convert Q8.8 signed integer to float."""
return q / 256.0
# Verify round-trip
for val in [0.0, 1.0, -1.0, 0.5, 0.9375, -0.0039]:
q = float_to_q8(val)
back = q8_to_float(q)
error = abs(back - val)
print(f" {val:+.4f} → Q8={q:+6d} → {back:+.4f} error={error:.4f}")
Maximum quantisation error: 1/(2×256) ≈ 0.00195.
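That 1/512 bound can be checked empirically over many random values (a sketch that inlines the round-trip from `float_to_q8`/`q8_to_float` above so it is self-contained):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(-100, 100, size=10_000)  # well inside the Q8.8 range

# Round-trip through the Q8.8 grid, as float_to_q8 / q8_to_float do
q = np.clip(np.round(xs * 256), -32768, 32767)
back = q / 256.0

max_err = np.abs(back - xs).max()
print(f"max round-trip error: {max_err:.6f}")  # never exceeds 1/512
```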
4. Quantisation-aware weight training¶
Train in float, quantise weights to Q8.8 after each update:
from sc_neurocore import VectorizedSCLayer
L = 512
layer = VectorizedSCLayer(n_inputs=50, n_neurons=20, length=L)
# Quantise weights to Q8.8 grid
def quantise_weights(weights):
"""Snap weights to Q8.8 representable values."""
q = np.round(weights * 256).astype(np.int16)
return q.astype(np.float64) / 256.0
# Before quantisation
pre_quant = layer.weights.copy()
layer.weights = quantise_weights(layer.weights)
layer._refresh_packed_weights()
# Measure quantisation error
quant_error = np.abs(pre_quant - layer.weights)
print(f"Mean quantisation error: {quant_error.mean():.6f}")
print(f"Max quantisation error: {quant_error.max():.6f}")
print(f"Unique weight values: {len(np.unique(layer.weights))}")
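Put together, a quantisation-aware update loop looks like the sketch below. The gradient step is a placeholder (the tutorial does not prescribe a learning rule); the point is the order of operations: float-precision update first, then snap back to the Q8.8 grid. `quantise_weights` is redefined here so the sketch runs on its own:

```python
import numpy as np

def quantise_weights(weights):
    """Snap weights to Q8.8 representable values (as in the section above)."""
    q = np.round(weights * 256).astype(np.int16)
    return q.astype(np.float64) / 256.0

rng = np.random.default_rng(1)
weights = rng.normal(0, 0.5, size=(50, 20))

for step in range(100):
    grad = rng.normal(0, 0.1, size=weights.shape)  # placeholder gradient
    weights -= 0.01 * grad                # 1. float-precision update
    weights = quantise_weights(weights)   # 2. snap back to the Q8.8 grid

# Every surviving weight sits exactly on a multiple of 1/256
assert np.allclose(weights * 256, np.round(weights * 256))
print("unique Q8.8 weight values:", len(np.unique(weights)))
```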
5. Overflow detection¶
Fixed-point multiply can overflow. Detect and prevent it:
def safe_q8_multiply(a, b):
"""Q8.8 multiply with overflow detection.
a, b: Q8.8 integers (int16 range).
Returns: (a * b) >> 8, clamped to int16.
"""
product = int(a) * int(b) # 32-bit intermediate
shifted = product >> 8
if shifted > 32767:
return 32767 # positive saturation
elif shifted < -32768:
return -32768 # negative saturation
return shifted
# Test boundary cases
cases = [
(32767, 256, "max × 1.0"),
(32767, 32767, "max × max (overflow)"),
(-32768, 256, "min × 1.0"),
(256, 256, "1.0 × 1.0"),
]
for a, b, label in cases:
result = safe_q8_multiply(a, b)
print(f" {label}: {a} × {b} >> 8 = {result} ({q8_to_float(result):.4f})")
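For whole weight matrices, the same saturating multiply vectorises cleanly in NumPy (a sketch, not a library API; the key is widening to int32 before the product so the intermediate cannot overflow):

```python
import numpy as np

def safe_q8_multiply_vec(a, b):
    """Elementwise Q8.8 multiply with int16 saturation.
    a, b: arrays of Q8.8 integers (int16)."""
    product = a.astype(np.int32) * b.astype(np.int32)  # 32-bit intermediate
    shifted = product >> 8                             # drop fractional bits
    return np.clip(shifted, -32768, 32767).astype(np.int16)

# Same boundary cases as the scalar version above
a = np.array([32767, 32767, -32768, 256], dtype=np.int16)
b = np.array([256, 32767, 256, 256], dtype=np.int16)
print(safe_q8_multiply_vec(a, b))  # matches the scalar safe_q8_multiply results
```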
6. Systematic float vs fixed-point comparison¶
Compare a full network simulation in both modes:
np.random.seed(42)
N_STEPS = 200
# Float reference model
from sc_neurocore import StochasticLIFNeuron
float_neuron = StochasticLIFNeuron(length=256)
# Fixed-point model
fp_neuron = FixedPointLIFNeuron()
float_spikes = []
fp_spikes = []
currents = np.random.randint(-30, 80, size=N_STEPS)
for I in currents:
    # Float path (this reference model takes a non-negative drive,
    # so negative currents are clamped to 0)
    f_spike, f_state = float_neuron.step(x_value=max(0, int(I)))
float_spikes.append(f_spike)
# Fixed-point path (same parameters in Q8)
q_spike, q_v = fp_neuron.step(leak_k=240, gain_k=16, I_t=int(I))
fp_spikes.append(q_spike)
float_count = sum(float_spikes)
fp_count = sum(fp_spikes)
print(f"Float spikes: {float_count}")
print(f"Q8.8 spikes: {fp_count}")
print(f"Spike count difference: {abs(float_count - fp_count)}")
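Turning the spike-count gap into a pass/fail check makes the comparison repeatable (a sketch; the 10% tolerance is an arbitrary choice for illustration, not a value from the library):

```python
def rates_agree(count_a, count_b, n_steps, tol=0.10):
    """True if two spike counts differ by less than tol of the run length."""
    return abs(count_a - count_b) / n_steps < tol

print(rates_agree(57, 52, 200))   # gap of 5/200 = 2.5%  -> True
print(rates_agree(57, 110, 200))  # gap of 53/200 = 26.5% -> False
```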
7. Weight export for FPGA¶
Export trained weights as Verilog $readmemh format:
def export_weights_hex(weights, filename):
"""Export Q8.8 weight matrix as hex file for $readmemh."""
q_weights = np.round(weights * 256).astype(np.int16)
with open(filename, "w") as f:
f.write(f"// Q8.8 weights: {q_weights.shape[0]} × {q_weights.shape[1]}\n")
for row in q_weights:
for val in row:
# Convert signed int16 to unsigned hex representation
unsigned = val & 0xFFFF
f.write(f"{unsigned:04X}\n")
# Export a layer's weights
export_weights_hex(layer.weights, "layer_weights.hex")
# Verify the file
with open("layer_weights.hex") as f:
lines = [l.strip() for l in f if not l.startswith("//")]
print(f"Exported {len(lines)} weight values")
print(f"First 5: {lines[:5]}")
In Verilog, load with:
reg signed [15:0] weights [0:N*M-1];
initial $readmemh("layer_weights.hex", weights);
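It is worth closing the loop in Python as well: parse hex lines back and confirm signed values survive the unsigned encoding used by `export_weights_hex` (a standalone sketch; `hex_to_q8` is a helper defined here):

```python
def hex_to_q8(line):
    """Parse one $readmemh hex line back to a signed Q8.8 integer."""
    word = int(line, 16) & 0xFFFF
    # Undo the unsigned encoding: bit 15 set means a negative value
    return word - 0x10000 if word & 0x8000 else word

# Round-trip a few signed values through the unsigned hex encoding
for val in [384, -128, 32767, -32768]:
    encoded = f"{val & 0xFFFF:04X}"
    assert hex_to_q8(encoded) == val
    print(f"{val:+6d} -> {encoded} -> {hex_to_q8(encoded):+6d}")
```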
8. Bit-exact verification protocol¶
The gold standard before FPGA deployment:
def verify_bit_exact(n_steps=1000, seed=42):
"""Verify Python fixed-point model matches expected hardware behaviour."""
np.random.seed(seed)
fp = FixedPointLIFNeuron()
voltages = []
spikes = []
for _ in range(n_steps):
I = np.random.randint(-50, 100)
s, v = fp.step(leak_k=240, gain_k=16, I_t=I)
spikes.append(s)
voltages.append(v)
# Invariant: voltage is always valid int16
assert -32768 <= v <= 32767, f"Voltage overflow: {v}"
# Invariant: after spike, voltage resets to 0
if s:
assert v == 0, f"Post-spike voltage not zero: {v}"
print(f"Bit-exact verification: {n_steps} steps, {sum(spikes)} spikes")
print(f"Voltage range: [{min(voltages)}, {max(voltages)}]")
print(f"All invariants hold ✓")
verify_bit_exact()
What you learned¶
- Q8.8 format: 16-bit signed, 8 integer bits, 8 fractional bits
- FixedPointLIFNeuron is bit-true to the Verilog RTL
- Quantisation-aware training: snap weights to Q8.8 grid after each update
- Overflow detection: multiply in 32-bit, shift, saturate to int16
- Weight export via $readmemh hex files for FPGA loading
- Bit-exact verification: voltage bounds + post-spike reset invariants
Next steps¶
- Run co-simulation (Tutorial 09) to verify against actual Verilog
- Implement Q4.12 format for higher precision at smaller integer range
- Use FixedPointBitstreamEncoder for stochastic encoding in hardware
- Synthesise the full network with Yosys and measure LUT/FF usage