Fixed-Point Arithmetic for Hardware Deployment

Design and verify fixed-point neural networks that map directly to FPGA logic. This tutorial covers the Q8.8 number format used by SC-NeuroCore's hardware path, quantisation-aware training, and systematic verification against the floating-point reference.

Prerequisites: pip install sc-neurocore numpy

1. Why fixed-point?

Most FPGAs have no native floating-point units, and soft floating-point is expensive in logic. Fixed-point arithmetic instead uses ordinary integer ALUs with an implicit binary point:

Q8.8 format (16-bit signed):
  bit 15: sign
  bits 14-8: integer part (with the sign bit, -128 to 127)
  bits 7-0: fractional part (resolution = 1/256 ≈ 0.0039)

  Example: 0x0180 = 1.5 (1 × 256 + 128 = 384 → 384/256 = 1.5)
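The bit layout above can be checked directly in Python with a small decode helper (not part of the library, just an illustration of the format):

```python
def q8_decode(raw):
    """Interpret a raw 16-bit word as a signed Q8.8 value."""
    if raw & 0x8000:          # sign bit set: undo two's complement
        raw -= 0x10000
    return raw / 256.0

print(q8_decode(0x0180))  # 1.5
print(q8_decode(0xFF00))  # -1.0
print(q8_decode(0x0001))  # 0.00390625, one LSB (the resolution)
```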

SC-NeuroCore's FixedPointLIFNeuron and Verilog sc_lif_neuron.v both use Q8.8 — the Python model is bit-true, meaning every intermediate value matches the hardware exactly.

2. The FixedPointLIFNeuron

from sc_neurocore import FixedPointLIFNeuron

neuron = FixedPointLIFNeuron()

# Step with explicit Q8.8 parameters
# leak_k=240 (0.9375), gain_k=16 (0.0625), threshold=256 (1.0)
spike, v = neuron.step(leak_k=240, gain_k=16, I_t=100)
print(f"Spike: {spike}, V: {v} (Q8 raw)")
print(f"V as float: {v / 256:.4f}")

The internal computation:

V_next = clamp_s16(((leak_k * V_mem) >> 8) + ((gain_k * I_t) >> 8))
spike  = V_next >= THRESHOLD
V_out  = 0 if spike else V_next

Every multiply is 16×16→32, then right-shift by 8 (the fractional bits). clamp_s16 saturates to [-32768, 32767].
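The update can be sketched in plain Python (an illustration of the arithmetic described above, not the library's actual source):

```python
def clamp_s16(x):
    """Saturate to the signed 16-bit range [-32768, 32767]."""
    return max(-32768, min(32767, x))

def q8_lif_step(v_mem, leak_k, gain_k, i_t, threshold=256):
    """One Q8.8 LIF step: leak, integrate, compare, reset."""
    # 16x16 -> 32-bit products, then drop the 8 fractional bits
    v_next = clamp_s16(((leak_k * v_mem) >> 8) + ((gain_k * i_t) >> 8))
    spike = v_next >= threshold
    return spike, 0 if spike else v_next

# Drive with a constant current; with these parameters the voltage
# settles below threshold, so no spike fires
v = 0
for _ in range(5):
    spike, v = q8_lif_step(v, leak_k=240, gain_k=16, i_t=100)
print(spike, v)  # False 25
```

Note that Python's `>>` on negative integers is an arithmetic (sign-preserving) shift, which matches the hardware behaviour.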

3. Float → Q8.8 conversion

import numpy as np

def float_to_q8(x):
    """Convert float to Q8.8 signed integer."""
    return int(np.clip(np.round(x * 256), -32768, 32767))

def q8_to_float(q):
    """Convert Q8.8 signed integer to float."""
    return q / 256.0

# Verify round-trip
for val in [0.0, 1.0, -1.0, 0.5, 0.9375, -0.0039]:
    q = float_to_q8(val)
    back = q8_to_float(q)
    error = abs(back - val)
    print(f"  {val:+.4f} → Q8={q:+6d} → {back:+.4f}  error={error:.4f}")

Maximum quantisation error: 1/(2×256) ≈ 0.00195.
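That bound is hit exactly halfway between two grid points, which a quick check confirms (`float_to_q8` and `q8_to_float` repeated from above so the snippet is self-contained):

```python
import numpy as np

def float_to_q8(x):
    return int(np.clip(np.round(x * 256), -32768, 32767))

def q8_to_float(q):
    return q / 256.0

worst = 1.0 / 512.0          # halfway between 0 and 1/256
err = abs(q8_to_float(float_to_q8(worst)) - worst)
print(f"{err:.5f}")  # 0.00195
```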

4. Quantisation-aware weight training

Train in float, quantise weights to Q8.8 after each update:

from sc_neurocore import VectorizedSCLayer

L = 512
layer = VectorizedSCLayer(n_inputs=50, n_neurons=20, length=L)

# Quantise weights to Q8.8 grid
def quantise_weights(weights):
    """Snap weights to Q8.8 representable values."""
    q = np.round(weights * 256).astype(np.int16)
    return q.astype(np.float64) / 256.0

# Before quantisation
pre_quant = layer.weights.copy()
layer.weights = quantise_weights(layer.weights)
layer._refresh_packed_weights()

# Measure quantisation error
quant_error = np.abs(pre_quant - layer.weights)
print(f"Mean quantisation error: {quant_error.mean():.6f}")
print(f"Max quantisation error:  {quant_error.max():.6f}")
print(f"Unique weight values:    {len(np.unique(layer.weights))}")
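A hypothetical training loop shows where the snap fits; the gradient here is a random placeholder, so substitute the library's real learning rule:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=(50, 20))

def quantise_weights(w):
    """Snap weights to the Q8.8 grid (multiples of 1/256)."""
    return np.round(w * 256).astype(np.int16).astype(np.float64) / 256.0

for epoch in range(3):
    grad = rng.normal(scale=0.01, size=weights.shape)  # placeholder gradient
    weights = weights - 0.1 * grad       # float-precision update
    weights = quantise_weights(weights)  # snap after each update

# Every weight now lies exactly on the 1/256 grid
print(np.allclose(weights * 256, np.round(weights * 256)))  # True
```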

5. Overflow detection

Fixed-point multiply can overflow. Detect and prevent it:

def safe_q8_multiply(a, b):
    """Q8.8 multiply with overflow detection.

    a, b: Q8.8 integers (int16 range).
    Returns: (a * b) >> 8, clamped to int16.
    """
    product = int(a) * int(b)  # 32-bit intermediate
    shifted = product >> 8
    if shifted > 32767:
        return 32767  # positive saturation
    elif shifted < -32768:
        return -32768  # negative saturation
    return shifted

# Test boundary cases
cases = [
    (32767, 256, "max × 1.0"),
    (32767, 32767, "max × max (overflow)"),
    (-32768, 256, "min × 1.0"),
    (256, 256, "1.0 × 1.0"),
]
for a, b, label in cases:
    result = safe_q8_multiply(a, b)
    print(f"  {label}: {a} × {b} >> 8 = {result} ({q8_to_float(result):.4f})")
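For whole matrices, the same saturating multiply vectorises naturally with NumPy (a sketch; the scalar version above remains the reference):

```python
import numpy as np

def safe_q8_multiply_vec(a, b):
    """Elementwise Q8.8 multiply: widen, shift, saturate to int16."""
    product = a.astype(np.int64) * b.astype(np.int64)  # wide intermediate
    shifted = product >> 8          # arithmetic shift, sign-preserving
    return np.clip(shifted, -32768, 32767).astype(np.int16)

a = np.array([32767, 32767, -32768, 256], dtype=np.int16)
b = np.array([256, 32767, 256, 256], dtype=np.int16)
print(safe_q8_multiply_vec(a, b))  # only the max × max case saturates
```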

6. Systematic float vs fixed-point comparison

Compare the float and fixed-point neuron models over a shared input sequence:

np.random.seed(42)
N_STEPS = 200

# Float reference model
from sc_neurocore import StochasticLIFNeuron
float_neuron = StochasticLIFNeuron(length=256)

# Fixed-point model
fp_neuron = FixedPointLIFNeuron()

float_spikes = []
fp_spikes = []
voltage_errors = []

currents = np.random.randint(-30, 80, size=N_STEPS)

for I in currents:
    # Float path
    f_spike, f_state = float_neuron.step(x_value=max(0, int(I)))
    float_spikes.append(f_spike)

    # Fixed-point path (same parameters in Q8)
    q_spike, q_v = fp_neuron.step(leak_k=240, gain_k=16, I_t=int(I))
    fp_spikes.append(q_spike)

float_count = sum(float_spikes)
fp_count = sum(fp_spikes)
print(f"Float spikes:  {float_count}")
print(f"Q8.8 spikes:   {fp_count}")
print(f"Spike count difference: {abs(float_count - fp_count)}")
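Spike counts alone can hide timing mismatches; a per-step agreement score is stricter. A sketch (the trains here are placeholders, so substitute float_spikes and fp_spikes from above):

```python
import numpy as np

def spike_agreement(train_a, train_b):
    """Fraction of time steps on which two boolean spike trains agree."""
    a = np.asarray(train_a, dtype=bool)
    b = np.asarray(train_b, dtype=bool)
    return float(np.mean(a == b))

a = [True, False, True, True, False]   # placeholder trains
b = [True, False, False, True, False]
print(spike_agreement(a, b))  # 0.8
```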

7. Weight export for FPGA

Export trained weights as Verilog $readmemh format:

def export_weights_hex(weights, filename):
    """Export Q8.8 weight matrix as hex file for $readmemh."""
    q_weights = np.round(weights * 256).astype(np.int16)
    with open(filename, "w") as f:
        f.write(f"// Q8.8 weights: {q_weights.shape[0]} × {q_weights.shape[1]}\n")
        for row in q_weights:
            for val in row:
                # Convert signed int16 to unsigned hex representation
                unsigned = val & 0xFFFF
                f.write(f"{unsigned:04X}\n")

# Export a layer's weights
export_weights_hex(layer.weights, "layer_weights.hex")

# Verify the file
with open("layer_weights.hex") as f:
    lines = [l.strip() for l in f if not l.startswith("//")]
print(f"Exported {len(lines)} weight values")
print(f"First 5: {lines[:5]}")

In Verilog, load with:

reg signed [15:0] weights [0:N*M-1];
initial $readmemh("layer_weights.hex", weights);
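Before synthesis it is worth round-tripping the file in Python to confirm the two's-complement encoding survives. A self-contained check; `export_weights_hex` is repeated from above, and `import_weights_hex` is a hypothetical helper, not a library function:

```python
import numpy as np

def export_weights_hex(weights, filename):
    q_weights = np.round(weights * 256).astype(np.int16)
    with open(filename, "w") as f:
        f.write(f"// Q8.8 weights: {q_weights.shape[0]} x {q_weights.shape[1]}\n")
        for val in q_weights.flatten():
            f.write(f"{int(val) & 0xFFFF:04X}\n")

def import_weights_hex(filename, shape):
    """Read a $readmemh-style hex file back into float weights."""
    vals = []
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("//"):
                continue
            raw = int(line, 16)
            if raw & 0x8000:          # undo two's complement
                raw -= 0x10000
            vals.append(raw / 256.0)
    return np.array(vals).reshape(shape)

# All four values are exactly representable in Q8.8
w = np.array([[0.5, -0.25], [1.0, -1.0]])
export_weights_hex(w, "roundtrip_check.hex")
print(np.allclose(import_weights_hex("roundtrip_check.hex", w.shape), w))  # True
```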

8. Bit-exact verification protocol

The gold standard before FPGA deployment:

def verify_bit_exact(n_steps=1000, seed=42):
    """Verify Python fixed-point model matches expected hardware behaviour."""
    np.random.seed(seed)
    fp = FixedPointLIFNeuron()

    voltages = []
    spikes = []
    for _ in range(n_steps):
        I = np.random.randint(-50, 100)
        s, v = fp.step(leak_k=240, gain_k=16, I_t=I)
        spikes.append(s)
        voltages.append(v)

        # Invariant: voltage is always valid int16
        assert -32768 <= v <= 32767, f"Voltage overflow: {v}"
        # Invariant: after spike, voltage resets to 0
        if s:
            assert v == 0, f"Post-spike voltage not zero: {v}"

    print(f"Bit-exact verification: {n_steps} steps, {sum(spikes)} spikes")
    print(f"Voltage range: [{min(voltages)}, {max(voltages)}]")
    print(f"All invariants hold ✓")

verify_bit_exact()

What you learned

  • Q8.8 format: 16-bit signed, 8 integer bits, 8 fractional bits
  • FixedPointLIFNeuron is bit-true to the Verilog RTL
  • Quantisation-aware training: snap weights to Q8.8 grid after each update
  • Overflow detection: multiply in 32-bit, shift, saturate to int16
  • Weight export via $readmemh hex files for FPGA loading
  • Bit-exact verification: voltage bounds + post-spike reset invariants

Next steps

  • Run co-simulation (Tutorial 09) to verify against actual Verilog
  • Implement a Q4.12 format for higher fractional precision at the cost of a smaller integer range
  • Use FixedPointBitstreamEncoder for stochastic encoding in hardware
  • Synthesise the full network with Yosys and measure LUT/FF usage