Fixed-Point Precision Modes

SC-NeuroCore supports 11 named fixed-point precision modes for Verilog RTL code generation, spanning 8-bit through 36-bit, plus arbitrary custom formats via the API. Each mode trades off three things: value range (the largest magnitudes representable), fractional resolution (the finest distinction between values), and hardware resource cost (DSP/gate utilisation).

Quick Reference — All 11 Modes

| # | Mode | CLI Key | Bits | Value Range | Resolution | Best For |
|---|------|---------|------|-------------|------------|----------|
| 1 | Q1.7 | q17 | 8 | [-1, +0.99] | 1/128 | Ultra-compact (Loihi/TrueNorth-class) |
| 2 | Q8.8 | q88 | 16 | [-128, +127.996] | 1/256 | mV-scale models (default) |
| 3 | Q4.12 | q412 | 16 | [-8, +7.9998] | 1/4096 | Normalised dynamics (FHN, Theta) |
| 4 | Q1.15 | q115 | 16 | [-1, +0.99997] | 1/32768 | ARM CMSIS-DSP standard |
| 5 | Q9.9 | q99 | 18 | [-256, +255.998] | 1/512 | DSP48-native (Xilinx/Intel/Lattice) |
| 6 | Q12.12 | q1212 | 24 | [-2048, +2047.999] | 1/4096 | Loihi-2 native / audio-grade |
| 7 | Q14.13 | q1413 | 27 | [-8192, +8191.999] | 1/8192 | Intel Stratix 27×27 DSP |
| 8 | Q20.12 | q2012 | 32 | [-524288, +524287.9998] | 1/4096 | Network-level accumulation |
| 9 | Q16.16 | q1616 | 32 | [-32768, +32767.99998] | 1/65536 | Gold standard |
| 10 | Q8.24 | q824 | 32 | [-128, +127.9999999] | 1/16777216 | Ultra-precision (EP training) |
| 11 | Q18.18 | q1818 | 36 | [-131072, +131071.999996] | 1/262144 | UltraScale DSP48E2-native |

Mathematical Foundation

A Qm.n fixed-point number is a two's-complement word of m + n bits in total (Q9.9 is 18 bits wide), made up of:

  • 1 sign bit, counted as part of m
  • m − 1 integer magnitude bits (determining range)
  • n fractional bits (determining resolution)

The value of a raw integer r in Qm.n format is:

Text Only
value = r / 2^n

Encoding a float to Q-format:

Text Only
raw = round(value × 2^n)

Range of representable values:

Text Only
min = -2^(m+n-1) / 2^n = -2^(m-1)
max = (2^(m+n-1) - 1) / 2^n = 2^(m-1) - 2^(-n)
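
As a worked illustration of the formulas above, the following minimal sketch encodes and decodes values in pure Python. The helper names (`q_limits`, `encode`, `decode`) are hypothetical and not part of the SC-NeuroCore API. It assumes the convention used in the Quick Reference table, where m + n is the total word width, and it clamps out-of-range inputs purely for demonstration; the behaviour of generated RTL depends on the selected overflow mode.

```python
def q_limits(m, n):
    """Raw integer limits of a signed Qm.n word that is m + n bits wide."""
    total_bits = m + n
    return -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1


def encode(value, m, n):
    """raw = round(value * 2^n), clamped to the representable range."""
    lo, hi = q_limits(m, n)
    raw = round(value * (1 << n))  # Python rounds exact ties to even
    return max(lo, min(hi, raw))


def decode(raw, n):
    """value = raw / 2^n"""
    return raw / (1 << n)


# Q8.8: a resting potential of -65.0 fits comfortably.
raw_q88 = encode(-65.0, 8, 8)
print(raw_q88, decode(raw_q88, 8))      # -16640 -65.0

# Q1.7: the same value clamps to the format minimum of -1.0.
raw_q17 = encode(-65.0, 1, 7)
print(raw_q17, decode(raw_q17, 7))      # -128 -1.0

# Range check: min = -2^(m-1), max = 2^(m-1) - 2^(-n).
lo, hi = q_limits(8, 8)
print(decode(lo, 8), decode(hi, 8))     # -128.0 127.99609375
```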

Tier-by-Tier Guide

8-Bit Tier: Q1.7

The most compact format — 4× neuron density compared to Q8.8. Suitable for models with all parameters normalised to [-1, +1].

Python
verilog = neuron.to_verilog(module_name="sc_lif", data_width=8, fraction=7)

Targets: IBM TrueNorth, BrainChip Akida, QuickLogic EOS S3.

Limitation: mV-scale models (e.g. v_rest=-65) fall outside the [-1, +1) range and will overflow.

16-Bit Tier: Q8.8, Q4.12, Q1.15

| Mode | Use Case | Key Feature |
|------|----------|-------------|
| Q8.8 | mV-scale neuron models (LIF, HH) | Default; ±128 range covers physiological voltages |
| Q4.12 | Normalised dynamics (FHN, Theta, GLIF) | 16× finer precision than Q8.8 |
| Q1.15 | ARM CMSIS-DSP interop, SpiNNaker 2 | Industry standard fractional format |

Bash
python -m sc_neurocore.neurons compile lif -p q88 -o lif.v
python -m sc_neurocore.neurons compile lif -p q412 -o lif_hp.v
python -m sc_neurocore.neurons compile lif -p q115 -o lif_arm.v

18-Bit Tier: Q9.9 — The Universal DSP Format

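Q9.9 matches the native 18-bit operand width of DSP hard multipliers across five FPGA vendors:

| Vendor | DSP Block | Multiplier | Q9.9 Fits? |
|--------|-----------|------------|------------|
| Xilinx | DSP48E1 / DSP48A1 | 25×18 / 18×18 | ✅ 100% |
| Intel | Variable | 18×18 | ✅ 100% |
| Lattice | MULT18X18D | 18×18 | ✅ 100% |
| Gowin | MULT18X18 | 18×18 | ✅ 100% |
| Microchip | MACC | 18×18 | ✅ 100% |
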
Bash
python -m sc_neurocore.neurons compile lif -p q99 -o lif_dsp.v

24-Bit Tier: Q12.12

Matches Intel Loihi 2's native 24-bit membrane potential format and Xilinx Versal's DSP58 B-port width (24 bits). Also matches Achronix Speedster7t's 24×24 MLP blocks.

Bash
python -m sc_neurocore.neurons compile lif -p q1212 -o lif_loihi.v

27-Bit Tier: Q14.13

Exploits Intel's 27×27 variable-precision DSP blocks found in Arria 10, Stratix 10, and Agilex FPGAs. Provides ±8192 range with 1/8192 resolution.

Bash
python -m sc_neurocore.neurons compile lif -p q1413 -o lif_stratix.v

32-Bit Tier: Q20.12, Q16.16, Q8.24

| Mode | Use Case | Key Feature |
|------|----------|-------------|
| Q20.12 | Network-level accumulation | ±524K range with Q4.12 precision |
| Q16.16 | Gold standard | Wide range (±32K) + high precision |
| Q8.24 | Equilibrium propagation training | Ultra-fine gradients (dt=1µs) |

Bash
python -m sc_neurocore.neurons compile lif -p q2012 -o lif_net.v
python -m sc_neurocore.neurons compile lif -p q1616 -o lif_hd.v
python -m sc_neurocore.neurons compile lif -p q824  -o lif_ep.v

36-Bit Tier: Q18.18

Sized for Xilinx UltraScale DSP48E2 blocks (27×18 multiplier with a 45-bit product, of which 36 bits form the Q18.18 result). Provides ±131K range with a resolution of 1/262144 (≈3.8×10⁻⁶).

Bash
python -m sc_neurocore.neurons compile lif -p q1818 -o lif_us.v

Custom Formats via API

The compiler accepts any (data_width, fraction) pair — the 11 named modes are CLI shortcuts, not limitations:

Python
# Arbitrary format: Q6.10 (16-bit, 10 fractional)
verilog = neuron.to_verilog(
    module_name="sc_lif_custom",
    data_width=16, fraction=10,
)

# Ultra-wide: Q32.32 (64-bit)
verilog = neuron.to_verilog(
    module_name="sc_lif_64",
    data_width=64, fraction=32,
)

Block-Floating Pilot via quantizer API

Quantizer and adaptive-precision surfaces also parse block-floating formats such as BFP16E3X32:

Python
import numpy as np

from sc_neurocore.compiler.quantizer import (
    quantize_block_floating,
    dequantize_block_floating,
)

weights = np.array([[0.1, 0.2], [0.3, 0.4]])
q, exponents = quantize_block_floating(weights, fmt="BFP16E3X32")
restored = dequantize_block_floating(q, exponents, fmt="BFP16E3X32")

In this codepath, adaptive precision emits manifest metadata (mantissa_bits, exponent_bits, block_size) alongside fixed-point datapath emission. The biased exponent range uses every representable exponent code; for BFP16E3X32, exponent bias is 3, exponent codes are [0, 7], and the unbiased range is [-3, +4]. The compiler manifest also records the maximum signed mantissa magnitude 32767, minimum quantum 0.125, maximum absolute value 524272.0, and the contiguous flattened block-alignment rule that downstream emitters must preserve.
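
The manifest figures quoted above follow from simple arithmetic, which the sketch below reproduces. It assumes the format name BFP16E3X32 decodes as a 16-bit signed mantissa, a 3-bit exponent, and a block size of 32; that parsing is inferred from the stated metadata fields rather than taken from the quantizer source. The exponent bias of 3 is copied from the text, not derived.

```python
# Assumed decoding of the format name "BFP16E3X32" (not confirmed by the library).
MANTISSA_BITS = 16   # "BFP16": signed mantissa width
EXPONENT_BITS = 3    # "E3": shared exponent width
BLOCK_SIZE = 32      # "X32": values sharing one exponent
EXPONENT_BIAS = 3    # stated in the documentation above

# Every exponent code is usable, so codes span 0 .. 2^3 - 1.
exponent_codes = range(0, 1 << EXPONENT_BITS)
unbiased_min = exponent_codes[0] - EXPONENT_BIAS
unbiased_max = exponent_codes[-1] - EXPONENT_BIAS

# Largest signed mantissa magnitude for a two's-complement mantissa.
max_mantissa = (1 << (MANTISSA_BITS - 1)) - 1

# Smallest step (mantissa 1 at the lowest exponent) and largest magnitude.
min_quantum = 2.0 ** unbiased_min
max_abs_value = max_mantissa * 2.0 ** unbiased_max

print(list(exponent_codes))            # [0, 1, 2, 3, 4, 5, 6, 7]
print(unbiased_min, unbiased_max)      # -3 4
print(max_mantissa)                    # 32767
print(min_quantum, max_abs_value)      # 0.125 524272.0
```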

Block-Floating Dense Deployment Path

Dense layers can be compiled into block-floating weights with fixed-point Q16.16 inputs and saturated Q16.16 outputs:

Python
from sc_neurocore.compiler.quantizer import compile_dense_block_floating

compiled = compile_dense_block_floating(weights, fmt="BFP16E3X32")
outputs_q1616, overflow = compiled.forward_with_overflow(inputs)

This path is wired across the same deployment surfaces as the mixed fixed-point path:

  • Python: CompiledBlockFloatingDense stores mantissas, shared exponents, reconstructed deployment weights, Q16.16 output saturation, and manifests.
  • Rust: sc_neurocore_engine::ir::qformat::block_floating_dense_q16 mirrors the shared-exponent integer MAC, shape validation, mantissa/exponent bounds, and saturation behaviour.
  • HDL: hdl/sc_block_floating_dense.v provides a synchronous RTL reference with explicit dynamic exponent shifts, per-output overflow telemetry, per-output conservative absolute-bound telemetry (abs_bounds_q1616), aggregate overflow, and saturated Q16.16 outputs.

Benchmark and synthesis evidence from 2026-06-04 is committed under benchmarks/results/local_python_2026-06-04_block_floating_dense.json, benchmarks/results/local_rust_2026-06-04_block_floating_dense.json, and hdl/reports/yosys_block_floating_dense_2026-06-04.json.

The block-floating HDL overflow_vector uses the same lane convention as the mixed fixed-point dense path: bit i identifies output channel i, and the aggregate overflow line is asserted when any channel saturates. abs_bounds_q1616[i] is the unsigned conservative absolute Q16.16 bound for the same output channel and is intentionally nonzero for cancellation cases where the realised saturated output is zero.

Mixed Q8.8 / Q16.16 Weight-Accumulator Contract

The quantiser also exposes the mixed fixed-point contract used by hardware compiler paths that keep stored weights compact while widening the accumulation datapath:

Python
from sc_neurocore.compiler.quantizer import (
    QFormatMixed,
    dequantize_weights,
    quantize_weights,
)

fmt = QFormatMixed()  # Q8.8 weights, Q16.16 accumulator, per-tensor scale
q_weights, tensor_scale = quantize_weights(weights, fmt=fmt)
restored = dequantize_weights(q_weights, fmt=fmt, scale=tensor_scale)

For QFormatMixed, quantize_weights returns both the stored integer tensor and the scale multiplier required to reconstruct the original values. The default path maximises the Q8.8 integer dynamic range per tensor and carries the deterministic scale metadata needed by the wider Q16.16 accumulator path. Set scale_per_tensor=False only when the canonical Q8.8 scale must be preserved exactly for legacy parity.
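
To illustrate why a per-tensor scale helps, here is a minimal pure-Python sketch of the general idea. It is not the library's implementation: the function names are hypothetical, and it assumes the scale is chosen so that the largest-magnitude weight maps to the largest 16-bit signed code, with reconstruction as `code * scale`.

```python
Q88_MIN_CODE = -(1 << 15)
Q88_MAX_CODE = (1 << 15) - 1
CANONICAL_Q88_SCALE = 1.0 / 256  # value of one LSB in plain Q8.8


def quantize_per_tensor(weights):
    """Map the largest |weight| onto the largest 16-bit code (assumed policy)."""
    peak = max(abs(w) for w in weights)
    scale = peak / Q88_MAX_CODE if peak > 0 else CANONICAL_Q88_SCALE
    codes = [max(Q88_MIN_CODE, min(Q88_MAX_CODE, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from integer codes and the tensor scale."""
    return [c * scale for c in codes]


weights = [0.1, 0.2, 0.3, 0.4]
codes, tensor_scale = quantize_per_tensor(weights)
restored = dequantize(codes, tensor_scale)

worst_error = max(abs(r - w) for r, w in zip(restored, weights))
print(codes[-1])                               # 32767: the peak uses the full code range
print(tensor_scale < CANONICAL_Q88_SCALE)      # True: finer step than the fixed 1/256
print(worst_error <= tensor_scale / 2 + 1e-12) # True: error bounded by half a step
```

For small-valued tensors like this one, the fixed 1/256 step would spend most of the 16-bit code range on magnitudes that never occur, which is the motivation for carrying a scale alongside the stored integers.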

Mixed Dense Deployment Path

Dense layers can be compiled into the same mixed contract directly:

Python
from sc_neurocore.compiler.quantizer import QFormatMixed, compile_dense_mixed_precision

compiled = compile_dense_mixed_precision(weights, fmt=QFormatMixed())
outputs_q1616, overflow = compiled.forward_with_overflow(inputs)

This path is wired across three implementation surfaces:

  • Python: CompiledMixedDense stores Q8.8 weights, Q16.16 accumulator metadata, exact signed saturation, and deterministic deployment manifests.
  • Rust: sc_neurocore_engine::ir::qformat::mixed_dense_q88_q1616 mirrors the canonical integer MAC, arithmetic shift, shape validation, and saturation behaviour.
  • HDL: hdl/sc_mixed_precision_dense.v provides a synchronous RTL reference with per-output overflow telemetry, per-output conservative absolute-bound telemetry (abs_bounds_q1616), aggregate overflow, and saturated Q16.16 outputs.

Benchmark and synthesis evidence from 2026-06-04 is committed under benchmarks/results/local_python_2026-06-04_mixed_dense.json, benchmarks/results/local_rust_2026-06-04_mixed_dense.json, and hdl/reports/yosys_mixed_precision_dense_2026-06-04.json.

The HDL overflow_vector is lane-aligned with the Python/Rust overflow masks: bit i is asserted only when output channel i saturates to the signed Q16.16 minimum or maximum code. The aggregate overflow output is the OR of that vector for consumers that only need a single anomaly line. The HDL abs_bounds_q1616 vector uses the same lane order and carries unsigned 64-bit conservative absolute Q16.16 bounds, matching the Python PrecisionEnvelopeReport.abs_bound_codes and Rust abs_bounds_q1616 telemetry.
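
The lane convention can be pictured with a small integer model. This is a sketch under stated assumptions, not a transcription of the Python, Rust, or HDL implementations: it assumes products of Q8.8 weights and Q16.16 inputs are accumulated at full width and shifted right by 8 once per output, and the function name `mixed_dense` is hypothetical.

```python
Q1616_MIN = -(1 << 31)
Q1616_MAX = (1 << 31) - 1
WEIGHT_FRAC_BITS = 8  # Q8.8 weights contribute 8 extra fractional bits


def mixed_dense(weight_rows, inputs_q1616):
    """Return (outputs, overflow_vector, abs_bounds) for one dense layer."""
    outputs, abs_bounds, overflow_vector = [], [], 0
    for lane, row in enumerate(weight_rows):
        acc = sum(w * x for w, x in zip(row, inputs_q1616))
        bound = sum(abs(w) * abs(x) for w, x in zip(row, inputs_q1616))
        unsaturated = acc >> WEIGHT_FRAC_BITS          # back to Q16.16 scaling
        saturated = max(Q1616_MIN, min(Q1616_MAX, unsaturated))
        if saturated != unsaturated:
            overflow_vector |= 1 << lane               # bit i flags output channel i
        outputs.append(saturated)
        abs_bounds.append(bound >> WEIGHT_FRAC_BITS)   # unsigned conservative bound
    return outputs, overflow_vector, abs_bounds


# Row 0 uses weights of 1.0; row 1 uses the largest Q8.8 code (about 128).
weights_q88 = [[256, 256], [32767, 32767]]
inputs_q1616 = [100 << 16, 200 << 16]                  # 100.0 and 200.0

outputs, overflow_vector, abs_bounds = mixed_dense(weights_q88, inputs_q1616)
overflow = overflow_vector != 0                        # aggregate line: OR of all lanes

print(outputs)                  # [19660800, 2147483647]
print(bin(overflow_vector))     # 0b10
print(abs_bounds)               # [19660800, 2516505600]
```

Output 0 decodes to 300.0, while output 1 exceeds the signed 32-bit range and clamps to the maximum code, so only bit 1 of the vector is set.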

For live hardware deployments, the same Q8.8, Q16.16, and block-floating encoded words can be placed behind MMIOUpdateSpec parameter banks instead of being hardcoded into logic. The control window stages bank_select, entry_index, write_data_lo, and optional write_data_hi, then commits with one update_valid|commit write. This keeps precision updates reproducible and lets a controller adjust weights or phase-coupling parameters without a new FPGA synthesis run.

Precision Trap Reports and Hardware Latch

Both compiled dense deployment paths expose a trap report method that turns transient overflow flags into deterministic telemetry:

Python
report = compiled.precision_trap_report(inputs)
assert report.manifest()["overflow_count"] == 0

The report records the output format, output count, overflow count, and whether saturation reached the minimum or maximum representable code. Use this host report when validating a weight package before deployment or when comparing hardware telemetry against the Python reference.

The Rust mirror exposes the same contract through MixedDenseResult::precision_trap_report(), including the exact overflow_count generated during the saturating integer MAC. The HDL side provides hdl/sc_precision_overflow_trap.v, a synchronous sticky latch for the overflow lines emitted by sc_mixed_precision_dense and sc_block_floating_dense. clear_trap is host-controlled and dominates a concurrent overflow pulse, so software can acknowledge an anomaly without a stale vector immediately reappearing in the same cycle.
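
The sticky-latch behaviour described above can be modelled in a few lines. This is a behavioural sketch inferred from the prose, not a translation of hdl/sc_precision_overflow_trap.v; the class and method names are hypothetical, and only `clear_trap` and the overflow vector correspond to signals named in the text.

```python
class StickyOverflowTrap:
    """Cycle-level model of a sticky per-lane overflow latch."""

    def __init__(self, lanes):
        self.mask = (1 << lanes) - 1
        self.trapped = 0

    def clock(self, overflow_vector, clear_trap):
        """Advance one clock cycle and return the latched vector."""
        if clear_trap:
            # Clear dominates: a pulse arriving in the same cycle is dropped.
            self.trapped = 0
        else:
            # Otherwise new overflow bits accumulate and never self-clear.
            self.trapped |= overflow_vector & self.mask
        return self.trapped


trap = StickyOverflowTrap(lanes=4)
print(bin(trap.clock(0b0100, clear_trap=False)))  # 0b100  lane 2 overflows
print(bin(trap.clock(0b0000, clear_trap=False)))  # 0b100  still latched
print(bin(trap.clock(0b0001, clear_trap=True)))   # 0b0    clear wins over the pulse
print(bin(trap.clock(0b0001, clear_trap=False)))  # 0b1    latched on the next cycle
```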

Trap benchmark and synthesis evidence from 2026-06-04 is committed under benchmarks/results/local_python_2026-06-04_precision_traps.json, benchmarks/results/local_rust_2026-06-04_precision_traps.json, and hdl/reports/yosys_precision_overflow_trap_2026-06-04.json.

Precision Envelope Reports and Predeployment Guard

Trap reports describe what saturated after an operation. Envelope reports add a conservative predeployment bound for the same workload:

Python
report = compiled.precision_envelope_report(inputs)
if not report.conservative_overflow_free:
    raise ValueError("compiled dense workload exceeds the signed output envelope")

The envelope report stores the realised saturated output codes, the realised overflow mask, and a per-output absolute bound in output-format integer codes. observed_overflow_free answers whether this exact input vector saturated. conservative_overflow_free answers whether the absolute-product envelope is inside the symmetric signed output range, so cancellation in one workload cannot hide a dangerous weight/input package.
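
The difference between the two flags is easiest to see in a cancellation case. The sketch below is an assumption-labelled illustration rather than the library's code: the function `envelope` and its dictionary layout are hypothetical, and it assumes Q8.8 weights, Q16.16 inputs, and a single right shift by 8 after accumulation.

```python
Q1616_MIN = -(1 << 31)
Q1616_MAX = (1 << 31) - 1
WEIGHT_FRAC_BITS = 8


def envelope(row_q88, inputs_q1616):
    """Compare the realised output with a sum-of-absolute-products bound."""
    acc = sum(w * x for w, x in zip(row_q88, inputs_q1616)) >> WEIGHT_FRAC_BITS
    realised = max(Q1616_MIN, min(Q1616_MAX, acc))
    bound = sum(abs(w) * abs(x) for w, x in zip(row_q88, inputs_q1616)) >> WEIGHT_FRAC_BITS
    return {
        "output_code": realised,
        "abs_bound_code": bound,
        "observed_overflow_free": realised == acc,
        # Symmetric check: the bound must fit below the positive maximum code.
        "conservative_overflow_free": bound <= Q1616_MAX,
    }


# Weights +127.0 and -127.0 cancel exactly for equal inputs of 200.0.
row = [127 << 8, -(127 << 8)]
inputs = [200 << 16, 200 << 16]
report = envelope(row, inputs)

print(report["output_code"])                 # 0
print(report["abs_bound_code"])              # 3329228800
print(report["observed_overflow_free"])      # True
print(report["conservative_overflow_free"])  # False
```

The realised output is zero, yet a small change to either input would push the same weights past the Q16.16 range, which is what the conservative flag reports.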

The Rust mirror exposes the same summary through MixedDenseResult::precision_envelope_report(). The dense HDL references also export per-output abs_bounds_q1616 lanes so firmware can compare hardware runtime telemetry against Python/Rust envelope reports without reconstructing the MAC offline. The HDL side additionally provides hdl/sc_precision_envelope_guard.v, a synchronous per-output guard that checks absolute bounds against the output Q-domain and reports a violation vector.

Envelope benchmark and synthesis evidence from 2026-06-04 is committed under benchmarks/results/local_python_2026-06-04_precision_envelopes.json, benchmarks/results/local_rust_2026-06-04_precision_envelopes.json, and hdl/reports/yosys_precision_envelope_guard_2026-06-04.json.

CLI Usage

Compiling with Precision Selection

Bash
# Default Q8.8
python -m sc_neurocore.neurons compile lif -o sc_lif.v

# Any of the 11 named modes
python -m sc_neurocore.neurons compile lif -p q1212 -o sc_lif_24.v

# Hardware target (auto-selects optimal precision)
python -m sc_neurocore.neurons compile lif --target artix7 -o sc_lif_fpga.v

Precision Diagnostics

The precision subcommand analyses a model across all 11 modes, showing how each parameter encodes, with overflow/underflow warnings and a recommendation:

Bash
python -m sc_neurocore.neurons precision lif

Output (abridged):

Text Only
Precision analysis for: LIF
========================================================================

Q1.7 (8-bit, 7 frac):
  ⚠ Underflow: v_rest=-65.0 below Q1.7 min=-1.0000

Q8.8 (16-bit, 8 frac):
  All parameters fit ✓

Q9.9 (18-bit, 9 frac):
  All parameters fit ✓

Q12.12 (24-bit, 12 frac):
  All parameters fit ✓

========================================================================
Compatible modes: Q8.8, Q9.9, Q12.12, Q14.13, Q20.12, Q16.16, Q8.24, Q18.18
Recommendation: Q8.8 (smallest compatible format)
  For max precision: Q8.24
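
The compatibility list in that output can be reproduced with a plain range check over the 11 modes. The sketch below is a simplified illustration, not the `precision` subcommand itself: it assumes only range checks matter (the real tool may also consider resolution and dt), and it checks a single parameter, v_rest = -65.0.

```python
# (name, data_width, fraction) for the 11 named modes, in table order.
MODES = [
    ("Q1.7", 8, 7), ("Q8.8", 16, 8), ("Q4.12", 16, 12), ("Q1.15", 16, 15),
    ("Q9.9", 18, 9), ("Q12.12", 24, 12), ("Q14.13", 27, 13),
    ("Q20.12", 32, 12), ("Q16.16", 32, 16), ("Q8.24", 32, 24),
    ("Q18.18", 36, 18),
]


def fits(value, data_width, fraction):
    """True if value lies inside the signed range of the format."""
    lo = -(1 << (data_width - 1)) / (1 << fraction)
    hi = ((1 << (data_width - 1)) - 1) / (1 << fraction)
    return lo <= value <= hi


params = {"v_rest": -65.0}
compatible = [
    mode for mode in MODES
    if all(fits(v, mode[1], mode[2]) for v in params.values())
]

smallest = min(compatible, key=lambda mode: mode[1])       # fewest bits
most_precise = max(compatible, key=lambda mode: mode[2])   # most fractional bits

print(", ".join(mode[0] for mode in compatible))
# Q8.8, Q9.9, Q12.12, Q14.13, Q20.12, Q16.16, Q8.24, Q18.18
print(smallest[0], most_precise[0])                        # Q8.8 Q8.24
```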

Overflow and Rounding Modes

Precision modes can be combined with overflow and rounding settings. See the Hardware Profiles Guide for full details.

Bash
# Q8.8 with banker's rounding (IEEE 754)
python -m sc_neurocore.neurons compile lif -p q88 --rounding bankers -o lif.v

# Q16.16 with overflow trapping (safety-critical)
python -m sc_neurocore.neurons compile lif -p q1616 --overflow trap -o lif.v

Programmatic API

The Q88 dataclass (supports all precisions despite the name) provides compile-time diagnostics:

Python
from sc_neurocore.compiler.equation_compiler import Q88

# Create any precision
q = Q88(data_width=18, fraction=9)  # Q9.9

# Properties
print(q.integer_bits)   # 8
print(q.max_value)      # 255.998
print(q.min_value)      # -256.0
print(q.resolution)     # 0.00195

# With overflow and rounding
q = Q88(data_width=24, fraction=12, overflow="wrap", rounding="nearest")
print(q.overflow)    # "wrap"
print(q.rounding)    # "nearest"

# Unsigned Q-format
q = Q88(data_width=16, fraction=8, signed=False)
print(q.min_value)   # 0.0
print(q.max_value)   # 255.996 (double the positive range)

# Range checking
warnings = q.check_range(-65.0, label="v_rest")

# Full precision report
report = q.precision_report(
    dt=0.001,
    params={"v_rest": -65.0, "tau_m": 10.0},
)
print(report)

Arithmetic Operations in Generated Verilog

Multiplication

All multiplications widen to 2×DW bits, then truncate (with configurable rounding) back to DW bits:

Verilog
// a * b in Q8.8 → 32-bit product, then truncate back to 16-bit
wire signed [31:0] _mul0 = a * b;
wire signed [15:0] _t0 = (_mul0 >>> 8);  // truncate rounding
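
A bit-accurate Python model makes the widen-shift-slice sequence easier to check by hand. This is a sketch of the truncate-rounding case shown above only; the function name `q_mul` is hypothetical, and other rounding or overflow modes would need additional logic.

```python
def q_mul(a, b, frac=8, width=16):
    """Model of: wide product, arithmetic shift by frac, keep the low width bits."""
    product = a * b                      # exact, like the 2*width-bit wire
    shifted = product >> frac            # Python >> on ints is an arithmetic shift
    low = shifted & ((1 << width) - 1)   # assignment to a width-bit wire drops high bits
    # Reinterpret the low bits as a signed two's-complement value.
    return low - (1 << width) if low & (1 << (width - 1)) else low


def to_q88(value):
    return round(value * 256)


# 1.5 * -2.25 = -3.375 is exactly representable in Q8.8.
print(q_mul(to_q88(1.5), to_q88(-2.25)) / 256)    # -3.375

# Truncation floors toward negative infinity: (1/256) * (-1/256) gives -1 LSB, not 0.
print(q_mul(1, -1))                               # -1

# With plain slicing, an out-of-range product wraps: 100.0 * 2.0 reads back as -56.0.
print(q_mul(to_q88(100.0), to_q88(2.0)) / 256)    # -56.0
```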

Division by Constant

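Division by a known constant is compiled to multiplication by its reciprocal, which avoids a hardware divider. The reciprocal is itself rounded to the target format, so its accuracy is limited by the number of fractional bits (in Q8.8, 1/10 becomes 26/256 ≈ 0.1016):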

Verilog
// a / 10.0 → a * (1/10 in Q8.8) = a * 26
wire signed [31:0] _mul0 = a * 16'sd26;
wire signed [15:0] _t0 = (_mul0 >>> 8);
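
The constant 26 and the error it introduces can be checked with a short calculation. The helper names below are hypothetical, and the sketch assumes the reciprocal is rounded to the nearest code, which matches the value shown in the Verilog above.

```python
def reciprocal_code(divisor, frac):
    """Nearest fixed-point code for 1/divisor with `frac` fractional bits."""
    return round((1 << frac) / divisor)


def relative_error(divisor, frac):
    """Relative error of the rounded reciprocal versus the exact 1/divisor."""
    exact = (1 << frac) / divisor
    return abs(reciprocal_code(divisor, frac) - exact) / exact


recip_q88 = reciprocal_code(10.0, 8)
print(recip_q88)                                   # 26

# Divide 65.0 by 10 in Q8.8: multiply by the reciprocal, then shift back.
a = round(65.0 * 256)                              # 16640
quotient = (a * recip_q88) >> 8
print(quotient, quotient / 256)                    # 1690 6.6015625 (exact answer: 6.5)

# More fractional bits shrink the reciprocal's rounding error.
print(relative_error(10.0, 8) > relative_error(10.0, 16))   # True
```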

Threshold Detection (Look-Ahead)

The threshold comparison uses v_next (the combinational next-state value) rather than v_reg (the 1-cycle-old register value):

Verilog
// Look-ahead: check v_NEXT, not v_reg
if ((v_next > (-16'sd12800))) begin
    spike_out <= 1'b1;
    v_reg <= P_V_REST;
end
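
The effect of the look-ahead comparison can be seen in a toy cycle-level model. Everything here beyond the threshold constant is an assumption for illustration: the constant drive `STEP` stands in for the real neuron dynamics, and P_V_REST is assumed to encode -65.0 in Q8.8.

```python
THRESHOLD = -12800   # -50.0 in Q8.8, the constant from the Verilog above
V_REST = -16640      # assumed P_V_REST: -65.0 in Q8.8
STEP = 1280          # hypothetical drive of +5.0 per cycle


def run(look_ahead, cycles=6):
    """Return the spike train when comparing v_next (look-ahead) or v_reg."""
    v_reg = V_REST
    spikes = []
    for _ in range(cycles):
        v_next = v_reg + STEP                 # combinational next-state value
        probe = v_next if look_ahead else v_reg
        if probe > THRESHOLD:
            spikes.append(1)
            v_reg = V_REST                    # reset on spike
        else:
            spikes.append(0)
            v_reg = v_next
    return spikes


print(run(look_ahead=True))    # [0, 0, 0, 1, 0, 0]
print(run(look_ahead=False))   # [0, 0, 0, 0, 1, 0]
```

In this toy model, comparing the registered value delays the spike by one cycle, during which the membrane register holds a value above threshold.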

Decision Flowchart

flowchart TD
    A["New Model"] --> B{"What hardware?"}
    B -->|"Known FPGA"| C["Use --target flag"]
    B -->|"Generic/ASIC"| D{"max(|param|) > 128?"}
    D -->|Yes| E["Q16.16 or Q20.12"]
    D -->|No| F{"max(|param|) > 8?"}
    F -->|Yes| G["Q8.8 or Q9.9"]
    F -->|No| H{"dt < 0.004?"}
    H -->|Yes| I["Q4.12 or Q16.16"]
    H -->|No| J["Q4.12"]

    style C fill:#e8f5e9
    style E fill:#e1f5fe
    style G fill:#e1f5fe
    style I fill:#fff9c4
    style J fill:#e8f5e9
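
The same decision logic can be written as a small function, which may be convenient in build scripts. This is a direct transcription of the flowchart's branches; the function name and argument names are hypothetical and not part of the SC-NeuroCore API.

```python
def recommend_precision(known_fpga, max_abs_param, dt):
    """Follow the flowchart branches and return the suggested mode(s) as text."""
    if known_fpga:
        return "Use --target flag"
    if max_abs_param > 128:
        return "Q16.16 or Q20.12"
    if max_abs_param > 8:
        return "Q8.8 or Q9.9"
    if dt < 0.004:
        return "Q4.12 or Q16.16"
    return "Q4.12"


# mV-scale LIF on generic hardware: parameters reach |-65|.
print(recommend_precision(False, 65.0, 0.001))    # Q8.8 or Q9.9

# Normalised model with a coarse time step.
print(recommend_precision(False, 2.0, 0.01))      # Q4.12

# Normalised model with a fine time step.
print(recommend_precision(False, 2.0, 0.001))     # Q4.12 or Q16.16
```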

Verified Co-Simulation Results

For linear models, every mV-range mode shows a 0.0% Python↔Verilog spike-count gap at I=50.0 over 200 steps:

| Mode | LIF | Lapicque | Resonate-Fire |
|------|-----|----------|---------------|
| Q8.8 (16-bit) | 200/200 | 200/200 | 200/200 |
| Q9.9 (18-bit) | 200/200 | 200/200 | 200/200 |
| Q12.12 (24-bit) | 200/200 | 200/200 | 200/200 |
| Q14.13 (27-bit) | 200/200 | 200/200 | 200/200 |
| Q20.12 (32-bit) | 200/200 | 200/200 | 200/200 |
| Q16.16 (32-bit) | 200/200 | 200/200 | 200/200 |
| Q8.24 (32-bit) | 200/200 | 200/200 | 200/200 |
| Q18.18 (36-bit) | 200/200 | 200/200 | 200/200 |

Further Reading