SNN Model Compression¶
Weight pruning, structural pruning, stochastic-aware pruning, and quantization for FPGA cost reduction.
Pruning¶
Three pruning strategies:
- `prune_weights`: magnitude-based; zero out weights with |w| below a threshold. The standard approach.
- `prune_neurons`: structural; remove entire neurons with low firing rates, reducing layer width (not just sparsity).
- `prune_stochastic`: SC-specific; score weights by bitstream contribution. Weights near 0 or 1 produce near-deterministic bitstreams (low entropy) and can be replaced with constant gates. Importance = `min(p, 1-p) * bitstream_length`.
```python
from sc_neurocore.compression import prune_stochastic

# Prune weights contributing < 1 popcount bit per inference
pruned, report = prune_stochastic(weights, bitstream_length=256, min_popcount_bits=1.0)
print(f"Sparsity: {report.sparsity:.1%}")
```
sc_neurocore.compression.pruning¶
Weight, structural, and stochastic-aware pruning for SNN model compression.
Weight pruning zeroes out weights below a magnitude threshold. Structural pruning removes entire neurons that fire below an activity threshold, reducing layer width. Stochastic pruning (SC-specific) scores weights by bitstream contribution, i.e. how many popcount bits they contribute per inference.
All methods reduce FPGA resource usage when combined with Projection(weight_threshold=) for runtime sparsity exploitation.
PruningReport dataclass¶
Results of a pruning operation.
Source code in src/sc_neurocore/compression/pruning.py
prune_weights(weights, threshold=0.01, method='magnitude')¶
Prune small weights from layer weight matrices.
Parameters¶
- `weights` (list of ndarray): Weight matrices for each layer.
- `threshold` (float): Pruning threshold. Weights with |w| <= threshold are zeroed.
- `method` (str): 'magnitude' (default) prunes by absolute value; 'percentile' treats threshold as a percentile (0-100) of weight magnitudes to prune.
Returns¶
(pruned_weights, PruningReport)
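As a rough illustration of the magnitude method, here is a minimal NumPy sketch; `prune_weights_sketch` is a hypothetical stand-in, not the library implementation, and it returns a bare sparsity figure instead of a `PruningReport`:

```python
import numpy as np

def prune_weights_sketch(weights, threshold=0.01):
    """Zero out weights with |w| <= threshold; report resulting sparsity."""
    pruned = []
    total = kept = 0
    for W in weights:
        mask = np.abs(W) > threshold        # True where the weight survives
        pruned.append(np.where(mask, W, 0.0))
        total += W.size
        kept += int(mask.sum())
    sparsity = 1.0 - kept / total           # fraction of weights zeroed
    return pruned, sparsity

W = np.array([[0.5, 0.005], [-0.02, 0.0001]])
pruned, sparsity = prune_weights_sketch([W], threshold=0.01)
# 0.005 and 0.0001 fall at or below the threshold and are zeroed -> sparsity 0.5
```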
Source code in src/sc_neurocore/compression/pruning.py
prune_neurons(weights, firing_rates=None, activity_threshold=0.001)¶
Structural pruning: remove neurons with low firing rates.
Removes entire rows from weight matrices (output neurons) and corresponding columns from the next layer's weight matrix (input connections). Reduces layer width, not just sparsity.
Parameters¶
- `weights` (list of ndarray): Weight matrices [W1, W2, ...] where W_i has shape (n_out, n_in).
- `firing_rates` (list of ndarray, optional): Per-neuron firing rates for each layer. If None, uses output weight magnitude as a proxy for importance.
- `activity_threshold` (float): Neurons with firing rate (or weight norm) below this are pruned.
Returns¶
(pruned_weights, PruningReport)
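The row/column removal described above can be sketched for a two-layer stack; `prune_neurons_sketch` is hypothetical and elides the `PruningReport`:

```python
import numpy as np

def prune_neurons_sketch(W1, W2, firing_rates, activity_threshold=1e-3):
    """Drop layer-1 neurons below the activity threshold."""
    keep = firing_rates >= activity_threshold
    W1_pruned = W1[keep, :]     # remove output rows in W1 (the pruned neurons)
    W2_pruned = W2[:, keep]     # remove the matching input columns in W2
    return W1_pruned, W2_pruned

W1 = np.ones((4, 3))            # layer 1: 4 neurons, 3 inputs
W2 = np.ones((2, 4))            # layer 2: 2 neurons, 4 inputs
rates = np.array([0.2, 0.0, 0.05, 0.0])
W1p, W2p = prune_neurons_sketch(W1, W2, rates)
# Layer width shrinks from 4 to 2: W1p is (2, 3), W2p is (2, 2)
```

Note that both matrices must be edited together, since each removed neuron is a row of one matrix and a column of the next.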
Source code in src/sc_neurocore/compression/pruning.py
prune_stochastic(weights, bitstream_length=256, min_popcount_bits=1.0)¶
Stochastic-aware pruning: score weights by bitstream contribution.
In SC networks, weight w encodes probability p = clip(|w|, 0, 1). The expected popcount contribution per inference is `contribution = min(p, 1-p) * bitstream_length`.
Weights that produce nearly-deterministic bitstreams (p near 0 or 1) contribute almost nothing to computation — they can be replaced with constant 0/1 gates, saving AND+popcount hardware.
Parameters¶
- `weights` (list of ndarray): Weight matrices (values in [0, 1] for unipolar SC).
- `bitstream_length` (int): Bitstream length (L). Longer streams mean more bits per weight.
- `min_popcount_bits` (float): Minimum expected popcount contribution to keep a weight. Weights contributing fewer bits than this are zeroed.
Returns¶
(pruned_weights, PruningReport)
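The scoring rule above is simple enough to sketch directly; `stochastic_score` is a hypothetical helper, not the library's internals:

```python
import numpy as np

def stochastic_score(W, bitstream_length=256):
    """Score = min(p, 1-p) * L, with p = clip(|w|, 0, 1)."""
    p = np.clip(np.abs(W), 0.0, 1.0)
    return np.minimum(p, 1.0 - p) * bitstream_length

W = np.array([0.5, 0.99, 0.001])
scores = stochastic_score(W, bitstream_length=256)
# 0.5 is maximally stochastic: 128 bits; 0.99 -> 2.56; 0.001 -> 0.256
mask = scores >= 1.0    # keep only weights contributing >= 1 popcount bit
```

With `min_popcount_bits=1.0`, only the near-deterministic weight 0.001 would be dropped here.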
Source code in src/sc_neurocore/compression/pruning.py
Quantization¶
sc_neurocore.compression.quantization¶
Quantize weights and delays for reduced hardware precision.
Weight quantization: reduce from float64 to fixed-point with configurable bit width. Fewer bits = smaller BRAM and simpler multiplier circuits.
Delay quantization: round continuous delays to integer steps or coarser grids. Fewer delay levels = smaller delay buffers on FPGA.
quantize_weights(weights, bits=8, symmetric=True)¶
Quantize weight matrices to fixed-point with given bit width.
Parameters¶
- `weights` (list of ndarray): Float weight matrices.
- `bits` (int): Target bit width (default 8). Range: [2, 16].
- `symmetric` (bool): Symmetric quantization around zero (default True).
Returns¶
list of ndarray: Quantized weights (still float dtype but with discrete values).
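A common symmetric fixed-point scheme looks like the sketch below; the library's exact scaling and rounding may differ, and `quantize_weights_sketch` is hypothetical:

```python
import numpy as np

def quantize_weights_sketch(W, bits=8):
    """Symmetric quantization: map weights onto 2^(bits-1)-1 levels per sign."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit symmetric
    scale = np.max(np.abs(W)) / levels      # largest magnitude maps to top level
    return np.round(W / scale) * scale      # discrete values, still float dtype

W = np.array([0.8, -0.31, 0.05])
Wq = quantize_weights_sketch(W, bits=4)     # only 7 levels per sign at 4 bits
# 0.05 is below half a quantization step and collapses to 0.0
```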
Source code in src/sc_neurocore/compression/quantization.py
quantize_delays(delays, resolution=1, max_delay=None)¶
Quantize continuous delays to integer grid.
Parameters¶
- `delays` (ndarray): Continuous delay values.
- `resolution` (int): Delay step size (default 1). Resolution=2 rounds delays to {0, 2, 4, 6, ...}, halving the buffer depth.
- `max_delay` (int, optional): Clamp delays to this maximum.
Returns¶
ndarray of int: Quantized integer delays.
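A minimal sketch of rounding to a coarser integer grid, assuming round-to-nearest multiple of `resolution` (`quantize_delays_sketch` is hypothetical):

```python
import numpy as np

def quantize_delays_sketch(delays, resolution=1, max_delay=None):
    """Round delays to the nearest multiple of `resolution`, optionally clamped."""
    q = np.round(delays / resolution).astype(int) * resolution
    if max_delay is not None:
        q = np.minimum(q, max_delay)
    return q

d = np.array([0.4, 1.7, 5.2, 9.6])
q = quantize_delays_sketch(d, resolution=2)   # grid {0, 2, 4, 6, ...}
```

Coarser grids trade timing precision for shallower delay buffers on the FPGA.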
Source code in src/sc_neurocore/compression/quantization.py