
Tutorial 44: SNN Model Compression

Reduce FPGA resource usage through pruning, quantization, and automatic optimization. Make networks fit on smaller, cheaper FPGAs.

1. Weight Pruning

import numpy as np
from sc_neurocore.compression import prune_weights

weights = [np.random.randn(32, 16) * 0.3, np.random.randn(16, 8) * 0.3]
pruned, report = prune_weights(weights, threshold=0.1)
print(f"Sparsity: {report.sparsity:.1%}")
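Under the hood, magnitude pruning simply zeroes every weight whose absolute value falls below the threshold. The sketch below reproduces the idea in plain NumPy; `magnitude_prune` is an illustrative helper, not part of `sc_neurocore`, and the internals of `prune_weights` may differ.

```python
import numpy as np

def magnitude_prune(weights, threshold=0.1):
    """Zero out every weight with |w| < threshold.

    `weights` is a list of per-layer arrays; returns pruned copies
    plus the overall sparsity (fraction of entries set to zero).
    """
    pruned = [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]
    total = sum(w.size for w in pruned)
    zeros = sum(int((w == 0).sum()) for w in pruned)
    return pruned, zeros / total

rng = np.random.default_rng(0)
layers = [rng.normal(0.0, 0.3, (16, 8))]
pruned, sparsity = magnitude_prune(layers, threshold=0.1)
print(f"Sparsity: {sparsity:.1%}")
```

Zeroed weights need no multiplier or storage on the FPGA, which is where the resource savings come from.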

2. Structural Pruning

from sc_neurocore.compression import prune_neurons

pruned, report = prune_neurons(weights, activity_threshold=0.05)
print(f"Removed {report.pruned_neurons} neurons")
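Structural pruning removes whole neurons rather than individual weights: dropping hidden neuron j deletes a column of the incoming weight matrix and the matching row of the outgoing one. A minimal sketch, assuming a mean-absolute-weight activity score (the actual score used by `prune_neurons` is not documented here):

```python
import numpy as np

def prune_inactive_neurons(w_in, w_out, activity_threshold=0.05):
    """Drop hidden neurons whose mean absolute input weight is below
    the threshold. w_in: (pre, hidden), w_out: (hidden, post).
    Removing neuron j deletes column j of w_in and row j of w_out.
    """
    activity = np.abs(w_in).mean(axis=0)   # one score per hidden neuron
    keep = activity >= activity_threshold
    return w_in[:, keep], w_out[keep, :], int((~keep).sum())

rng = np.random.default_rng(0)
w_in, w_out = rng.normal(0, 0.3, (16, 8)), rng.normal(0, 0.3, (8, 4))
w_in2, w_out2, removed = prune_inactive_neurons(w_in, w_out)
print(f"Removed {removed} neurons")
```

Unlike weight pruning, this shrinks the matrix dimensions themselves, so the hardware needs fewer neuron circuits, not just fewer multipliers.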

3. Weight Quantization

from sc_neurocore.compression.quantization import quantize_weights
q8 = quantize_weights(weights, bits=8)
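Weight quantization maps floating-point weights onto a small set of integer levels so each synapse stores only a few bits. A symmetric uniform scheme, the most common choice, can be sketched as follows; `quantize_uniform` is illustrative and `quantize_weights` may use a different scheme internally:

```python
import numpy as np

def quantize_uniform(w, bits=8):
    """Symmetric uniform quantization: scale weights so the largest
    magnitude maps to the top integer level, then round.
    Assumes w contains at least one nonzero value.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = float(np.abs(w).max()) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale                            # dequantize as q * scale

q, scale = quantize_uniform(np.random.randn(16, 8) * 0.3, bits=8)
```

Storing 8-bit integers instead of 32-bit floats cuts weight memory by 4x and lets the FPGA use narrow integer multipliers.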

4. Delay Quantization

from sc_neurocore.compression.quantization import quantize_delays
delays = np.array([1.3, 2.7, 4.1])
quantized = quantize_delays(delays, resolution=2)
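On hardware, synaptic delays must land on the FPGA's discrete clock grid. One plausible reading of delay quantization, sketched here with an assumed fixed timestep (the meaning of `resolution` in `quantize_delays` is not specified above, so this helper is hypothetical):

```python
import numpy as np

def snap_delays(delays, step=0.5):
    """Snap continuous delays (e.g. in ms) to the nearest multiple of
    a hardware timestep, as delay lines are discretized on FPGAs."""
    return np.round(np.asarray(delays, dtype=float) / step) * step

print(snap_delays([1.3, 2.7, 4.1], step=0.5))
```

Coarser steps mean shorter shift registers per synapse, trading timing precision for logic resources.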

5. Auto-Optimize for Target

from sc_neurocore.optimizer import fit_to_target
result = fit_to_target([(32, 16), (16, 8)], weights, target="ice40")
print(result.summary())
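A target-fitting optimizer can be pictured as a greedy loop: estimate the resource cost of the current network, then apply progressively stronger compression until the estimate fits the device. The sketch below uses a toy cost model (nonzero weights times bit-width) and an invented bit budget; neither the numbers nor the strategy are real iCE40 figures or the actual `fit_to_target` algorithm.

```python
import numpy as np

def fit_to_budget(weights, budget_bits, min_bits=4):
    """Greedy sketch: lower the weight bit-width step by step until
    the storage estimate (nonzero weights x bits) fits the budget,
    stopping at a minimum precision floor."""
    nonzero = sum(int(np.count_nonzero(w)) for w in weights)
    for bits in (16, 12, 8, 6, 4):
        cost = nonzero * bits
        if cost <= budget_bits or bits == min_bits:
            return bits, cost

rng = np.random.default_rng(1)
weights = [rng.normal(0, 0.3, (32, 16)), rng.normal(0, 0.3, (16, 8))]
bits, cost = fit_to_budget(weights, budget_bits=6000)
print(f"Chose {bits}-bit weights, {cost} bits of storage")
```

A real optimizer would also fold in pruning and delay quantization and use a per-device resource model, but the fit-check-tighten loop is the core idea.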

Further Reading