Debugging and Profiling SC Networks¶
Diagnose common failure modes in stochastic computing networks: silent accuracy loss, bitstream correlation, weight collapse, and performance bottlenecks.
Prerequisites: pip install sc-neurocore matplotlib
1. The debugging challenge¶
SC networks fail silently. A broken conventional DNN produces NaN or Inf — obvious errors. A broken SC network produces plausible-looking but wrong firing rates. The bitstream noise masks bugs.
Common failure modes:
| Symptom | Likely cause | Section |
|---|---|---|
| All neurons fire at ~50% | Weights collapsed to 0.5 | §2 |
| Accuracy drops with more layers | Bitstream correlation | §3 |
| Output is always 0 or always 1 | Degenerate encoding | §4 |
| Training doesn't converge | Learning rate too high for bitstream length L | §5 |
| Rust engine gives different results | Seed mismatch | §6 |
2. Weight collapse detection¶
Weights drifting to the same value kills network expressivity:
```python
import numpy as np
from sc_neurocore import VectorizedSCLayer

layer = VectorizedSCLayer(n_inputs=50, n_neurons=128, length=256)

def diagnose_weights(weights, name="layer"):
    """Report weight distribution health."""
    mean = weights.mean()
    std = weights.std()
    unique = len(np.unique(np.round(weights, 4)))
    near_zero = (weights < 0.05).sum()
    near_one = (weights > 0.95).sum()
    total = weights.size
    print(f"{name}:")
    print(f"  mean={mean:.4f} std={std:.4f} unique={unique}")
    print(f"  near 0: {near_zero}/{total} ({100*near_zero/total:.1f}%)")
    print(f"  near 1: {near_one}/{total} ({100*near_one/total:.1f}%)")
    if std < 0.05:
        print("  WARNING: weight collapse (std < 0.05)")
    if near_zero + near_one > 0.5 * total:
        print("  WARNING: >50% of weights at the rails")

diagnose_weights(layer.weights, "Initial")
```
Fix: Reduce learning rate. Add weight decay toward 0.5. Clip weights to [0.05, 0.95] instead of [0.01, 0.99].
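These fixes can be sketched as a single update step. The snippet below is a minimal numpy illustration, not the library's actual optimizer; `regularized_update` is a hypothetical helper showing decay toward 0.5 (rather than toward 0, which would itself cause collapse) plus clipping away from the rails.

```python
import numpy as np

def regularized_update(weights, grad, lr=0.01, decay=1e-3):
    """One hypothetical SC weight update: gradient step, decay toward 0.5,
    then clipping into [0.05, 0.95] to keep weights off the rails."""
    w = weights - lr * grad        # usual gradient step
    w -= decay * (w - 0.5)         # weight decay pulls toward 0.5, not 0
    return np.clip(w, 0.05, 0.95)  # rail clipping

rng = np.random.default_rng(0)
w = rng.uniform(0.0, 1.0, size=(8, 4))
g = rng.normal(0, 0.1, size=(8, 4))
w_new = regularized_update(w, g)
print(w_new.min(), w_new.max())  # always inside [0.05, 0.95]
```

Decaying toward 0.5 is the SC analogue of conventional weight decay: 0.5 is the maximum-entropy value for a unipolar weight, so the regularizer resists drift toward the uninformative rails without pushing everything to zero.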
3. Bitstream correlation diagnosis¶
SC arithmetic assumes independent bitstreams. Correlated streams produce biased results:
```python
from sc_neurocore import BitstreamEncoder

# Bad: same seed → perfectly correlated streams
enc_a = BitstreamEncoder(length=256, seed=42)
enc_b = BitstreamEncoder(length=256, seed=42)  # same seed!
bits_a = enc_a.encode(0.7)
bits_b = enc_b.encode(0.3)

# AND of correlated streams ≠ 0.7 * 0.3; for a comparator-style encoder
# sharing one sequence, the AND rate approaches min(0.7, 0.3) = 0.3
product_corr = (bits_a & bits_b).mean()
print(f"Correlated:  AND mean = {product_corr:.3f} (expected 0.21)")

# Good: different seeds → independent streams
enc_c = BitstreamEncoder(length=256, seed=42)
enc_d = BitstreamEncoder(length=256, seed=137)  # different seed
bits_c = enc_c.encode(0.7)
bits_d = enc_d.encode(0.3)
product_indep = (bits_c & bits_d).mean()
print(f"Independent: AND mean = {product_indep:.3f} (expected 0.21)")
```
Detection: Compare observed product vs expected product across many test values. Systematic bias indicates correlation.
Fix: Ensure every encoder/LFSR in the network has a unique seed.
The VectorizedSCLayer handles this automatically.
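The detection step can be automated as a sweep over many value pairs. The sketch below uses a toy comparator encoder in plain numpy rather than the library's `BitstreamEncoder`, so the bias is visible in isolation; reusing one random sequence for both streams stands in for two encoders initialized with the same seed.

```python
import numpy as np

def encode(p, length, rng):
    """Toy unipolar comparator encoder: bit_t = 1 with probability p."""
    return (rng.random(length) < p).astype(np.uint8)

def correlation_bias(length=4096, n_pairs=200, shared_seed=False):
    """Mean |observed AND rate - a*b| over random value pairs.

    shared_seed=True reuses one random sequence for both streams,
    mimicking two encoders seeded identically."""
    rng_vals = np.random.default_rng(0)
    biases = []
    for _ in range(n_pairs):
        a, b = rng_vals.uniform(0.05, 0.95, size=2)
        if shared_seed:
            r = np.random.default_rng(42).random(length)
            bits_a, bits_b = (r < a), (r < b)
        else:
            bits_a = encode(a, length, np.random.default_rng(42))
            bits_b = encode(b, length, np.random.default_rng(137))
        biases.append(abs((bits_a & bits_b).mean() - a * b))
    return float(np.mean(biases))

print(f"shared seed bias:      {correlation_bias(shared_seed=True):.4f}")
print(f"independent seed bias: {correlation_bias(shared_seed=False):.4f}")
```

With a shared sequence the average bias is large (the AND rate collapses to roughly min(a, b)); with independent sequences it shrinks to ordinary sampling noise. A bias that stays well above the 1/√L noise floor across many pairs is the signature of correlation.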
4. Degenerate encoding detection¶
Values at 0.0 or 1.0 produce all-zero or all-one bitstreams — these carry no information through AND gates:
```python
def check_encoding_range(inputs, name="inputs"):
    """Verify inputs are in a valid SC range."""
    zeros = (inputs <= 0.0).sum()
    ones = (inputs >= 1.0).sum()
    near_zero = (inputs < 0.01).sum()
    near_one = (inputs > 0.99).sum()
    total = inputs.size
    print(f"{name}: range [{inputs.min():.4f}, {inputs.max():.4f}]")
    if zeros > 0:
        print(f"  CRITICAL: {zeros} exact zeros (all-zero bitstreams)")
    if ones > 0:
        print(f"  CRITICAL: {ones} exact ones (all-one bitstreams)")
    if near_zero > 0.1 * total:
        print(f"  WARNING: {near_zero} values < 0.01 ({100*near_zero/total:.0f}%)")
    if near_one > 0.1 * total:
        print(f"  WARNING: {near_one} values > 0.99 ({100*near_one/total:.0f}%)")

# Example: raw pixel values often include exact 0s and 1s
test_data = np.random.rand(50)
test_data[0] = 0.0  # bad
test_data[1] = 1.0  # bad
check_encoding_range(test_data, "raw input")

# Fix: clamp to a safe range
safe_data = np.clip(test_data, 0.01, 0.99)
check_encoding_range(safe_data, "clamped input")
```
5. Learning rate vs bitstream length¶
SC noise has a standard deviation on the order of 1/√L (for an L-bit estimate of a rate p, the exact value is √(p(1−p)/L)). If the learning rate is larger than this noise floor, training is unstable:
```python
def recommend_lr(bitstream_length, n_inputs):
    """Recommend a learning rate based on the SC noise floor."""
    # SC noise std ≈ 1/sqrt(L) per output
    # Gradient noise ≈ noise_std * sqrt(n_inputs)
    noise_std = 1.0 / np.sqrt(bitstream_length)
    grad_noise = noise_std * np.sqrt(n_inputs)
    # LR should stay below gradient noise / 10
    recommended = grad_noise / 10
    max_safe = grad_noise / 3
    print(f"L={bitstream_length}, N_in={n_inputs}")
    print(f"  SC noise std:   {noise_std:.4f}")
    print(f"  Gradient noise: {grad_noise:.4f}")
    print(f"  Recommended LR: {recommended:.4f}")
    print(f"  Max safe LR:    {max_safe:.4f}")
    return recommended

for L in [64, 128, 256, 512]:
    recommend_lr(L, 50)
    print()
```
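The 1/√L figure is worth checking empirically. The numpy sketch below measures the std of an L-bit rate estimate directly; the exact value is √(p(1−p)/L), which peaks at 0.5/√L for p = 0.5, so 1/√L is a conservative (roughly 2×) bound.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # worst case for Bernoulli variance
for L in [64, 256, 1024]:
    # 5000 independent L-bit streams; take the rate estimate of each
    estimates = (rng.random((5000, L)) < p).mean(axis=1)
    print(f"L={L:5d}  measured std={estimates.std():.4f}  "
          f"1/sqrt(L)={1 / np.sqrt(L):.4f}")
```

The measured std lands near 0.5/√L in every row, confirming that the noise floor used by `recommend_lr` errs on the safe side.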
6. Cross-backend verification¶
Verify Python and Rust produce the same results:
```python
def verify_backends(layer, test_input, tolerance=0.05):
    """Compare Python vs Rust output for the same input."""
    py_out = layer.forward(test_input)
    try:
        from sc_neurocore_engine import DenseLayer as RustDenseLayer
        rust_out = RustDenseLayer(layer.weights, layer.length).forward(test_input)
        max_diff = np.max(np.abs(py_out - rust_out))
        print(f"Python vs Rust max diff: {max_diff:.4f}")
        if max_diff > tolerance:
            divergent = np.where(np.abs(py_out - rust_out) > tolerance)[0]
            print(f"  Divergent neurons: {divergent}")
            print(f"  Python: {py_out[divergent]}")
            print(f"  Rust:   {rust_out[divergent]}")
    except ImportError:
        print("Rust engine not available — skipping backend comparison")

test_input = np.random.uniform(0.1, 0.9, size=50)
verify_backends(layer, test_input)
```
Note: Stochastic outputs will differ between runs (different LFSR sequences). Compare averaged outputs over multiple runs, not single-run values.
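Averaging matters more than it may look. In the numpy sketch below, `noisy_forward` is an assumed noise model standing in for one stochastic forward pass (an L-bit Bernoulli estimate of the true rates), not either real backend; comparing single runs conflates encoder noise with backend bugs, while averaging shrinks the noise by √(number of runs).

```python
import numpy as np

def noisy_forward(p, length, seed):
    """Stand-in for one stochastic forward pass: estimate rates p
    from `length` Bernoulli samples (noise std ~ sqrt(p(1-p)/length))."""
    rng = np.random.default_rng(seed)
    return (rng.random((length, p.size)) < p).mean(axis=0)

p = np.linspace(0.1, 0.9, 10)  # "true" firing rates
L = 256

# Single-run comparison: dominated by encoder noise, not backend differences
single_diff = np.max(np.abs(noisy_forward(p, L, 1) - noisy_forward(p, L, 2)))

# Average 50 runs per "backend" before comparing
avg_a = np.mean([noisy_forward(p, L, s) for s in range(100, 150)], axis=0)
avg_b = np.mean([noisy_forward(p, L, s) for s in range(200, 250)], axis=0)
avg_diff = np.max(np.abs(avg_a - avg_b))

print(f"single-run max diff: {single_diff:.4f}")
print(f"averaged max diff:   {avg_diff:.4f}")
```

A real backend bug produces a bias that survives averaging; seed-level stochastic differences do not.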
7. Layer-by-layer forward pass inspection¶
Trace values through the network to find where things go wrong:
```python
def trace_forward(layers, x, names=None):
    """Print statistics at each layer."""
    if names is None:
        names = [f"Layer {i}" for i in range(len(layers))]
    print(f"Input: mean={x.mean():.3f} std={x.std():.3f} "
          f"range=[{x.min():.3f}, {x.max():.3f}]")
    for layer, name in zip(layers, names):
        x = layer.forward(np.clip(x, 0.01, 0.99))
        dead = (x < 0.02).sum()
        saturated = (x > 0.98).sum()
        print(f"{name}: mean={x.mean():.3f} std={x.std():.3f} "
              f"range=[{x.min():.3f}, {x.max():.3f}] "
              f"dead={dead} sat={saturated}")
    return x

layer1 = VectorizedSCLayer(n_inputs=50, n_neurons=128, length=256)
layer2 = VectorizedSCLayer(n_inputs=128, n_neurons=64, length=256)
layer3 = VectorizedSCLayer(n_inputs=64, n_neurons=10, length=256)

x = np.random.uniform(0.1, 0.9, size=50)
trace_forward([layer1, layer2, layer3], x)
```
Look for:
- Mean collapsing to 0.5: weight collapse
- Std shrinking to 0: all outputs identical
- Many dead/saturated neurons: encoding problems
8. Performance profiling¶
```python
import time

def profile_layer(layer, n_samples=100):
    """Measure forward pass time."""
    x = np.random.uniform(0.1, 0.9, size=layer.weights.shape[1])
    # Warmup
    for _ in range(5):
        layer.forward(x)
    start = time.perf_counter()
    for _ in range(n_samples):
        layer.forward(x)
    elapsed = time.perf_counter() - start
    samples_per_sec = n_samples / elapsed
    us_per_sample = elapsed / n_samples * 1e6
    print(f"  {layer.weights.shape[1]}→{layer.weights.shape[0]}, L={layer.length}: "
          f"{us_per_sample:.0f} μs/sample, {samples_per_sec:.0f} samples/s")

print("Layer performance:")
for n_in, n_out, L in [(50, 128, 256), (128, 64, 256), (50, 128, 512), (50, 128, 1024)]:
    layer = VectorizedSCLayer(n_inputs=n_in, n_neurons=n_out, length=L)
    profile_layer(layer)
```
Throughput scales as O(n_in × n_out × L / 64) due to packed uint64 operations. If you need more speed, use the Rust engine.
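That scaling can be sanity-checked against measurements with a relative cost model. This is an assumption about the packed implementation (counting 64-bit AND/popcount words), not a measured fact about the library:

```python
def word_ops(n_in, n_out, length):
    """Assumed word-level operation count for one packed forward pass."""
    return n_in * n_out * (length // 64)

# Predicted relative cost of each profiled configuration
base = word_ops(50, 128, 256)
for cfg in [(50, 128, 256), (128, 64, 256), (50, 128, 512), (50, 128, 1024)]:
    print(cfg, f"{word_ops(*cfg) / base:.2f}x")
```

If the measured μs/sample ratios from `profile_layer` diverge strongly from these predictions (e.g. doubling L does not roughly double the time), the bottleneck is overhead outside the bit operations, such as Python dispatch or memory allocation.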
9. Diagnostic checklist¶
Run before every training session:
```python
def preflight_check(layers, test_input):
    """Run all diagnostics on a network."""
    print("=" * 50)
    print("SC Network Preflight Check")
    print("=" * 50)

    # 1. Input range
    check_encoding_range(test_input, "Input")
    print()

    # 2. Weight health per layer
    for i, layer in enumerate(layers):
        diagnose_weights(layer.weights, f"Layer {i}")
        print()

    # 3. Forward trace
    trace_forward(layers, test_input)
    print()

    # 4. LR recommendation
    recommend_lr(layers[0].length, test_input.shape[0])
    print()
    print("Preflight complete.")

preflight_check([layer1, layer2, layer3], x)
```
What you learned¶
- SC networks fail silently — always run diagnostics before training
- Weight collapse: std < 0.05 means all weights are nearly identical
- Bitstream correlation: shared LFSR seeds cause biased arithmetic
- Degenerate encoding: values at 0 or 1 produce uninformative bitstreams
- LR must be smaller than SC noise floor (1/√L × √N_in) / 10
- Trace forward passes to find where signal degrades
- Profile with time.perf_counter(); compare bitstream lengths and layer sizes
Next steps¶
- Add the preflight check to your training script
- Use BitstreamSpikeRecorder to log full spike trains for offline analysis
- Compare Rust engine output against Python for your specific network
- Profile memory usage for large networks (>10K neurons)