
Debugging and Profiling SC Networks

Diagnose common failure modes in stochastic computing networks: silent accuracy loss, bitstream correlation, weight collapse, and performance bottlenecks.

Prerequisites: pip install sc-neurocore matplotlib

1. The debugging challenge

SC networks fail silently. A broken conventional DNN produces NaN or Inf — obvious errors. A broken SC network produces plausible-looking but wrong firing rates. The bitstream noise masks bugs.

Common failure modes:

Symptom                               Likely cause                   Section
All neurons fire at ~50%              Weights collapsed to 0.5       §2
Accuracy drops with more layers       Bitstream correlation          §3
Output is always 0 or always 1        Degenerate encoding            §4
Training doesn't converge             Learning rate too high for L   §5
Rust engine gives different results   Seed mismatch                  §6

2. Weight collapse detection

Weights drifting to the same value kills network expressivity:

import numpy as np
from sc_neurocore import VectorizedSCLayer

layer = VectorizedSCLayer(n_inputs=50, n_neurons=128, length=256)

def diagnose_weights(weights, name="layer"):
    """Report weight distribution health."""
    mean = weights.mean()
    std = weights.std()
    unique = len(np.unique(np.round(weights, 4)))
    near_zero = (weights < 0.05).sum()
    near_one = (weights > 0.95).sum()
    total = weights.size

    print(f"{name}:")
    print(f"  mean={mean:.4f}  std={std:.4f}  unique={unique}")
    print(f"  near 0: {near_zero}/{total} ({100*near_zero/total:.1f}%)")
    print(f"  near 1: {near_one}/{total} ({100*near_one/total:.1f}%)")

    if std < 0.05:
        print(f"  WARNING: weight collapse (std < 0.05)")
    if near_zero + near_one > 0.5 * total:
        print(f"  WARNING: >50% weights at rails")

diagnose_weights(layer.weights, "Initial")

Fix: Reduce learning rate. Add weight decay toward 0.5. Clip weights to [0.05, 0.95] instead of [0.01, 0.99].
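Those three fixes can be combined in a single update step. A minimal sketch in plain NumPy — `stabilized_update`, its decay term, and its clip bounds are illustrative, not part of the sc_neurocore API:

```python
import numpy as np

def stabilized_update(weights, grad, lr=0.01, decay=1e-3):
    """One gradient step with decay toward 0.5 and rail clipping (illustrative)."""
    weights = weights - lr * grad                 # gradient step with a reduced LR
    weights = weights - decay * (weights - 0.5)   # weight decay toward 0.5 resists collapse
    return np.clip(weights, 0.05, 0.95)           # clip to [0.05, 0.95], not [0.01, 0.99]

rng = np.random.default_rng(0)
w = rng.uniform(0.05, 0.95, size=(128, 50))
g = rng.normal(0, 0.1, size=(128, 50))
w_new = stabilized_update(w, g)
print(f"std after update: {w_new.std():.4f}, range [{w_new.min():.3f}, {w_new.max():.3f}]")
```

Running `diagnose_weights(w_new)` after each epoch with this update should keep std well above the 0.05 collapse threshold.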

3. Bitstream correlation diagnosis

SC arithmetic assumes independent bitstreams. Correlated streams produce biased results:

from sc_neurocore import BitstreamEncoder

# Bad: same seed → perfectly correlated streams
enc_a = BitstreamEncoder(length=256, seed=42)
enc_b = BitstreamEncoder(length=256, seed=42)  # same seed!

bits_a = enc_a.encode(0.7)
bits_b = enc_b.encode(0.3)

# AND of identical-seed streams computes min(0.7, 0.3) = 0.3, not 0.7 * 0.3 = 0.21
product_corr = (bits_a & bits_b).mean()
print(f"Correlated:   AND mean = {product_corr:.3f} (expected 0.21)")

# Good: different seeds → independent streams
enc_c = BitstreamEncoder(length=256, seed=42)
enc_d = BitstreamEncoder(length=256, seed=137)  # different seed

bits_c = enc_c.encode(0.7)
bits_d = enc_d.encode(0.3)

product_indep = (bits_c & bits_d).mean()
print(f"Independent:  AND mean = {product_indep:.3f} (expected 0.21)")

Detection: Compare observed product vs expected product across many test values. Systematic bias indicates correlation.

Fix: Ensure every encoder/LFSR in the network has a unique seed. The VectorizedSCLayer handles this automatically.

4. Degenerate encoding detection

Values at 0.0 or 1.0 produce all-zero or all-one bitstreams — these carry no information through AND gates:

def check_encoding_range(inputs, name="inputs"):
    """Verify inputs are in valid SC range."""
    zeros = (inputs <= 0.0).sum()
    ones = (inputs >= 1.0).sum()
    near_zero = (inputs < 0.01).sum()
    near_one = (inputs > 0.99).sum()
    total = inputs.size

    print(f"{name}: range [{inputs.min():.4f}, {inputs.max():.4f}]")
    if zeros > 0:
        print(f"  CRITICAL: {zeros} exact zeros (all-zero bitstreams)")
    if ones > 0:
        print(f"  CRITICAL: {ones} exact ones (all-one bitstreams)")
    if near_zero > 0.1 * total:
        print(f"  WARNING: {near_zero} values < 0.01 ({100*near_zero/total:.0f}%)")
    if near_one > 0.1 * total:
        print(f"  WARNING: {near_one} values > 0.99 ({100*near_one/total:.0f}%)")

# Example: raw pixel values often include exact 0s and 1s
test_data = np.random.rand(50)
test_data[0] = 0.0   # bad
test_data[1] = 1.0   # bad
check_encoding_range(test_data, "raw input")

# Fix: clamp to safe range
safe_data = np.clip(test_data, 0.01, 0.99)
check_encoding_range(safe_data, "clamped input")

5. Learning rate vs bitstream length

SC estimation noise has a standard deviation on the order of 1/√L (exactly √(p(1−p)/L) for a rate-p stream). If the learning rate is large relative to this noise floor, gradient updates chase noise and training is unstable:

def recommend_lr(bitstream_length, n_inputs):
    """Recommend learning rate based on SC noise floor."""
    # SC noise std ≈ 1/sqrt(L) per output
    # Gradient noise ≈ noise_std * sqrt(n_inputs)
    noise_std = 1.0 / np.sqrt(bitstream_length)
    grad_noise = noise_std * np.sqrt(n_inputs)

    # LR should be smaller than gradient noise / 10
    recommended = grad_noise / 10
    max_safe = grad_noise / 3

    print(f"L={bitstream_length}, N_in={n_inputs}")
    print(f"  SC noise std:     {noise_std:.4f}")
    print(f"  Gradient noise:   {grad_noise:.4f}")
    print(f"  Recommended LR:   {recommended:.4f}")
    print(f"  Max safe LR:      {max_safe:.4f}")
    return recommended

for L in [64, 128, 256, 512]:
    recommend_lr(L, 50)
    print()
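The 1/√L scaling above can be checked empirically with a pure-NumPy Bernoulli simulation (independent of the library); the constant is √(p(1−p)), which peaks at 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # Bernoulli std sqrt(p*(1-p)/L) is largest at p = 0.5

for L in [64, 128, 256, 512]:
    # Estimate the rate from 10,000 independent length-L bitstreams
    rates = (rng.random((10_000, L)) < p).mean(axis=1)
    print(f"L={L:4d}: empirical std {rates.std():.4f}, 0.5/sqrt(L) = {0.5/np.sqrt(L):.4f}")
```

Each doubling of L shrinks the noise floor by √2, which is why the recommended learning rate above grows with longer bitstreams.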

6. Cross-backend verification

Verify Python and Rust produce the same results:

def verify_backends(layer, test_input, tolerance=0.05):
    """Compare Python vs Rust output for the same input."""
    py_out = layer.forward(test_input)

    try:
        from sc_neurocore_engine import DenseLayer as RustDenseLayer
        rust_out = RustDenseLayer(layer.weights, layer.length).forward(test_input)
        max_diff = np.max(np.abs(py_out - rust_out))
        print(f"Python vs Rust max diff: {max_diff:.4f}")
        if max_diff > tolerance:
            divergent = np.where(np.abs(py_out - rust_out) > tolerance)[0]
            print(f"  Divergent neurons: {divergent}")
            print(f"  Python: {py_out[divergent]}")
            print(f"  Rust:   {rust_out[divergent]}")
    except ImportError:
        print("Rust engine not available — skipping backend comparison")

test_input = np.random.uniform(0.1, 0.9, size=50)
verify_backends(layer, test_input)

Note: Stochastic outputs will differ between runs (different LFSR sequences). Compare averaged outputs over multiple runs, not single-run values.
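That averaging can be factored into a small helper. A sketch with a stand-in stochastic forward pass — the Gaussian noise model here is an assumption used for illustration, not any backend's actual behavior:

```python
import numpy as np

def averaged_output(forward, x, n_runs=20):
    """Average stochastic outputs over several runs before comparing backends."""
    return np.mean([forward(x) for _ in range(n_runs)], axis=0)

# Stand-in forward: true rate plus bitstream noise with std ~ 1/sqrt(L)
rng = np.random.default_rng(0)
L = 256
noisy_forward = lambda x: np.clip(x + rng.normal(0, 1 / np.sqrt(L), size=x.shape), 0, 1)

x_true = np.full(10, 0.6)
single_err = np.max(np.abs(noisy_forward(x_true) - x_true))
avg_err = np.max(np.abs(averaged_output(noisy_forward, x_true, n_runs=50) - x_true))
print(f"single-run max error: {single_err:.4f}")
print(f"averaged   max error: {avg_err:.4f}")
```

In practice you would compare `averaged_output(layer.forward, x)` for each backend with the same tolerance logic as verify_backends, rather than single runs.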

7. Layer-by-layer forward pass inspection

Trace values through the network to find where things go wrong:

def trace_forward(layers, x, names=None):
    """Print statistics at each layer."""
    if names is None:
        names = [f"Layer {i}" for i in range(len(layers))]

    print(f"Input: mean={x.mean():.3f} std={x.std():.3f} "
          f"range=[{x.min():.3f}, {x.max():.3f}]")

    for layer, name in zip(layers, names):
        x = layer.forward(np.clip(x, 0.01, 0.99))
        dead = (x < 0.02).sum()
        saturated = (x > 0.98).sum()
        print(f"{name}: mean={x.mean():.3f} std={x.std():.3f} "
              f"range=[{x.min():.3f}, {x.max():.3f}] "
              f"dead={dead} sat={saturated}")
    return x

layer1 = VectorizedSCLayer(n_inputs=50, n_neurons=128, length=256)
layer2 = VectorizedSCLayer(n_inputs=128, n_neurons=64, length=256)
layer3 = VectorizedSCLayer(n_inputs=64, n_neurons=10, length=256)

x = np.random.uniform(0.1, 0.9, size=50)
trace_forward([layer1, layer2, layer3], x)

Look for:

  • Mean collapsing to 0.5: weight collapse
  • Std shrinking to 0: all outputs identical
  • Many dead/saturated neurons: encoding problems

8. Performance profiling

import time

def profile_layer(layer, n_samples=100):
    """Measure forward pass time."""
    x = np.random.uniform(0.1, 0.9, size=layer.weights.shape[1])

    # Warmup
    for _ in range(5):
        layer.forward(x)

    start = time.perf_counter()
    for _ in range(n_samples):
        layer.forward(x)
    elapsed = time.perf_counter() - start

    samples_per_sec = n_samples / elapsed
    us_per_sample = elapsed / n_samples * 1e6
    print(f"  {layer.weights.shape[1]}×{layer.weights.shape[0]}, L={layer.length}: "
          f"{us_per_sample:.0f} μs/sample, {samples_per_sec:.0f} samples/s")

print("Layer performance:")
for n_in, n_out, L in [(50, 128, 256), (128, 64, 256), (50, 128, 512), (50, 128, 1024)]:
    layer = VectorizedSCLayer(n_inputs=n_in, n_neurons=n_out, length=L)
    profile_layer(layer)

Throughput scales as O(n_in × n_out × L / 64) due to packed uint64 operations. If you need more speed, use the Rust engine.
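The L/64 factor comes from packing one bit per stream position into machine words. A sketch using NumPy's np.packbits, which packs into uint8 bytes rather than the engine's uint64 words, but the idea is identical: one bitwise AND over packed words processes 8 (or 64) bits at once.

```python
import numpy as np

L = 256
rng = np.random.default_rng(0)
a = (rng.random(L) < 0.7).astype(np.uint8)   # bitstream encoding 0.7
b = (rng.random(L) < 0.3).astype(np.uint8)   # bitstream encoding 0.3

pa = np.packbits(a)   # L/8 bytes instead of L separate bits
pb = np.packbits(b)

# One bitwise AND over the packed arrays multiplies all L bits in L/8 word ops
prod_rate = np.unpackbits(pa & pb).sum() / L
print(f"packed AND rate: {prod_rate:.3f} (≈ 0.7 * 0.3 = 0.21)")
```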

9. Diagnostic checklist

Run before every training session:

def preflight_check(layers, test_input):
    """Run all diagnostics on a network."""
    print("=" * 50)
    print("SC Network Preflight Check")
    print("=" * 50)

    # 1. Input range
    check_encoding_range(test_input, "Input")
    print()

    # 2. Weight health per layer
    for i, layer in enumerate(layers):
        diagnose_weights(layer.weights, f"Layer {i}")
    print()

    # 3. Forward trace
    trace_forward(layers, test_input)
    print()

    # 4. LR recommendation
    recommend_lr(layers[0].length, test_input.shape[0])
    print()

    print("Preflight complete.")

preflight_check([layer1, layer2, layer3], x)

What you learned

  • SC networks fail silently — always run diagnostics before training
  • Weight collapse: std < 0.05 means all weights are nearly identical
  • Bitstream correlation: shared LFSR seeds cause biased arithmetic
  • Degenerate encoding: values at 0 or 1 produce uninformative bitstreams
  • The learning rate should stay below the gradient-noise floor: roughly (1/√L × √N_in) / 10
  • Trace forward passes to find where signal degrades
  • Profile with time.perf_counter(), compare L and layer sizes

Next steps

  • Add the preflight check to your training script
  • Use BitstreamSpikeRecorder to log full spike trains for offline analysis
  • Compare Rust engine output against Python for your specific network
  • Profile memory usage for large networks (>10K neurons)