Tutorial 27: Fault Tolerance — SC's Fundamental Advantage¶
Stochastic computing has inherent fault tolerance: a single bit-flip in a bitstream changes the encoded probability by only $1/L$. In contrast, a bit-flip in a fixed-point register can corrupt the MSB and cause catastrophic error. This property makes SC circuits naturally suited for radiation-hardened, memristive, and other unreliable hardware substrates.
Why SC is Fault-Tolerant¶
In fixed-point arithmetic, bit position determines significance. A flip in bit 15 (MSB) of a Q8.8 value changes the result by 128.0. A flip in bit 0 (LSB) changes it by only 0.004. Averaged over a uniformly random bit position, the expected error from a single flip is about 16.0, enormous relative to the encoded value.
In SC, every bit has equal weight: $1/L$. A flip in any position changes the encoded probability by exactly $1/L$. With $L=1024$, the worst-case single-bit error is 0.001 — always small, regardless of which bit flipped.
| Metric | Fixed-point Q8.8 | SC (L=1024) |
|---|---|---|
| Worst-case single-bit error | 128.0 | 0.001 |
| Expected single-bit error | ~16.0 | 0.001 |
| Graceful degradation | No (MSB flip = catastrophe) | Yes (linear) |
| Error at 10% bit-flip rate | ~50-100% | ~10% |
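Both single-bit rows of the table can be reproduced directly. A minimal numpy sketch (independent of the library helpers used below): flipping any one bit of an SC stream shifts the estimate by exactly $1/L$, while a Q8.8 flip shifts the value by $2^{\text{pos}-8}$.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 1024
bs = (rng.random(L) < 0.7).astype(np.uint8)  # Bernoulli bitstream encoding p = 0.7

# SC: flip one bit at an arbitrary position -> estimate shifts by exactly 1/L
flipped = bs.copy()
flipped[37] ^= 1
sc_shift = abs(flipped.mean() - bs.mean())   # 1/1024, regardless of which bit flipped

# Q8.8: a flip at bit position pos shifts the value by 2**(pos - 8)
q88_shifts = [2.0 ** (pos - 8) for pos in range(16)]
worst = max(q88_shifts)          # 128.0 (bit 15)
expected = np.mean(q88_shifts)   # ~16.0, averaged over a uniformly random position

print(sc_shift, worst, expected)
```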
1. SC vs Fixed-Point Under Bit Flips¶
```python
import numpy as np
from sc_neurocore.utils.bitstreams import generate_bernoulli_bitstream, bitstream_to_probability
from sc_neurocore.utils.fault_injection import FaultInjector

p_true = 0.7
N_TRIALS = 100

for error_rate in [0.01, 0.05, 0.10, 0.20]:
    sc_errors = []
    fp_errors = []
    for _ in range(N_TRIALS):
        # SC: flip bits in a 1024-bit stream
        bs = generate_bernoulli_bitstream(p_true, 1024)
        corrupted = FaultInjector.inject_bit_flips(bs, error_rate)
        sc_errors.append(abs(bitstream_to_probability(corrupted) - p_true))

        # Fixed-point Q8.8: flip bits in a 16-bit register
        bits = int(round(p_true * 256))
        for pos in range(16):
            if np.random.random() < error_rate:
                bits ^= (1 << pos)
        if bits >= 32768:  # reinterpret as signed two's complement
            bits -= 65536
        fp_errors.append(abs(bits / 256.0 - p_true))

    print(f"Error rate {error_rate*100:4.0f}%: "
          f"SC={np.mean(sc_errors):.4f} FP={np.mean(fp_errors):.4f} "
          f"Ratio={np.mean(fp_errors)/max(np.mean(sc_errors),1e-9):.0f}x")
```
Typical output shows SC error 10-100x smaller than fixed-point at every noise level.
2. Degradation Curves¶
Plot how accuracy degrades with increasing fault rate:
```python
import numpy as np
from sc_neurocore.utils.bitstreams import generate_bernoulli_bitstream, bitstream_to_probability
from sc_neurocore.utils.fault_injection import FaultInjector

error_rates = np.linspace(0, 0.3, 31)
sc_degradation = []
fp_degradation = []

for rate in error_rates:
    sc_err, fp_err = [], []
    for _ in range(200):
        # SC
        bs = generate_bernoulli_bitstream(0.5, 512)
        c = FaultInjector.inject_bit_flips(bs, rate)
        sc_err.append(abs(bitstream_to_probability(c) - 0.5))

        # Fixed-point
        val = 128  # 0.5 in Q8.8
        for pos in range(16):
            if np.random.random() < rate:
                val ^= (1 << pos)
        if val >= 32768:  # reinterpret as signed two's complement
            val -= 65536
        fp_err.append(abs(val / 256.0 - 0.5))
    sc_degradation.append(np.mean(sc_err))
    fp_degradation.append(np.mean(fp_err))

# SC degrades linearly, fixed-point degrades catastrophically
print(f"At  5%: SC={sc_degradation[5]:.4f} FP={fp_degradation[5]:.4f}")
print(f"At 15%: SC={sc_degradation[15]:.4f} FP={fp_degradation[15]:.4f}")
print(f"At 30%: SC={sc_degradation[30]:.4f} FP={fp_degradation[30]:.4f}")
```
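The linear SC curve has a simple closed form: each bit flips independently with probability $r$, so the expected estimate moves from $p$ to $p(1-r) + (1-p)r = p + r(1-2p)$. The bias grows linearly in $r$ and vanishes entirely at $p=0.5$, leaving only binomial fluctuation. A quick check of that identity in plain numpy, independent of the library helpers above:

```python
import numpy as np

rng = np.random.default_rng(1)
p, r, L = 0.7, 0.1, 512

# Corrupt many Bernoulli(p) streams with independent flips at rate r
est = []
for _ in range(2000):
    bs = (rng.random(L) < p).astype(np.uint8)
    flips = (rng.random(L) < r).astype(np.uint8)
    est.append((bs ^ flips).mean())

predicted = p * (1 - r) + (1 - p) * r  # = p + r*(1 - 2p) = 0.66
print(np.mean(est), predicted)
```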
3. Hardware-Aware Training (Memristive Defects)¶
Real memristive crossbar arrays have stuck-at faults. SC-NeuroCore's
HardwareAwareSCLayer masks stuck synapses during training — the network
learns to route information around defects.
```python
from sc_neurocore.layers.hardware_aware import HardwareAwareSCLayer

# 10% of synapses are permanently stuck
layer = HardwareAwareSCLayer(
    n_inputs=8, n_neurons=4, length=256,
    stuck_rate=0.10,
    seed=42,
)

# Forward pass — stuck synapses contribute nothing
out = layer.forward([0.5] * 8)
print(f"Output with 10% stuck synapses: {[f'{x:.3f}' for x in out]}")
print(f"Stuck synapse count: {layer.n_stuck} / {8*4}")

# Compare with a healthy layer
from sc_neurocore import VectorizedSCLayer
healthy = VectorizedSCLayer(n_inputs=8, n_neurons=4, length=256)
out_h = healthy.forward([0.5] * 8)
print(f"Healthy layer output: {[f'{x:.3f}' for x in out_h]}")
```
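The masking mechanism itself is compact. The sketch below is not HardwareAwareSCLayer's actual implementation, just the core idea in plain numpy: a boolean defect mask zeroes out stuck synapses in every forward pass, so any training procedure only ever adjusts the live weights and the network routes around the defects.

```python
import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_neurons, stuck_rate = 8, 4, 0.10

W = rng.uniform(0, 1, size=(n_inputs, n_neurons))        # synaptic probabilities
alive = rng.random((n_inputs, n_neurons)) >= stuck_rate  # False = stuck-at-0 defect

def forward(x, W, alive):
    """Masked dot product: stuck synapses contribute nothing."""
    return (np.asarray(x) @ (W * alive)) / n_inputs      # scaled to stay in [0, 1]

out = forward([0.5] * n_inputs, W, alive)
print(f"stuck synapses: {(~alive).sum()} / {alive.size}")
print(out)
```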
4. Adaptive Bitstream Length¶
Longer bitstreams give higher precision but take more cycles. Use analytical bounds to compute the minimum length for a target precision:
```python
from sc_neurocore.utils.bitstreams import adaptive_length

# Hoeffding bound (distribution-free)
L_h = adaptive_length(p=0.5, epsilon=0.01, confidence=0.95, method="hoeffding")
print(f"Hoeffding: L={L_h} for 1% precision at 95% confidence")

# Chebyshev bound (uses variance)
L_c = adaptive_length(p=0.5, epsilon=0.01, confidence=0.95, method="chebyshev")
print(f"Chebyshev: L={L_c}")

# Variance-based (tightest, uses known p)
L_v = adaptive_length(p=0.5, epsilon=0.01, confidence=0.95, method="variance")
print(f"Variance: L={L_v}")
```
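The three bounds can also be reproduced by hand. The formulas below are the standard concentration inequalities; whether adaptive_length uses these exact expressions internally is an assumption here. Hoeffding: $L \ge \ln(2/\delta)/(2\epsilon^2)$; Chebyshev: $L \ge p(1-p)/(\epsilon^2\delta)$; normal approximation: $L \ge z^2 p(1-p)/\epsilon^2$.

```python
import math

eps, delta, p = 0.01, 0.05, 0.5

# Hoeffding: P(|p_hat - p| >= eps) <= 2 exp(-2 L eps^2) <= delta
L_hoeffding = math.ceil(math.log(2 / delta) / (2 * eps**2))

# Chebyshev: P(|p_hat - p| >= eps) <= p(1-p) / (L eps^2) <= delta
L_chebyshev = math.ceil(p * (1 - p) / (eps**2 * delta))

# Normal approximation: L >= z^2 p(1-p) / eps^2, z ~= 1.96 at 95% confidence
z = 1.959963984540054
L_variance = math.ceil(z**2 * p * (1 - p) / eps**2)

print(L_hoeffding, L_chebyshev, L_variance)  # 18445, 50000, 9604
```

The normal-approximation bound is the tightest of the three, but it leans on the central limit theorem and on knowing $p$; Hoeffding is distribution-free and therefore the safe default.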
Precision-latency tradeoff¶
| Bitstream length L | Effective bits | Max error | Cycles |
|---|---|---|---|
| 64 | ~3 | ±0.125 | 64 |
| 256 | ~4 | ±0.063 | 256 |
| 1024 | ~5 | ±0.031 | 1024 |
| 4096 | ~6 | ±0.016 | 4096 |
Error scales as $O(1/\sqrt{L})$ — doubling precision requires 4x the bits.
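That scaling is easy to verify empirically: the standard deviation of a length-$L$ Bernoulli estimate is $\sqrt{p(1-p)/L}$, i.e. $0.5/\sqrt{L}$ at $p=0.5$, so each 4x increase in $L$ halves the error. A quick numpy check, using a binomial draw as shorthand for averaging $L$ Bernoulli bits:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 0.5
lengths = [64, 256, 1024, 4096]

stds = []
for L in lengths:
    # 20000 independent length-L streams, reduced to their bit-averages
    estimates = rng.binomial(L, p, size=20000) / L
    stds.append(estimates.std())

print([f"{s:.4f}" for s in stds])
# Each entry should sit near 0.5/sqrt(L): 0.0625, 0.0313, 0.0156, 0.0078
```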
5. Radiation Hardening¶
SC's fault tolerance makes it a candidate for space and nuclear environments where single-event upsets (SEUs) flip random bits. Combined with triple modular redundancy (TMR):
```python
import numpy as np
from sc_neurocore import VectorizedSCLayer

# TMR: three replicas of the same layer, majority vote on their outputs
layer = VectorizedSCLayer(n_inputs=4, n_neurons=2, length=512)
inputs = [0.3, 0.5, 0.7, 0.9]

outputs = []
for replica in range(3):
    out = np.asarray(layer.forward(inputs))
    # Model SEU-induced disturbance in each replica as small analog noise
    out_noisy = out + np.random.normal(0, 0.05, out.shape)
    outputs.append(out_noisy)

# Majority vote (the median is its analog generalization)
voted = np.median(outputs, axis=0)
print(f"TMR+SC output: {voted}")
```
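The vote can also be taken at the bitstream level: with three independently corrupted copies, a bitwise 2-of-3 majority reduces the per-bit flip probability from $r$ to $3r^2 - 2r^3$, since at least two replicas must flip the same bit. A plain-numpy sketch, assuming independent flips per replica:

```python
import numpy as np

rng = np.random.default_rng(3)
L, p, r = 4096, 0.7, 0.10

clean = (rng.random(L) < p).astype(np.uint8)

# Three replicas, each with independent bit flips at rate r
replicas = [clean ^ (rng.random(L) < r).astype(np.uint8) for _ in range(3)]

# Bitwise 2-of-3 majority vote
voted = (replicas[0] + replicas[1] + replicas[2] >= 2).astype(np.uint8)

residual = (voted != clean).mean()  # empirical post-vote flip rate
predicted = 3 * r**2 - 2 * r**3     # = 0.028 for r = 0.10
print(residual, predicted)
```

At $r=0.10$ the vote cuts the effective flip rate from 10% to about 2.8%, and SC's $1/L$ per-bit weighting then keeps even those residual flips individually harmless.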
Applications¶
| Domain | Why SC fault tolerance matters |
|---|---|
| Space | SEU tolerance without heavy shielding |
| Medical implants | Graceful degradation, no catastrophic failure |
| Memristive hardware | Works around stuck-at defects |
| Edge AI | Low-power + reliable on cheap hardware |
| Automotive | Functional safety (ISO 26262) without 10x redundancy |
Further Reading¶
- Tutorial 13: Fixed-Point Arithmetic — Q8.8 details
- Tutorial 26: Predictive Coding — another SC advantage
- Safety: FMEA — failure mode analysis