MNIST Classification with Stochastic Computing¶
Classify handwritten digits using a two-layer SC spiking neural network. This tutorial demonstrates the complete pipeline: pixel encoding → bitstream multiplication → LIF integration → spike-rate readout.
Prerequisites: pip install sc-neurocore scikit-learn matplotlib
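If you have not seen stochastic computing before, the core idea fits in a few lines of NumPy. This is a conceptual sketch, not the library's implementation: a value p ∈ [0, 1] is encoded as a random bitstream whose fraction of 1s equals p, and ANDing two independent streams multiplies the values they encode.
import numpy as np
rng = np.random.default_rng(0)
def encode(p, length=512):
    """Unipolar SC encoding: each bit is 1 with probability p."""
    return rng.random(length) < p
a, b = encode(0.6), encode(0.5)
product = a & b          # AND of independent streams multiplies their values
print(product.mean())    # ≈ 0.6 * 0.5 = 0.30, up to O(1/√L) encoding noise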
1. Load and preprocess MNIST¶
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
mnist = fetch_openml("mnist_784", version=1, as_frame=False, parser="liac-arff")
X = mnist.data.astype(np.float64) / 255.0 # normalise to [0, 1]
y = mnist.target.astype(int)
# Use a small subset for SC (bitstream simulation is slower than float)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=2000, test_size=500, random_state=42, stratify=y
)
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
SC is compute-intensive (each scalar becomes an L-bit stream), so we
use 2000 training / 500 test images. For production, use the Rust
engine or VectorizedSCLayer with GPU.
2. Dimensionality reduction¶
784 inputs × L bits per input is expensive. Reduce to 50 features via PCA, preserving ~85% variance:
from sklearn.decomposition import PCA
pca = PCA(n_components=50, random_state=42)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
# Rescale to [0, 1] for SC unipolar encoding
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_sc = scaler.fit_transform(X_train_pca)
X_test_sc = scaler.transform(X_test_pca).clip(0, 1)  # test data can fall outside the train min/max
print(f"Reduced features: {X_train_sc.shape[1]}, range [{X_train_sc.min():.2f}, {X_train_sc.max():.2f}]")
3. SC network architecture¶
Two-layer fully connected SNN:
50 inputs → [VectorizedSCLayer: 128 neurons] → [VectorizedSCLayer: 10 neurons] → argmax
The VectorizedSCLayer uses packed uint64 bitwise operations
internally, which makes it orders of magnitude faster than the
loop-based SCDenseLayer.
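To see why packing matters, here is a rough sketch of the trick in plain NumPy, assuming nothing about VectorizedSCLayer's actual internals: 64 stream bits share one uint64 word, so a single bitwise AND performs 64 SC multiplications at once, and decoding a rate is just a popcount.
import numpy as np
rng = np.random.default_rng(1)
L = 512                                   # stream length, a multiple of 64
def encode_packed(p):
    """Encode p as L Bernoulli(p) bits packed into L/64 uint64 words."""
    bits = (rng.random(L) < p).astype(np.uint8)
    return np.packbits(bits).view(np.uint64)
def rate(words):
    """Decode: popcount across all words divided by stream length."""
    return np.unpackbits(words.view(np.uint8)).sum() / L
x, w = encode_packed(0.7), encode_packed(0.4)
print(rate(x & w))                        # ≈ 0.28; one AND per word = 64 bit-multiplies
With that intuition, construct the two layers: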
from sc_neurocore import VectorizedSCLayer
BITSTREAM_LENGTH = 512 # precision vs speed tradeoff
layer1 = VectorizedSCLayer(n_inputs=50, n_neurons=128, length=BITSTREAM_LENGTH)
layer2 = VectorizedSCLayer(n_inputs=128, n_neurons=10, length=BITSTREAM_LENGTH)
4. Forward pass¶
Each layer maps input probabilities → output firing rates (also in [0,1]). The final 10-neuron layer produces one rate per digit class.
def forward(x):
    """Single-sample forward pass returning 10 class scores."""
    h = layer1.forward(x)
    h = np.clip(h, 0.01, 0.99)  # avoid degenerate bitstreams at 0 or 1
    out = layer2.forward(h)
    return out
# Test on one sample
scores = forward(X_train_sc[0])
print(f"Scores shape: {scores.shape}") # (10,)
print(f"Predicted: {scores.argmax()}, True: {y_train[0]}")
5. Training via rate-coded pseudo-gradient¶
SC networks lack a clean differentiable path through the stochastic encoding. We use a simple approach: treat the layer weights as continuous parameters and update them using the derivative of the expected (rate-level) computation, bypassing the bitstream noise.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy_grad(scores, label):
    """Gradient of softmax cross-entropy w.r.t. the pre-softmax scores."""
    probs = softmax(scores)
    probs[label] -= 1.0
    return probs
LR = 0.005
EPOCHS = 15
for epoch in range(EPOCHS):
    # Shuffle training data
    perm = np.random.permutation(len(X_train_sc))
    correct = 0
    for idx in perm:
        x = X_train_sc[idx]
        label = y_train[idx]
        # Forward
        h1 = layer1.forward(x)
        h1_clip = np.clip(h1, 0.01, 0.99)
        scores = layer2.forward(h1_clip)
        pred = scores.argmax()
        if pred == label:
            correct += 1
        # Backward (rate-level pseudo-gradient)
        grad_out = cross_entropy_grad(scores, label)
        # Backprop through layer2 first, so grad_h1 uses the pre-update weights
        grad_h1 = layer2.weights.T @ grad_out
        # Update layer2 weights: dL/dW2 ≈ grad_out ⊗ h1
        dW2 = np.outer(grad_out, h1_clip)
        layer2.weights -= LR * dW2
        layer2.weights = np.clip(layer2.weights, 0.01, 0.99)
        layer2._refresh_packed_weights()
        # Update layer1 weights: dL/dW1 ≈ grad_h1 ⊗ x
        dW1 = np.outer(grad_h1, x)
        layer1.weights -= LR * dW1
        layer1.weights = np.clip(layer1.weights, 0.01, 0.99)
        layer1._refresh_packed_weights()
    acc = correct / len(X_train_sc) * 100
    print(f"Epoch {epoch+1:2d}/{EPOCHS} Train acc: {acc:.1f}%")
Weight updates happen in float, then _refresh_packed_weights()
re-encodes the continuous weights as new bitstreams. This is the
standard SC training pattern: float gradients + stochastic inference.
6. Evaluation¶
correct = 0
for i in range(len(X_test_sc)):
    scores = forward(X_test_sc[i])
    if scores.argmax() == y_test[i]:
        correct += 1
test_acc = correct / len(X_test_sc) * 100
print(f"Test accuracy: {test_acc:.1f}%")
Expected: ~85-90% with L=512, 50 PCA features, 15 epochs. This is competitive with float SNNs at the same architecture size.
7. Bitstream length vs accuracy¶
SC precision scales with bitstream length. Measure the tradeoff:
for L in [64, 128, 256, 512, 1024]:
    l1 = VectorizedSCLayer(n_inputs=50, n_neurons=128, length=L)
    l2 = VectorizedSCLayer(n_inputs=128, n_neurons=10, length=L)
    # Copy trained weights
    l1.weights = layer1.weights.copy()
    l2.weights = layer2.weights.copy()
    l1._refresh_packed_weights()
    l2._refresh_packed_weights()
    correct = 0
    for i in range(len(X_test_sc)):
        h = np.clip(l1.forward(X_test_sc[i]), 0.01, 0.99)
        if l2.forward(h).argmax() == y_test[i]:
            correct += 1
    print(f"L={L:5d} Accuracy: {correct/len(X_test_sc)*100:.1f}%")
Expected trend:
L= 64 Accuracy: ~75%
L= 128 Accuracy: ~80%
L= 256 Accuracy: ~85%
L= 512 Accuracy: ~88%
L= 1024 Accuracy: ~89%
Diminishing returns past L=512 confirm the 1/√L precision scaling.
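You can check this scaling without the library at all: a length-L stream estimating p has standard deviation √(p(1−p)/L), so quadrupling L only halves the noise. A quick Monte Carlo sanity check:
import numpy as np
rng = np.random.default_rng(2)
p = 0.5  # worst-case variance for a unipolar stream
for L in [64, 256, 1024, 4096]:
    est = (rng.random((10_000, L)) < p).mean(axis=1)  # 10k length-L streams
    print(f"L={L:5d}  empirical std={est.std():.4f}  "
          f"theory={np.sqrt(p * (1 - p) / L):.4f}")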
8. Confusion matrix¶
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
preds = [forward(X_test_sc[i]).argmax() for i in range(len(X_test_sc))]
cm = confusion_matrix(y_test, preds)
fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(cm, cmap="Blues")
ax.set_xlabel("Predicted")
ax.set_ylabel("True")
ax.set_title("SC-SNN MNIST Confusion Matrix")
for i in range(10):
    for j in range(10):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center",
                color="white" if cm[i, j] > cm.max() / 2 else "black", fontsize=8)
plt.colorbar(im)
plt.tight_layout()
plt.savefig("mnist_sc_confusion.png", dpi=150)
print("Saved mnist_sc_confusion.png")
What you learned¶
- SC classification pipeline: PCA → unipolar encode → SC layers → argmax
- VectorizedSCLayer with packed uint64 ops for practical training speed
- Rate-level pseudo-gradient training with float weight updates
- Bitstream length controls precision/speed tradeoff (1/√L scaling)
- 50 PCA features + 128 hidden + L=512 achieves ~88% on MNIST
Next steps¶
- Replace PCA with a convolutional SC front-end (SCConv2DLayer)
- Use the Rust engine for a 10-100x training speedup
- Add STDP for unsupervised feature extraction (see StochasticSTDPSynapse)
- Deploy the trained network on FPGA (see FPGA in 20 Minutes)