MNIST Classification with Stochastic Computing

Classify handwritten digits using a two-layer SC spiking neural network. This tutorial demonstrates the complete pipeline: pixel encoding → bitstream multiplication → LIF integration → spike-rate readout.
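Before diving in, the four pipeline stages can be sketched for a single pixel-weight pair in plain NumPy. This is an illustration of the mechanics only, not sc-neurocore's internals; the leak and threshold constants are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 512                      # bitstream length
pixel, weight = 0.8, 0.5     # values to encode as bit probabilities

# Pixel encoding: unipolar stream with P(bit = 1) equal to the value
x_bits = rng.random(L) < pixel
w_bits = rng.random(L) < weight

# Bitstream multiplication: AND of two independent unipolar streams
prod_bits = x_bits & w_bits  # mean ≈ pixel * weight = 0.4

# LIF integration: leaky accumulation of product bits, spike + reset
v, spikes, leak, threshold = 0.0, 0, 0.9, 2.0  # illustrative constants
for bit in prod_bits:
    v = leak * v + bit
    if v >= threshold:
        spikes += 1
        v = 0.0

rate = spikes / L            # spike-rate readout in [0, 1]
```

The spike rate is a monotone function of the product probability, which is what lets the readout layer separate classes by rate alone.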

Prerequisites: pip install sc-neurocore scikit-learn matplotlib

1. Load and preprocess MNIST

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

mnist = fetch_openml("mnist_784", version=1, as_frame=False, parser="liac-arff")
X = mnist.data.astype(np.float64) / 255.0  # normalise to [0, 1]
y = mnist.target.astype(int)

# Use a small subset for SC (bitstream simulation is slower than float)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=2000, test_size=500, random_state=42, stratify=y
)
print(f"Train: {X_train.shape}, Test: {X_test.shape}")

SC is compute-intensive (each scalar becomes an L-bit stream), so we use 2000 training / 500 test images. For production, use the Rust engine or VectorizedSCLayer with GPU.

2. Dimensionality reduction

784 inputs × L bits per input is expensive. Reduce to 50 features via PCA, preserving ~85% variance:

from sklearn.decomposition import PCA

pca = PCA(n_components=50, random_state=42)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Rescale to [0, 1] for SC unipolar encoding
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_sc = scaler.fit_transform(X_train_pca)
X_test_sc = scaler.transform(X_test_pca).clip(0, 1)

print(f"Reduced features: {X_train_sc.shape[1]}, range [{X_train_sc.min():.2f}, {X_train_sc.max():.2f}]")

3. SC network architecture

Two-layer fully connected SNN:

50 inputs → [VectorizedSCLayer: 128 neurons] → [VectorizedSCLayer: 10 neurons] → argmax

The VectorizedSCLayer uses packed uint64 bitwise operations internally — orders of magnitude faster than loop-based SCDenseLayer.
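The packing idea can be demonstrated with NumPy's `packbits` at uint8 granularity (the library packs into uint64, but the principle is the same: one bitwise AND per packed word multiplies many stream lanes at once):

```python
import numpy as np

rng = np.random.default_rng(42)
L = 1024
p, q = 0.7, 0.3

# Two unipolar bitstreams, packed 8 bits per byte
a_packed = np.packbits((rng.random(L) < p).astype(np.uint8))
b_packed = np.packbits((rng.random(L) < q).astype(np.uint8))

# One bitwise AND per packed word multiplies all its stream lanes at once
prod_packed = a_packed & b_packed

# Popcount over the packed words recovers the product estimate
estimate = np.unpackbits(prod_packed).sum() / L  # ≈ p * q = 0.21
```

With 64-bit words the same AND processes 64 lanes per instruction, which is where the speedup over a Python-level loop comes from.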

from sc_neurocore import VectorizedSCLayer

BITSTREAM_LENGTH = 512  # precision vs speed tradeoff

layer1 = VectorizedSCLayer(n_inputs=50, n_neurons=128, length=BITSTREAM_LENGTH)
layer2 = VectorizedSCLayer(n_inputs=128, n_neurons=10, length=BITSTREAM_LENGTH)

4. Forward pass

Each layer maps input probabilities → output firing rates (also in [0,1]). The final 10-neuron layer produces one rate per digit class.

def forward(x):
    """Single-sample forward pass returning 10 class scores."""
    h = layer1.forward(x)
    h = np.clip(h, 0.01, 0.99)  # avoid degenerate bitstreams at 0 or 1
    out = layer2.forward(h)
    return out

# Test on one sample
scores = forward(X_train_sc[0])
print(f"Scores shape: {scores.shape}")  # (10,)
print(f"Predicted: {scores.argmax()}, True: {y_train[0]}")

5. Training via rate-coded pseudo-gradient

SC networks lack a clean differentiable path through the stochastic encoding. We use a simple approach: treat the layer weights as continuous parameters and update them using the derivative of the expected (rate-level) computation, bypassing the bitstream noise.
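The rate-level gradient we rely on is the standard softmax cross-entropy gradient, `softmax(scores) - one_hot(label)`. A quick self-contained check against central finite differences (illustrative values, independent of the SC layers) confirms the formula:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(scores, label):
    return -np.log(softmax(scores)[label])

rng = np.random.default_rng(0)
scores = rng.normal(size=10)
label = 3

# Analytic gradient: softmax(scores) minus the one-hot target
analytic = softmax(scores)
analytic[label] -= 1.0

# Central finite differences, one score component at a time
eps = 1e-6
numeric = np.zeros(10)
for k in range(10):
    up, down = scores.copy(), scores.copy()
    up[k] += eps
    down[k] -= eps
    numeric[k] = (cross_entropy(up, label) - cross_entropy(down, label)) / (2 * eps)

max_err = np.abs(analytic - numeric).max()
```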

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy_grad(scores, label):
    """Gradient of softmax cross-entropy loss w.r.t. the raw scores (logits)."""
    probs = softmax(scores)
    probs[label] -= 1.0
    return probs

LR = 0.005
EPOCHS = 15

for epoch in range(EPOCHS):
    # Shuffle training data
    perm = np.random.permutation(len(X_train_sc))
    correct = 0

    for idx in perm:
        x = X_train_sc[idx]
        label = y_train[idx]

        # Forward
        h1 = layer1.forward(x)
        h1_clip = np.clip(h1, 0.01, 0.99)
        scores = layer2.forward(h1_clip)

        pred = scores.argmax()
        if pred == label:
            correct += 1

        # Backward (rate-level pseudo-gradient)
        grad_out = cross_entropy_grad(scores, label)

        # Backprop through layer2 first, using the pre-update weights
        grad_h1 = layer2.weights.T @ grad_out

        # Update layer2 weights: dL/dW2 ≈ grad_out ⊗ h1
        dW2 = np.outer(grad_out, h1_clip)
        layer2.weights -= LR * dW2
        layer2.weights = np.clip(layer2.weights, 0.01, 0.99)
        layer2._refresh_packed_weights()

        # Update layer1 weights: dL/dW1 ≈ grad_h1 ⊗ x
        dW1 = np.outer(grad_h1, x)
        layer1.weights -= LR * dW1
        layer1.weights = np.clip(layer1.weights, 0.01, 0.99)
        layer1._refresh_packed_weights()

    acc = correct / len(X_train_sc) * 100
    print(f"Epoch {epoch+1:2d}/{EPOCHS}  Train acc: {acc:.1f}%")

Weight updates happen in float, then _refresh_packed_weights() re-encodes the continuous weights as new bitstreams. This is the standard SC training pattern: float gradients + stochastic inference.
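The re-encoding step can be pictured as sampling a fresh Bernoulli bitstream per weight and bit-packing it. A stand-in sketch (the real packed layout inside VectorizedSCLayer may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
L = 512
weights = rng.random((10, 8))  # float weights already clipped to [0, 1]

def repack(weights, length, rng):
    """Sample a fresh unipolar bitstream per weight and bit-pack it.

    Illustrative stand-in for _refresh_packed_weights(); the actual
    packed layout inside VectorizedSCLayer may differ.
    """
    bits = rng.random((*weights.shape, length)) < weights[..., None]
    return np.packbits(bits.astype(np.uint8), axis=-1)

packed = repack(weights, L, rng)  # shape (10, 8, L // 8), dtype uint8

# Decoding the streams recovers the float weights to ~1/sqrt(L) accuracy
decoded = np.unpackbits(packed, axis=-1, count=L).mean(axis=-1)
max_err = np.abs(decoded - weights).max()
```

Sampling fresh streams each update also decorrelates successive forward passes, so quantization noise does not accumulate in one direction.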

6. Evaluation

correct = 0
for i in range(len(X_test_sc)):
    scores = forward(X_test_sc[i])
    if scores.argmax() == y_test[i]:
        correct += 1

test_acc = correct / len(X_test_sc) * 100
print(f"Test accuracy: {test_acc:.1f}%")

Expected: ~85-90% with L=512, 50 PCA features, 15 epochs. This is competitive with float SNNs at the same architecture size.

7. Bitstream length vs accuracy

SC precision scales with bitstream length. Measure the tradeoff:

for L in [64, 128, 256, 512, 1024]:
    l1 = VectorizedSCLayer(n_inputs=50, n_neurons=128, length=L)
    l2 = VectorizedSCLayer(n_inputs=128, n_neurons=10, length=L)
    # Copy trained weights
    l1.weights = layer1.weights.copy()
    l2.weights = layer2.weights.copy()
    l1._refresh_packed_weights()
    l2._refresh_packed_weights()

    correct = 0
    for i in range(len(X_test_sc)):
        h = np.clip(l1.forward(X_test_sc[i]), 0.01, 0.99)
        if l2.forward(h).argmax() == y_test[i]:
            correct += 1

    print(f"L={L:5d}  Accuracy: {correct/len(X_test_sc)*100:.1f}%")

Expected trend:

L=   64  Accuracy: ~75%
L=  128  Accuracy: ~80%
L=  256  Accuracy: ~85%
L=  512  Accuracy: ~88%
L= 1024  Accuracy: ~89%

Diminishing returns past L=512 are consistent with the 1/√L precision scaling of stochastic bitstream estimates.
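The scaling itself is easy to verify directly on raw bitstreams, independent of the network (a self-contained measurement):

```python
import numpy as np

rng = np.random.default_rng(7)
p, trials = 0.5, 2000

# RMS error of estimating p from an L-bit unipolar stream, vs. theory
errors = {}
for L in [64, 256, 1024]:
    bits = rng.random((trials, L)) < p
    errors[L] = np.sqrt(((bits.mean(axis=1) - p) ** 2).mean())
    print(f"L={L:5d}  RMSE={errors[L]:.4f}  sqrt(p(1-p)/L)={np.sqrt(p*(1-p)/L):.4f}")
```

Quadrupling L halves the RMS error, matching sqrt(p(1-p)/L) for a Bernoulli stream.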

8. Confusion matrix

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

preds = [forward(X_test_sc[i]).argmax() for i in range(len(X_test_sc))]
cm = confusion_matrix(y_test, preds)

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(cm, cmap="Blues")
ax.set_xlabel("Predicted")
ax.set_ylabel("True")
ax.set_title("SC-SNN MNIST Confusion Matrix")
for i in range(10):
    for j in range(10):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center",
                color="white" if cm[i, j] > cm.max()/2 else "black", fontsize=8)
plt.colorbar(im)
plt.tight_layout()
plt.savefig("mnist_sc_confusion.png", dpi=150)
print("Saved mnist_sc_confusion.png")

What you learned

  • SC classification pipeline: PCA → unipolar encode → SC layers → argmax
  • VectorizedSCLayer with packed uint64 ops for practical training speed
  • Rate-level pseudo-gradient training with float weight updates
  • Bitstream length controls precision/speed tradeoff (1/√L scaling)
  • 50 PCA features + 128 hidden + L=512 achieves ~88% on MNIST

Next steps

  • Replace PCA with a convolutional SC front-end (SCConv2DLayer)
  • Use the Rust engine for 10-100x training speedup
  • Add STDP for unsupervised feature extraction (see StochasticSTDPSynapse)
  • Deploy the trained network on FPGA (see FPGA in 20 Minutes)