Tutorial 43: Cross-Platform Performance Profiling

Which platform should you run your SNN on? The profiler compares Python (NumPy), Rust engine, and multiple FPGA targets in one call, producing a ranked table of latency, throughput, power, and energy per inference.

The Decision Problem

You've trained an SNN. Now where do you deploy it?

Platform         Latency     Power         Dev Time   Best For
Python (NumPy)   ~10 ms      ~15 W (CPU)   Minutes    Prototyping
Rust engine      ~0.1 ms     ~15 W (CPU)   Minutes    Production server
FPGA (ice40)     ~0.02 ms    ~5 mW         Hours      Ultra-low power edge
FPGA (Artix-7)   ~0.005 ms   ~50 mW        Hours      High throughput

The profiler measures all of these for your specific network and produces a recommendation.
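The energy column follows directly from energy = power × latency. A quick back-of-the-envelope check using the rough numbers from the table above (illustrative figures, not measurements) shows why a milliwatt-scale FPGA wins on energy even when CPU latency is competitive:

```python
# Energy per inference = power (W) x latency (s), converted to nanojoules.
# Values are the approximate figures from the decision table above.
platforms = {
    "python":      {"latency_s": 10e-3,    "power_w": 15.0},
    "rust":        {"latency_s": 0.1e-3,   "power_w": 15.0},
    "fpga_ice40":  {"latency_s": 0.02e-3,  "power_w": 5e-3},
    "fpga_artix7": {"latency_s": 0.005e-3, "power_w": 50e-3},
}

for name, p in platforms.items():
    energy_nj = p["power_w"] * p["latency_s"] * 1e9  # joules -> nanojoules
    print(f"{name:12s}: {energy_nj:>14,.0f} nJ/inference")
```

Even though the iCE40 draws power for longer than the Artix-7 per inference, its much lower power gives it the smallest energy per inference, on the order of 100 nJ versus hundreds of millions of nJ for CPU-bound NumPy.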

Compare All Platforms

Python
from sc_neurocore.profiler import compare
from sc_neurocore.profiler.platform_profiler import format_table

results = compare(
    layer_sizes=[(784, 128), (128, 10)],
    bitstream_length=256,
)
print(format_table(results))
# Platform        Latency     Throughput    Power     Energy/Inf
# ─────────────────────────────────────────────────────────────
# fpga_ice40      0.021 ms    47,619 inf/s  4.6 mW   98 nJ
# fpga_ecp5       0.021 ms    47,619 inf/s  5.8 mW   121 nJ
# fpga_artix7     0.021 ms    47,619 inf/s  7.2 mW   151 nJ
# rust            0.082 ms    12,195 inf/s  15.0 W    1,230,000 nJ
# python          12.9 ms     78 inf/s      15.0 W    193,500,000 nJ

The table is sorted by energy efficiency. On energy per inference, the FPGA wins by roughly 2,000,000× over Python and roughly 12,500× over Rust.
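If you want the winner programmatically rather than reading the table, you can take the minimum-energy entry. This sketch uses a stand-in dataclass with the attribute names seen in this tutorial's examples (`platform`, `energy_per_inf_nj`); the real result objects returned by `compare()` may carry more fields:

```python
from dataclasses import dataclass


# Stand-in for the profiler's result objects; attribute names are taken
# from the examples in this tutorial, not from the library's source.
@dataclass
class Result:
    platform: str
    energy_per_inf_nj: float


results = [
    Result("python", 193_500_000),
    Result("rust", 1_230_000),
    Result("fpga_ice40", 98),
]

# The most energy-efficient platform is simply the minimum-energy entry.
best = min(results, key=lambda r: r.energy_per_inf_nj)
print(f"Deploy on: {best.platform}")  # -> Deploy on: fpga_ice40
```

The same `min()` pattern works on the real `compare()` output with `latency_ms` as the key if latency, not energy, is your binding constraint.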

Specific Platforms

Compare only the platforms you're considering:

Python
results = compare(
    layer_sizes=[(64, 32), (32, 10)],
    platforms=["python", "rust", "fpga_artix7"],
    bitstream_length=128,
)
for r in results:
    print(f"{r.platform:15s}: {r.energy_per_inf_nj:>10.2f} nJ, "
          f"{r.latency_ms:>8.3f} ms, {r.throughput_per_s:>10,} inf/s")

Scaling Analysis

How does each platform scale with network size?

Python
for n_hidden in [32, 64, 128, 256, 512]:
    layers = [(784, n_hidden), (n_hidden, 10)]
    results = compare(layers, platforms=["rust", "fpga_ice40"])
    rust = next(r for r in results if r.platform == "rust")
    fpga = next(r for r in results if r.platform == "fpga_ice40")
    print(f"Hidden={n_hidden:>3d}: Rust={rust.latency_ms:.3f}ms, "
          f"FPGA={fpga.latency_ms:.3f}ms, "
          f"FPGA energy advantage={rust.energy_per_inf_nj / fpga.energy_per_inf_nj:.0f}×")

At small network sizes (hidden≤64), FPGA advantage is ~10,000×. At large sizes (hidden≥256), FPGA may not fit on iCE40 — switch to ECP5 or Artix-7.

Interpreting Results

Choose FPGA when:

  • Energy budget < 1 mW (battery, implant, sensor node)
  • Latency requirement < 100 µs (real-time control)
  • The network fits on the target device

Choose Rust when:

  • The network is too large for FPGA
  • You need flexibility (change the network without resynthesis)
  • Server deployment with high throughput

Choose Python when:

  • Prototyping and experimentation
  • Integration with other Python libraries
  • Training (the FPGA targets can't train)
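These rules of thumb can be encoded as a small decision helper. The function below is a hypothetical sketch, not part of the profiler API; the thresholds are the ones stated above:

```python
def suggest_platform(energy_budget_mw: float,
                     latency_budget_us: float,
                     fits_on_fpga: bool,
                     training: bool = False) -> str:
    """Illustrative helper encoding the rules of thumb above."""
    if training:
        return "python"  # only the Python stack supports training
    if fits_on_fpga and (energy_budget_mw < 1 or latency_budget_us < 100):
        return "fpga"    # sub-mW or sub-100 µs targets favor FPGA
    return "rust"        # flexible, high-throughput server default


print(suggest_platform(energy_budget_mw=0.5, latency_budget_us=50,
                       fits_on_fpga=True))   # -> fpga
```

In practice you would confirm the choice by running `compare()` on your actual network, since whether it "fits" on a given FPGA depends on its size.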

Integration with Studio

The Studio's Synthesis Dashboard shows FPGA resource usage. The energy estimate is available via the Estimate button (no Yosys needed). For a full platform comparison, run the profiler from Python.
