HPC and GPU Acceleration
SCPN-Fusion-Core supports high-performance computing through a native Rust backend, a C++ FFI bridge, and a planned GPU acceleration path.
Rust Workspace
The scpn-fusion-rs/ directory contains a 10-crate Rust workspace
that mirrors the Python package structure:
| Crate | Purpose |
|---|---|
| | Shared data types, configuration structs, error types |
| | Linear algebra (SOR, GMRES, multigrid), FFT, interpolation, Chebyshev polynomials, elliptic integrals, tridiagonal solver |
| | Grad-Shafranov kernel, transport, inverse reconstruction, stability, pedestal model, AMR |
| | MHD sawtooth, Hall-MHD, turbulence, FNO, heating, compact reactor optimiser, design scanner, sandpile |
| | Neutronics, divertor, wall interaction, PWI erosion, TEMHD, balance of plant |
| | Blanket engineering, magnet design, tritium systems, plant layout |
| | PID, MPC, SNN controller, SPI mitigation, disruption predictor, digital twin, analytic solver, SOC learning |
| | Sensor models, tomography |
| | Neural equilibrium, neural transport, disruption classifier, polynomial chaos expansion (PCE) UQ |
| fusion-python | PyO3 bindings producing |
Build Configuration
The release profile is tuned for runtime speed, at the cost of longer build times:
# Cargo.toml
[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1
Key dependencies: ndarray, nalgebra, rayon (parallelism),
rustfft, serde, pyo3 (Python bindings).
The Rust workspace has no external C or Fortran dependencies – it is pure Rust.
Python-Rust FFI
The fusion-python crate provides PyO3 bindings that expose the Rust
solvers as a native Python extension module. The Python package
auto-detects the extension at import time:
try:
    from ._rust_compat import FusionKernel, RUST_BACKEND
except ImportError:
    from .fusion_kernel import FusionKernel
    RUST_BACKEND = False
All API signatures are identical between the Python and Rust paths, ensuring zero code changes when switching backends.
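The same optional-backend pattern can be written as a small reusable helper. The following is a minimal sketch, not the package's actual loader: `load_kernel_class` and its parameters are hypothetical names chosen for illustration, and it relies only on the standard-library `importlib`.

```python
import importlib


def load_kernel_class(native_module, class_name, fallback_cls):
    """Return (kernel_class, using_native).

    Tries to import `class_name` from `native_module` (hypothetical module
    name supplied by the caller) and falls back to `fallback_cls` when the
    native extension is not installed.
    """
    try:
        module = importlib.import_module(native_module)
    except ImportError:
        # Native extension missing: use the pure-Python implementation.
        return fallback_cls, False
    return getattr(module, class_name), True


class PurePythonKernel:
    """Stand-in for a pure-Python solver class (illustrative only)."""


if __name__ == "__main__":
    kernel_cls, native = load_kernel_class(
        "_no_such_native_backend_", "FusionKernel", PurePythonKernel
    )
    print(kernel_cls.__name__, "native" if native else "pure Python")
```

Because both classes are expected to share one interface, calling code can use whichever class is returned without branching on the backend flag.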
C++ FFI Bridge
The hpc_bridge module (hpc/hpc_bridge.py) provides a C++ FFI
bridge for interfacing with external HPC solvers:
- solver.cpp – C++ solver implementation using types.h shared data structures
- ctypes-based Python bindings for calling compiled C++ from Python
- Shared-memory data exchange to avoid serialisation overhead
This bridge is primarily used for prototyping custom solver kernels before porting them to Rust.
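As an illustration of the zero-copy idea, the sketch below shares one buffer between a Python `array.array` and a C-typed view, then passes it to a function with a C signature. It does not reproduce `hpc_bridge.py`: the `relax` signature is an assumption, and a Python callback wrapped with `ctypes.CFUNCTYPE` stands in for the compiled C++ kernel so the example runs without a build step.

```python
import array
import ctypes

# Signature a compiled solver entry point might have (hypothetical):
#   void relax(double *psi, size_t n, double omega);
RELAX_PROTO = ctypes.CFUNCTYPE(
    None, ctypes.POINTER(ctypes.c_double), ctypes.c_size_t, ctypes.c_double
)


def _relax_stand_in(psi, n, omega):
    """Python stand-in for the C++ kernel: scale each value in place."""
    for i in range(n):
        psi[i] = omega * psi[i]


# With a real build this would instead be a function looked up on a
# ctypes.CDLL handle, with argtypes matching RELAX_PROTO.
relax = RELAX_PROTO(_relax_stand_in)


def relax_in_place(values, omega):
    """Call the kernel on `values` (an array.array('d')) without copying."""
    # from_buffer() creates a ctypes array that aliases the array's memory.
    c_view = (ctypes.c_double * len(values)).from_buffer(values)
    relax(c_view, len(values), omega)


if __name__ == "__main__":
    psi = array.array("d", [1.0, 2.0, 4.0])
    relax_in_place(psi, 0.5)
    print(list(psi))
```

The kernel writes through the pointer directly into the Python-owned buffer, so no data is serialised or copied in either direction.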
GPU Acceleration Roadmap
GPU support is planned in three phases (tracked in
docs/GPU_ACCELERATION_ROADMAP.md):
- Phase 1: wgpu SOR kernel
  Red-Black SOR stencil implemented as a wgpu compute shader, providing cross-platform GPU acceleration (Vulkan, Metal, D3D12, WebGPU) with deterministic CPU fallback.
  Performance targets:
  - 65x65 grid: 2x–4x speedup
  - 257x257 grid: 5x–12x speedup
- Phase 2: GPU-backed GMRES preconditioning
  CUDA/ROCm adapters for the GMRES linear solver with CPU fallback for environments without GPU drivers.
  Performance target: 2x–6x speedup on inverse solves.
- Phase 3: Full multigrid on device
  Smooth, restrict, prolong, and coupled nonlinear multigrid path running entirely on GPU.
  Performance targets:
  - < 1 ms for control-loop grids
  - 10x–30x speedup for 257x257+ workloads
Acceptance gates for each phase:
- Correctness: residual behaviour matches CPU reference within configured tolerance
- Performance: measured speedups meet declared minimum floors
- Operations: runtime capability detection and automatic CPU fallback
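For readers unfamiliar with the Red-Black ordering targeted in Phase 1, here is a minimal CPU sketch in plain Python. It is an assumption-laden simplification: it uses the 5-point Poisson stencil rather than the Grad-Shafranov operator, and the function names are illustrative, not taken from the codebase. Points of one colour depend only on points of the other colour, which is what makes each half-sweep safe to run in parallel on a GPU.

```python
def red_black_sor_sweep(u, f, h, omega):
    """Apply one red-black SOR sweep in place for laplace(u) = f.

    `u` and `f` are lists of rows on a uniform grid with spacing `h`; the
    outermost ring of `u` holds fixed (Dirichlet) boundary values.
    Returns the largest absolute change made to any interior point.
    """
    rows, cols = len(u), len(u[0])
    max_change = 0.0
    for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
        for i in range(1, rows - 1):
            # First interior column whose (i + j) parity matches the colour.
            j_start = 1 + (i + 1 + colour) % 2
            for j in range(j_start, cols - 1, 2):
                gauss_seidel = 0.25 * (
                    u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    - h * h * f[i][j]
                )
                change = omega * (gauss_seidel - u[i][j])
                u[i][j] += change
                max_change = max(max_change, abs(change))
    return max_change


def solve(u, f, h, omega=1.5, tol=1e-10, max_sweeps=10_000):
    """Sweep until the largest update drops below `tol`; return sweep count."""
    for sweep in range(1, max_sweeps + 1):
        if red_black_sor_sweep(u, f, h, omega) < tol:
            return sweep
    return max_sweeps


if __name__ == "__main__":
    n = 9
    # Boundary fixed at 1, interior starts at 0; with f = 0 the exact
    # solution is u = 1 everywhere.
    u = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
          for j in range(n)] for i in range(n)]
    f = [[0.0] * n for _ in range(n)]
    print("sweeps:", solve(u, f, h=1.0 / (n - 1)))
```

The value returned by each sweep is the kind of quantity a correctness gate could compare between a CPU reference and a GPU implementation.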
The gpu_runtime module (core/gpu_runtime.py) provides the
GPURuntimeBridge class for managing GPU device detection, memory
allocation, and kernel dispatch.
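The fallback and correctness gates can be sketched independently of any particular GPU API. The example below does not use `GPURuntimeBridge`, whose interface is not shown here; `dispatch` and `within_tolerance` are hypothetical helpers, and the kernels are ordinary Python callables standing in for CPU and GPU implementations.

```python
def dispatch(x, cpu_kernel, gpu_kernel=None):
    """Run `gpu_kernel` when one is available and working, else `cpu_kernel`.

    Returns (result, backend_name).
    """
    if gpu_kernel is not None:
        try:
            return gpu_kernel(x), "gpu"
        except RuntimeError:
            # Device lost, driver missing, etc.: fall through to the CPU path.
            pass
    return cpu_kernel(x), "cpu"


def within_tolerance(candidate, reference, tol):
    """True when every element of `candidate` is within `tol` of `reference`."""
    return max(abs(a - b) for a, b in zip(candidate, reference)) <= tol


if __name__ == "__main__":
    def cpu_kernel(values):
        return [2.0 * v for v in values]

    def broken_gpu_kernel(values):
        raise RuntimeError("no compatible adapter found")

    result, backend = dispatch([1.0, 2.0], cpu_kernel, broken_gpu_kernel)
    print(backend, result)
```

Keeping the tolerance check separate from dispatch lets the same comparison run in tests against any backend.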
Benchmarking
Criterion micro-benchmarks are included in the Rust workspace:
cd scpn-fusion-rs
cargo bench
Available benchmarks:
- sor_bench.rs – Red-Black SOR stencil at 65x65 and 128x128
- inverse_bench.rs – Levenberg-Marquardt inverse reconstruction (FD vs analytical Jacobian comparison)
- neural_transport_bench.rs – Neural transport MLP inference
Python-side profiling is available via:
python profiling/profile_kernel.py --top 50
python profiling/profile_geometry_3d.py --toroidal 48 --poloidal 48 --top 50
Results are written to artifacts/profiling/.
Performance Summary
| Metric | Rust (release) | Python (NumPy) | Speedup |
|---|---|---|---|
| 65x65 equilibrium | ~100 ms | ~5 s | ~50x |
| 128x128 equilibrium | ~1 s | ~30 s | ~30x |
| SOR step (65x65) | microseconds | milliseconds | ~100x |
| Neural transport MLP | ~5 microseconds/point | ~500 microseconds/point | ~100x |
| Inverse reconstruction | ~4 s (5 LM iters) | ~60 s | ~15x |
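The speedup column is the ratio of the Python timing to the Rust timing. A quick check of the rows that give numeric timings, using the rounded values from the table (the SOR row is omitted because it only gives orders of magnitude):

```python
# (rust_seconds, python_seconds), rounded values copied from the table above.
timings_s = {
    "65x65 equilibrium": (0.100, 5.0),
    "128x128 equilibrium": (1.0, 30.0),
    "Neural transport MLP (per point)": (5e-6, 500e-6),
    "Inverse reconstruction": (4.0, 60.0),
}

# Speedup = Python time / Rust time.
speedups = {name: py / rs for name, (rs, py) in timings_s.items()}

if __name__ == "__main__":
    for name, factor in speedups.items():
        print(f"{name}: ~{factor:.0f}x")
```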
Note
These are internal measurements on specific hardware. We encourage
independent reproduction using cargo bench and
benchmarks/collect_results.sh.