HPC and GPU Acceleration

SCPN-Fusion-Core supports high-performance computing through a native Rust backend, a C++ FFI bridge, and a planned GPU acceleration path.

Rust Workspace

The scpn-fusion-rs/ directory contains a 10-crate Rust workspace that mirrors the Python package structure:

| Crate | Purpose |
| --- | --- |
| fusion-types | Shared data types, configuration structs, error types |
| fusion-math | Linear algebra (SOR, GMRES, multigrid), FFT, interpolation, Chebyshev polynomials, elliptic integrals, tridiagonal solver |
| fusion-core | Grad-Shafranov kernel, transport, inverse reconstruction, stability, pedestal model, AMR |
| fusion-physics | MHD sawtooth, Hall-MHD, turbulence, FNO, heating, compact reactor optimiser, design scanner, sandpile |
| fusion-nuclear | Neutronics, divertor, wall interaction, PWI erosion, TEMHD, balance of plant |
| fusion-engineering | Blanket engineering, magnet design, tritium systems, plant layout |
| fusion-control | PID, MPC, SNN controller, SPI mitigation, disruption predictor, digital twin, analytic solver, SOC learning |
| fusion-diagnostics | Sensor models, tomography |
| fusion-ml | Neural equilibrium, neural transport, disruption classifier, polynomial chaos expansion (PCE) UQ |
| fusion-python | PyO3 bindings producing scpn_fusion_rs.pyd / .so |

Build Configuration

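The release profile trades longer compile times for runtime speed: it uses the highest optimisation level, whole-program ("fat") link-time optimisation, and a single codegen unit: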

# Cargo.toml
[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1

Key dependencies: ndarray, nalgebra, rayon (parallelism), rustfft, serde, pyo3 (Python bindings).

The Rust workspace has no external C or Fortran dependencies – it is pure Rust.

Python-Rust FFI

The fusion-python crate provides PyO3 bindings that expose the Rust solvers as a native Python extension module. The Python package auto-detects the extension at import time:

try:
    from ._rust_compat import FusionKernel, RUST_BACKEND
except ImportError:
    from .fusion_kernel import FusionKernel
    RUST_BACKEND = False

The Python and Rust paths expose identical API signatures, so calling code needs no changes when switching backends.
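
Signature parity of this kind can be guarded by a small introspection check. The following is a minimal sketch, not the project's own test: PyKernel, RustLikeKernel and DriftedKernel are hypothetical stand-ins for the two FusionKernel implementations, and the helper names are invented for illustration.

```python
import inspect


def public_signatures(cls):
    """Return {name: inspect.Signature} for the public callables of cls."""
    signatures = {}
    for name, member in inspect.getmembers(cls, callable):
        if name.startswith("_"):
            continue
        try:
            signatures[name] = inspect.signature(member)
        except (TypeError, ValueError):
            # Some extension-module callables expose no introspectable
            # signature; they are recorded as None (unknown).
            signatures[name] = None
    return signatures


def signature_mismatches(reference, candidate):
    """List human-readable differences between two classes' public APIs."""
    ref = public_signatures(reference)
    cand = public_signatures(candidate)
    problems = []
    for name in sorted(set(ref) | set(cand)):
        if name not in cand:
            problems.append(f"{name}: missing from {candidate.__name__}")
        elif name not in ref:
            problems.append(f"{name}: missing from {reference.__name__}")
        elif ref[name] != cand[name]:
            problems.append(f"{name}: {ref[name]} != {cand[name]}")
    return problems


# Hypothetical stand-ins for the pure-Python and Rust-backed kernels.
class PyKernel:
    def solve(self, max_iter=100, tol=1e-6): ...
    def psi(self): ...


class RustLikeKernel:
    def solve(self, max_iter=100, tol=1e-6): ...
    def psi(self): ...


class DriftedKernel:
    def solve(self, max_iter=100): ...


print(signature_mismatches(PyKernel, RustLikeKernel))
print(signature_mismatches(PyKernel, DriftedKernel))
```

Run against the real classes, an empty list would indicate that both backends accept the same calls. Methods whose signatures cannot be introspected are treated as unknown rather than as mismatches.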

C++ FFI Bridge

The hpc_bridge module (hpc/hpc_bridge.py) provides a C++ FFI bridge for interfacing with external HPC solvers. It consists of:

  • solver.cpp – C++ solver implementation using types.h shared data structures

  • ctypes-based Python bindings for calling compiled C++ from Python

  • Shared-memory data exchange to avoid serialisation overhead

This bridge is primarily used for prototyping custom solver kernels before porting them to Rust.
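
To illustrate how ctypes can hand a buffer to compiled code without copying or serialising it, here is a minimal sketch. The GridView struct and make_grid_view helper are hypothetical; the actual layout in types.h and the functions exported by solver.cpp are not shown in this document.

```python
import array
import ctypes


class GridView(ctypes.Structure):
    """Hypothetical mirror of a C struct: dimensions plus a data pointer."""

    _fields_ = [
        ("nr", ctypes.c_int),
        ("nz", ctypes.c_int),
        ("data", ctypes.POINTER(ctypes.c_double)),
    ]


def make_grid_view(buffer, nr, nz):
    """Wrap an existing writable buffer of nr * nz doubles without copying."""
    array_type = ctypes.c_double * (nr * nz)
    c_array = array_type.from_buffer(buffer)  # shares memory with `buffer`
    pointer = ctypes.cast(c_array, ctypes.POINTER(ctypes.c_double))
    # Return c_array as well so the caller keeps the shared view alive.
    return GridView(nr, nz, pointer), c_array


values = array.array("d", [0.0] * 6)
view, keep_alive = make_grid_view(values, 2, 3)

# A compiled solver would receive ctypes.byref(view) and write through
# view.data; here the write is done from Python to show the memory is shared.
view.data[4] = 3.5
print(values[4])
```

With a real shared library, the same struct would be passed to a function loaded through ctypes.CDLL, and results written by the C++ side would appear in the Python buffer directly.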

GPU Acceleration Roadmap

GPU support is planned in three phases (tracked in docs/GPU_ACCELERATION_ROADMAP.md):

Phase 1: wgpu SOR kernel

Red-Black SOR stencil implemented as a wgpu compute shader, providing cross-platform GPU acceleration (Vulkan, Metal, D3D12, WebGPU) with deterministic CPU fallback.

Performance targets:

  • 65x65 grid: 2x–4x speedup

  • 257x257 grid: 5x–12x speedup
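
The red-black ordering is what makes SOR suitable for a compute shader: points of one colour depend only on points of the other colour, so each half-sweep can be updated in parallel. The sketch below is a pure-Python CPU reference under simplifying assumptions: it uses the plain five-point Poisson stencil as a stand-in for the Grad-Shafranov operator, and the function name and parameters are illustrative, not the project's API.

```python
def red_black_sor(u, f, h, omega=1.5, sweeps=200):
    """In-place red-black SOR for -laplace(u) = f on a uniform grid.

    The outer ring of `u` holds the Dirichlet boundary values.
    """
    n_rows, n_cols = len(u), len(u[0])
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = red, 1 = black
            for i in range(1, n_rows - 1):
                # First interior column j with (i + j) % 2 == colour.
                j_start = 1 + ((i + 1 + colour) % 2)
                for j in range(j_start, n_cols - 1, 2):
                    gauss_seidel = 0.25 * (
                        u[i - 1][j] + u[i + 1][j]
                        + u[i][j - 1] + u[i][j + 1]
                        + h * h * f[i][j]
                    )
                    # Over-relax towards the Gauss-Seidel value.
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u


# Usage: Laplace problem whose exact discrete solution is u = x.
n = 9
h = 1.0 / (n - 1)
u = [
    [j * h if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
    for i in range(n)
]
f = [[0.0] * n for _ in range(n)]
red_black_sor(u, f, h)
max_error = max(abs(u[i][j] - j * h) for i in range(n) for j in range(n))
print(f"max error: {max_error:.2e}")
```

On a GPU, the two innermost loops of each colour pass would map to one dispatch, with a barrier between the red and black passes.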

Phase 2: GPU-backed GMRES preconditioning

CUDA/ROCm adapters for the GMRES linear solver with CPU fallback for environments without GPU drivers.

Performance target: 2x–6x speedup on inverse solves.

Phase 3: Full multigrid on device

Smooth, restrict, prolong, and coupled nonlinear multigrid path running entirely on GPU.

Performance targets:

  • < 1 ms for control-loop grids

  • 10x–30x speedup for 257x257+ workloads
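
The smooth, restrict and prolong stages named above can be sketched on a much simpler problem. The following is a linear 1D Poisson V-cycle in pure Python, offered as an illustration of the stages only: the planned path is 2D, nonlinear and device-resident, and all function names here are hypothetical.

```python
import math


def residual(u, f, h):
    """r = f - A u for A = -d2/dx2 with fixed (Dirichlet) end points."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted-Jacobi smoothing; returns a new list."""
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jacobi
        u = new
    return u


def restrict(r):
    """Full-weighting restriction to a grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc


def prolong(ec):
    """Linear interpolation to a grid with twice as many intervals."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for j in range(nc):
        e[2 * j] = ec[j]
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    e[2 * nc] = ec[nc]
    return e


def v_cycle(u, f, h):
    """One V-cycle; the number of intervals must be a power of two."""
    if len(u) == 3:
        # Coarsest grid: a single interior unknown, solved exactly.
        return [u[0], 0.5 * (h * h * f[1] + u[0] + u[2]), u[2]]
    u = smooth(u, f, h)                                  # pre-smooth
    rc = restrict(residual(u, f, h))                     # restrict residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)           # coarse correction
    u = [ui + ei for ui, ei in zip(u, prolong(ec))]      # prolong and correct
    return smooth(u, f, h)                               # post-smooth


# Usage: -u'' = pi^2 sin(pi x) on [0, 1], exact solution sin(pi x).
n = 64
h = 1.0 / n
f = [math.pi ** 2 * math.sin(math.pi * i * h) for i in range(n + 1)]
u = [0.0] * (n + 1)
r0 = max(abs(x) for x in residual(u, f, h))
for _ in range(10):
    u = v_cycle(u, f, h)
r10 = max(abs(x) for x in residual(u, f, h))
print(f"residual reduction after 10 cycles: {r10 / r0:.2e}")
```

Each stage is a local stencil operation, which is why the whole cycle is a candidate for running on the device without round trips to the host.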

Acceptance gates for each phase:

  • Correctness: residual behaviour matches CPU reference within configured tolerance

  • Performance: measured speedups meet declared minimum floors

  • Operations: runtime capability detection + automatic CPU fallback
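
The three gates can be expressed as one small check per phase. This is a hedged sketch only: GateResult, evaluate_gate and every parameter name are hypothetical, and the real tolerances and speedup floors are those declared in the roadmap document.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    correctness: bool
    performance: bool
    operations: bool

    @property
    def passed(self):
        return self.correctness and self.performance and self.operations


def evaluate_gate(cpu_residuals, gpu_residuals, tolerance,
                  cpu_seconds, gpu_seconds, min_speedup, fallback_ok):
    """Evaluate the correctness, performance and operations gates."""
    # Correctness: per-iteration residuals agree within the tolerance.
    correctness = len(cpu_residuals) == len(gpu_residuals) and all(
        abs(c - g) <= tolerance for c, g in zip(cpu_residuals, gpu_residuals)
    )
    # Performance: measured speedup meets the declared minimum floor.
    speedup = cpu_seconds / gpu_seconds if gpu_seconds > 0 else 0.0
    performance = speedup >= min_speedup
    # Operations: capability detection and CPU fallback were exercised.
    return GateResult(correctness, performance, bool(fallback_ok))


result = evaluate_gate(
    cpu_residuals=[1.0, 0.1, 0.01],
    gpu_residuals=[1.0, 0.1, 0.0100001],
    tolerance=1e-6,
    cpu_seconds=0.010,
    gpu_seconds=0.004,
    min_speedup=2.0,  # e.g. the lower bound of the 65x65 Phase 1 target
    fallback_ok=True,
)
print(result, result.passed)
```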

The gpu_runtime module (core/gpu_runtime.py) provides the GPURuntimeBridge class for managing GPU device detection, memory allocation, and kernel dispatch.
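
The detect-then-fall-back behaviour required by the operations gate can be sketched as follows. This does not reproduce the GPURuntimeBridge interface, which is not documented here; solve_with_fallback and its callable arguments are hypothetical.

```python
def solve_with_fallback(problem, gpu_solver, cpu_solver, gpu_available):
    """Use the GPU path when a device is reported, otherwise the CPU path.

    Returns (result, backend_name) so callers can log which path ran.
    """
    if gpu_available():
        try:
            return gpu_solver(problem), "gpu"
        except RuntimeError as exc:
            # A detected device can still fail at dispatch time.
            print(f"GPU path failed ({exc}); falling back to CPU")
    return cpu_solver(problem), "cpu"


def cpu_square(x):
    return x * x


def broken_gpu_square(x):
    raise RuntimeError("device lost")


print(solve_with_fallback(3, broken_gpu_square, cpu_square, lambda: True))
print(solve_with_fallback(3, cpu_square, cpu_square, lambda: False))
```

Returning the backend name alongside the result makes it straightforward to assert in tests that the fallback was actually taken.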

Benchmarking

Criterion micro-benchmarks are included in the Rust workspace:

cd scpn-fusion-rs
cargo bench

Available benchmarks:

  • sor_bench.rs – Red-Black SOR stencil at 65x65 and 128x128

  • inverse_bench.rs – Levenberg-Marquardt inverse reconstruction (FD vs analytical Jacobian comparison)

  • neural_transport_bench.rs – Neural transport MLP inference

Python-side profiling is available via:

python profiling/profile_kernel.py --top 50
python profiling/profile_geometry_3d.py --toroidal 48 --poloidal 48 --top 50

Results are written to artifacts/profiling/.
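
The internals of the profiling scripts are not shown here. As a rough sketch of how a "top N" report of this kind is commonly produced with the standard library, assuming cProfile and pstats (profile_top and work are hypothetical names):

```python
import cProfile
import io
import pstats


def profile_top(func, *args, top=50, **kwargs):
    """Run func under cProfile and return (result, text report of top rows)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the first `top` entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def work(n):
    """Stand-in workload for the kernel being profiled."""
    return sum(i * i for i in range(n))


value, report = profile_top(work, 10_000, top=5)
print(report)
```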

Performance Summary

| Metric | Rust (release) | Python (NumPy) | Speedup |
| --- | --- | --- | --- |
| 65x65 equilibrium | ~100 ms | ~5 s | ~50x |
| 128x128 equilibrium | ~1 s | ~30 s | ~30x |
| SOR step (65x65) | microseconds | milliseconds | ~100x |
| Neural transport MLP | ~5 microseconds/point | ~500 microseconds/point | ~100x |
| Inverse reconstruction | ~4 s (5 LM iters) | ~60 s | ~15x |

Note: These are internal measurements on specific hardware. We encourage independent reproduction using cargo bench and benchmarks/collect_results.sh.