Benchmarks¶

Current native runtime evidence package¶

How to read benchmark numbers responsibly¶

Every benchmark in this repository carries an evidence class:

local_regression: valid for comparison and engineering iteration, not a PCS deployment claim.
production_benchmark: requires the report metadata to justify host isolation, scheduler context, and explicit admission conditions.

The same command can therefore be technically correct yet insufficient for deployment planning when its evidence class and environment metadata do not match what the audience needs. Use local-regression numbers to prioritize implementation paths; use production-admitted reports for procurement, safety, or facility discussions.
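
The distinction between the two evidence classes can be sketched as a small gate over a report dictionary. This is an illustrative sketch only: the field names evidence_class and production_claim_allowed appear in this document, but their placement at the top level of the report and the helper name production_claim_permitted are assumptions, not the repository validator.

```python
import json


def production_claim_permitted(report: dict) -> bool:
    """Hypothetical gate: allow a deployment claim only for production evidence."""
    evidence_class = report.get("evidence_class")
    claim_allowed = report.get("production_claim_allowed", False)
    # local_regression, or any unrecognised class, never supports a deployment claim.
    return evidence_class == "production_benchmark" and claim_allowed is True


# Assumed report layout with both fields at the top level.
local_report = json.loads(
    '{"evidence_class": "local_regression", "production_claim_allowed": false}'
)
production_report = {
    "evidence_class": "production_benchmark",
    "production_claim_allowed": True,
}

print(production_claim_permitted(local_report))
print(production_claim_permitted(production_report))
```

A gate like this checks only the declared metadata; the real admission path described later also inspects isolation context and validator errors.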

The v0.20.4 release candidate includes the following repository reports as local-regression evidence for the native execution and formal-runtime lane:

Report family	Purpose	Claim boundary
`native_handoff_comparison.*`	Compare Python orchestration with fused Rust/PyO3 execution at one campaign boundary	Local execution-ownership evidence; not target PCS timing
`native_formal_modes_*.*`	Compare disabled, `async_drop`, `sync_stride`, and `aot_certificate` formal modes	Shows coverage/drop/timing semantics; non-isolated timings remain local regression
`native_formal_aot_certificate_admission_*.*`	Persist digest-bound AOT certificate admission evidence	Hot-path certificate evidence; production use requires production benchmark context
`native_formal_spin_pacing_*.*` and `native_control_spin_pacing_*.*`	Exercise opt-in spin pacing under workstation constraints	Short timing experiments only; do not cite as deployment timing

The reports intentionally keep workstation limitations visible: workspace state, proof-sampling drops where applicable, certificate digests, and evidence-class metadata. Do not promote their timing numbers into market, release, or facility claims unless the matching JSON report records production benchmark context and the validator admits it.

This project has three benchmark tracks:

Python CLI micro-benchmark (scpn-control benchmark)
Rust Criterion benches (cargo bench --workspace)
Native handoff comparison (scripts/benchmark_native_handoff.py)

Native handoff comparison¶

Use this track after changes to the control-loop execution boundary:

src/scpn_control/core/rust_engine.py
src/scpn_control/cli.py
scpn-control-rs/crates/control-python

The benchmark forces both execution modes at the same campaign boundary:

python: Python orchestration with Rust-compatible controller and transport primitives.
native: fused PyO3 Rust loop with cumulative native cycle telemetry.

The native loop also owns runtime formal verification. Python supplies bounded Petri-net checking policy through NeuroCyberneticEngine.configure_native_formal_verification(...); the PyO3 crate either spawns the Z3 worker inside Rust or evaluates a compiled certificate monitor in the native loop. Worker-backed modes pin to core_z3 when host affinity is available and pass only fixed numeric snapshots over a bounded crossbeam-channel. No Z3 ASTs, solver contexts, or proof objects cross the Python boundary during the fused campaign loop.

Three formal-verification execution modes are benchmarkable:

async_drop: non-blocking proof sampling. The control loop never waits for Z3; saturated snapshots are counted as drops.
sync_stride: deterministic stride verification. The control loop blocks on each configured stride step until the Rust Z3 worker returns a proof result.
aot_certificate: deterministic compiled-certificate monitoring. The control loop evaluates the admitted Petri invariant directly and does not construct Z3 contexts or enqueue proof work in the hot path. The current certificate is a sound sufficient condition for the configured bounded contract and fails closed when the state needs full Z3 search to admit. Admission is bound to a canonical certificate-assumption payload covering schema version, certificate identifier, Petri topology, maximum marking, maximum depth, and contract semantics. Runtime telemetry exposes certificate_admitted, certificate_schema_version, certificate_id, certificate_contract, and certificate_assumption_sha256 so benchmark artifacts identify the exact admitted monitor used in the hot path.
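
The difference between non-blocking sampling and deterministic stride checking can be illustrated with a toy model built on the Python standard library. This is a sketch under stated assumptions, not the Rust implementation: the function names, the drain_every stand-in for a slow proof worker, and the exact meaning of each counter are hypothetical.

```python
import queue


def run_async_drop(steps: int, capacity: int, drain_every: int) -> dict:
    """Non-blocking sampling: a full queue counts as a drop and the loop never waits."""
    snapshots: queue.Queue = queue.Queue(maxsize=capacity)
    generated = submitted = dropped = checked = 0
    for step in range(steps):
        generated += 1
        try:
            snapshots.put_nowait(step)
            submitted += 1
        except queue.Full:
            dropped += 1
        # Stand-in for a slower proof worker that drains one snapshot occasionally.
        if step % drain_every == 0 and not snapshots.empty():
            snapshots.get_nowait()
            checked += 1
    return {
        "generated": generated,
        "submitted": submitted,
        "dropped": dropped,
        "checked": checked,
    }


def run_sync_stride(steps: int, stride: int) -> dict:
    """Deterministic stride: every stride-th step is checked before the loop continues."""
    checked = sum(1 for step in range(steps) if step % stride == 0)
    return {"generated": steps, "submitted": checked, "dropped": 0, "checked": checked}


print(run_async_drop(steps=100, capacity=4, drain_every=5))
print(run_sync_stride(steps=100, stride=20))
```

In the toy model the asynchronous path reports drops as soon as the worker falls behind, while the stride path trades loop latency for complete coverage of the configured steps.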

Use scripts/benchmark_native_formal_modes.py to quantify the difference:

PYTHONPATH=src .venv/bin/python scripts/benchmark_native_formal_modes.py \
  --steps 5000 \
  --repeats 3 \
  --tick-interval-s 0 \
  --pacing-modes sleep \
  --strides 1,5,20,30 \
  --transports std,io-uring

The report includes generated, submitted, checked, dropped, and failure counts, certificate-admission fields, and sync wait timing. A strict certification argument must use either sync_stride as the ground-truth proof engine, or aot_certificate with certificate_admitted=true and one stable certificate_assumption_sha256 across the relevant comparison cases. async_drop must be described as asynchronous proof sampling.
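
The strict-certification rule above can be written as a predicate over the comparison cases. This is a minimal sketch: certificate_admitted and certificate_assumption_sha256 are the telemetry names used in this document, while the formal_mode key and the flat list-of-dicts layout are assumptions made for illustration.

```python
def strict_certification_ok(cases: list[dict]) -> bool:
    """Hypothetical check of the strict-certification rule over benchmark cases."""
    modes = {case["formal_mode"] for case in cases}
    if modes == {"sync_stride"}:
        # Every case used the blocking Z3 worker as the proof engine.
        return True
    if modes == {"aot_certificate"}:
        digests = {case.get("certificate_assumption_sha256") for case in cases}
        admitted = all(case.get("certificate_admitted") is True for case in cases)
        # Exactly one digest, present in every case, and every case admitted.
        return admitted and len(digests) == 1 and None not in digests
    # async_drop, mixed modes, or an empty case list cannot back a strict argument.
    return False


aot_cases = [
    {"formal_mode": "aot_certificate", "certificate_admitted": True,
     "certificate_assumption_sha256": "abc"},
    {"formal_mode": "aot_certificate", "certificate_admitted": True,
     "certificate_assumption_sha256": "abc"},
]
print(strict_certification_ok(aot_cases))
print(strict_certification_ok([{"formal_mode": "async_drop"}]))
```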

The native fused loop also exposes pacing modes:

sleep: default scheduler-yield pacing. This is safe for normal developer runs but measures host wake-up latency as part of wall time.
spin: opt-in busy-wait pacing. The Rust loop uses std::hint::spin_loop instead of sleep, holds the native execution thread on-core, and is intended only for short deterministic timing experiments on isolated cores. Spin pacing rejects tick intervals above 0.01 s to prevent accidental long-duration core burn.
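
The two pacing modes can be approximated in Python to show why they measure different things. This is only a sketch: the pace helper and its signature are hypothetical, and a Python busy-wait is far coarser than the Rust loop built on std::hint::spin_loop. The 0.01 s limit is the one stated above.

```python
import time

MAX_SPIN_INTERVAL_S = 0.01  # spin pacing limit stated in this document


def pace(deadline: float, mode: str, tick_interval_s: float) -> None:
    """Wait until `deadline` (a time.perf_counter() value) using the chosen mode."""
    if mode == "sleep":
        remaining = deadline - time.perf_counter()
        if remaining > 0:
            # Yields to the scheduler, so wake-up latency lands in the wall time.
            time.sleep(remaining)
    elif mode == "spin":
        if tick_interval_s > MAX_SPIN_INTERVAL_S:
            raise ValueError("spin pacing rejects tick intervals above 0.01 s")
        while time.perf_counter() < deadline:
            pass  # busy-wait: holds the core instead of yielding it
    else:
        raise ValueError(f"unknown pacing mode: {mode}")


tick = 0.0001
start = time.perf_counter()
pace(start + tick, "spin", tick)
print(f"spin overshoot: {(time.perf_counter() - start - tick) * 1e6:.1f} us")
```

Rejecting long intervals up front keeps an accidental configuration from pinning a core for the whole campaign.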

Compare sleep and spin pacing on the AOT hot path with:

PYTHONPATH=src .venv/bin/python scripts/benchmark_native_formal_modes.py \
  --steps 5000 \
  --repeats 3 \
  --tick-interval-s 0.0001 \
  --formal-modes disabled,aot_certificate \
  --pacing-modes sleep,spin \
  --strides 1 \
  --transports std \
  --evidence-class local_regression

Admit the persisted AOT certificate evidence before using it in a release or safety-case argument:

python validation/validate_native_formal_certificate_evidence.py \
  validation/reports/native_formal_aot_certificate_admission_20260604T103219Z.json \
  --max-aot-p99-cycle-us 10.0

The validator rejects reports with malformed JSON, the wrong benchmark schema, missing benchmark context, invalid evidence-class metadata, missing AOT cases, unstable certificate digests, missing certificate admission, nonzero drops, nonzero formal failures, incomplete generated/submitted/checked coverage, or AOT p99 cycle latency above the configured threshold. Reports generated on a loaded workstation or without explicit CPU/core isolation must use evidence_class=local_regression and production_claim_allowed=false. evidence_class=production_benchmark requires explicit isolation metadata, a clean workspace, and a declared yes/no value for concurrent heavy jobs.

scpn-control validate runs the same native formal certificate gate by default and emits the result under native_formal_certificate. The release-evidence admission step requires this section to pass, requires at least one admitted AOT certificate case, and binds the report to the certificate-assumption digest, benchmark-report digest, benchmark evidence class, and production-claim boundary. Local regression reports must keep production_claim_allowed=false; production benchmark reports must set it explicitly and must carry no validator errors. Use --no-native-formal-certificate only for local diagnostics; release evidence and preflight admission must not skip it.

Run:

PYTHONPATH=src .venv/bin/python scripts/benchmark_native_handoff.py \
  --steps 5000 \
  --tick-interval-s 0.0001 \
  --transport-backend std \
  --json-out validation/reports/native_handoff_comparison.json \
  --markdown-out validation/reports/native_handoff_comparison.md

The JSON output is the machine-readable evidence artifact. The Markdown output is the review table. A valid native run must report zero drops and zero publish failures. For formal-runtime evidence, inspect native.formal_verification in the returned campaign summary. The expected backend is rust-z3, and any nonzero failures count means the fused loop tripped the fail-closed formal contract instead of continuing under Python control-plane intervention. For AOT certificate runs, the expected backend is compiled-certificate; strict release evidence must include the schema, certificate identifier, contract label, and full SHA-256 assumption digest.

This benchmark isolates execution ownership. Use the transport-specific Rust benchmark and UDP fault-tolerance benchmark for std versus io-uring transport measurements.

Control-loop latency (canonical, reproducible)¶

All latency figures below are regenerated by committed benchmarks and published side by side for the CI runner and the local workstation, each with full provenance. Reproduce with:

python scripts/benchmark_native_handoff.py    # integrated active control cycle
python benchmarks/controller_latency.py        # per-controller, per-backend

Artefacts: validation/reports/native_handoff_comparison.json (+ .local.json) and validation/reports/controller_latency.json (+ .local.json). GitHub-hosted runner hardware varies between runs, so the recorded cpu_model is part of the provenance; the CI column below is the AMD EPYC 7763 runner of CI run 27916153302, the local column is an Intel i5-11600K.

Integrated active control cycle (SNN + MPC handoff + formal safety check + UDP transport; 5000 steps × 7 repeats, P50/P99 µs):

Path	CI (EPYC 7763)	Local (i5-11600K)
Python-orchestrated	9.36 / 10.45	4.36 / 4.73
Native (Rust)	5.62 / 6.11	2.85 / 3.41
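
The table can be reduced to Python-over-native ratios with a few lines of arithmetic. The values are copied from the table above; the dictionary layout and the speedups helper are illustrative only.

```python
# (P50, P99) in microseconds, copied from the table above.
rows = {
    "CI (EPYC 7763)": {"python": (9.36, 10.45), "native": (5.62, 6.11)},
    "Local (i5-11600K)": {"python": (4.36, 4.73), "native": (2.85, 3.41)},
}


def speedups(table: dict) -> dict:
    """Return the Python/native latency ratio per host for P50 and P99."""
    return {
        host: tuple(round(p / n, 2) for p, n in zip(paths["python"], paths["native"]))
        for host, paths in table.items()
    }


for host, (p50_ratio, p99_ratio) in speedups(rows).items():
    print(f"{host}: P50 {p50_ratio}x, P99 {p99_ratio}x")
```

On these numbers the native path is roughly 1.7 times faster on the CI runner and roughly 1.5 times faster locally at P50. Both are local or CI regression figures, not target PCS timing.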

Per-controller step latency (isolated step, every available backend; P50/P99 µs). Backends whose dependency/binding is absent are listed so nothing is omitted:

Controller	Backend	CI (EPYC 7763)	Local (i5-11600K)
PID	numpy	0.42 / 0.52	0.25 / 0.56
PID	rust	unavailable (PyPIDController not built)	unavailable
SNN	numpy	23.90 / 43.14	13.29 / 26.56
SNN	rust	0.88 / 0.99	0.56 / 1.08
MPC (Np=10)	acados	10679 / 11393	6128 / 10417
MPC (Np=10)	osqp	16846 / 18090	9202 / 15622
MPC (Np=10)	internal QP	18847 / 25172	10460 / 25058
MPC (Np=10)	scipy	23302 / 24416	12750 / 21701
MPC (Np=10)	casadi	78887 / 82806	54573 / 88107
H-infinity	numpy (general state-space)	29.43 / 53.42	13.27 / 29.81
H-infinity	rust (2-state VDE)	1.39 / 1.49	1.05 / 5.28

All five MPC QP backends are now measured (acados is built in the benchmark-nightly workflow and locally). Every backend's real-time-iteration tick is millisecond-scale, with acados the fastest (≈10.7 ms CI / 6.1 ms local). The microsecond MPC regime is not reachable through the Python step_rti path with any QP backend: per-stage set/get marshalling across the Python↔C boundary dominates the tick. A microsecond MPC tick would require a non-Python (C/Rust) control loop, not merely a compiled QP solver.

Capacitor-bank energy ledger¶

Use this track after changes to the CONTROL-owned capacitor-bank RLC admission surface:

src/scpn_control/control/capacitor_bank_state.py
scpn-control-rs/crates/control-control/src/capacitor_bank.rs
scpn-control-rs/crates/control-python/src/lib.rs
benchmarks/bench_capacitor_bank_energy.py
scpn-control-rs/crates/control-control/examples/bench_capacitor_bank_energy.rs

The benchmark measures one discharge report per sample. Each report includes the total RLC energy ledger, residual, relative residual, and pass/fail flag. Python and Rust commands use the same capacitance, inductance, resistance, initial voltage, initial current, waveform, step size, and discharge length.
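
An energy ledger of this kind can be sketched for a series RLC discharge. The sketch assumes an implicit-midpoint integrator, chosen here because it closes the ledger to round-off for a linear circuit; the repository integrator, its parameter values, and its report keys may differ. All names and numbers below are hypothetical.

```python
def discharge_ledger(c_f, l_h, r_ohm, v0, i0, dt_s, steps, rel_tol=1e-9):
    """Integrate C dV/dt = -I, L dI/dt = V - R I and audit the energy balance."""
    a = dt_s / (2.0 * c_f)
    b = dt_s / (2.0 * l_h)
    v, i = v0, i0
    dissipated = 0.0
    initial = 0.5 * c_f * v0**2 + 0.5 * l_h * i0**2
    for _ in range(steps):
        # Closed-form solution of the implicit-midpoint update for this 2x2 system.
        i_next = (i * (1.0 - a * b - b * r_ohm) + 2.0 * b * v) / (1.0 + a * b + b * r_ohm)
        v_next = v - a * (i + i_next)
        i_mid = 0.5 * (i + i_next)
        dissipated += r_ohm * i_mid**2 * dt_s  # resistive loss over this step
        v, i = v_next, i_next
    stored = 0.5 * c_f * v**2 + 0.5 * l_h * i**2
    residual = initial - (stored + dissipated)
    relative = residual / initial if initial > 0.0 else 0.0
    return {
        "initial_j": initial,
        "stored_j": stored,
        "dissipated_j": dissipated,
        "residual_j": residual,
        "relative_residual": relative,
        "passed": abs(relative) <= rel_tol,
    }


# Hypothetical bank: 1 mF, 1 uH, 10 mOhm, 1 kV initial charge, 200 steps of 0.1 us.
report = discharge_ledger(1e-3, 1e-6, 1e-2, 1e3, 0.0, 1e-7, 200)
print(report["relative_residual"], report["passed"])
```

With this scheme the change in stored energy over a step equals the resistive loss at the midpoint current, so any residual beyond round-off points to a bookkeeping error rather than integrator drift.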

PYTHONPATH=src python benchmarks/bench_capacitor_bank_energy.py \
  --steps 500 \
  --warmup 50 \
  --discharge-steps 200 \
  --dt-s 1.0e-7 \
  --json-out validation/reports/capacitor_bank_energy_python.json \
  --markdown-out validation/reports/capacitor_bank_energy_python.md

cargo run --release --manifest-path scpn-control-rs/Cargo.toml \
  -p control-control --example bench_capacitor_bank_energy -- \
  --steps 500 \
  --warmup 50 \
  --discharge-steps 200 \
  --dt-s 1.0e-7 \
  --json-out validation/reports/capacitor_bank_energy_rust.json \
  --markdown-out validation/reports/capacitor_bank_energy_rust.md

The JSON artifacts are the machine-readable evidence. Markdown reports are for review. Runs without hard CPU isolation must retain evidence_class=local_regression and production_claim_allowed=false.

JAX GK parity evidence¶

validation/benchmark_jax_gk_parity.py persists schema-versioned parity artifacts for the JAX linear gyrokinetic backend against the repository native local-dispersion solver. Each artifact records backend, device kind, platform, JAX/JAXLIB versions, dtype, X64 state, solver kwargs, growth-rate and real-frequency tolerances, case-parameter metadata, mode-spectrum agreement, and canonical SHA-256 digests for solver kwargs, case parameters, and the complete payload. The default benchmark emits the built-in CBC, kinetic-electron TEM, and low-drive stable-mode parity cases.

Run:

python validation/benchmark_jax_gk_parity.py --json-out
JAX_PLATFORM_NAME=cpu python validation/benchmark_jax_gk_parity.py --json-out

Strict admission:

python validation/validate_jax_gk_parity.py \
  --artifact-root validation/reports/jax_gk_parity \
  --require-parity-artifacts \
  --require-cases cyclone_base_case,tem_kinetic_electron,stable_mode \
  --require-backends cpu,gpu

The benchmark command also writes aggregate timing evidence outside the artifact directory so strict admission does not accidentally ingest benchmark summaries as parity artifacts:

validation/reports/jax_gk_parity_benchmark.json
validation/reports/jax_gk_parity_benchmark.md

Current local CPU run, generated with JAX_PLATFORM_NAME=cpu, regenerated the three CPU artifacts in 2.963800 seconds total. Per-case timings were:

Case	Backend	Device	Elapsed s
`cyclone_base_case`	`cpu`	`cpu`	2.731885
`tem_kinetic_electron`	`cpu`	`cpu`	0.106864
`stable_mode`	`cpu`	`cpu`	0.096412

The persisted campaign currently contains three CPU and three GPU parity artefacts over CBC, kinetic-electron TEM, and low-drive stable-mode cases. The strict CPU/GPU admission gate reports complete required case/backend coverage, maximum gamma relative error 1.5386142994101046e-06, maximum omega absolute error 2.9658060068937786e-07, and entries payload SHA-256 7c7d3c7eefd5d2577579d1fd89d1fdaa056eebc13aa9d7f06f14cb1e8e755dfb. The claim boundary is backend parity only. These artifacts do not replace external TGLF, GENE, GS2, CGYRO, or QuaLiKiz validation for quantitative gyrokinetic claims.
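
The two headline parity metrics can be computed from a pair of mode spectra as follows. This is a sketch under assumptions: the (gamma, omega) tuple layout, the normalisation of the relative error by the reference growth rate, and the parity_metrics name are illustrative and not taken from the repository.

```python
def parity_metrics(reference, candidate, gamma_rtol, omega_atol):
    """Compare (gamma, omega) pairs mode by mode and report worst-case errors."""
    if len(reference) != len(candidate):
        raise ValueError("mode spectra must have the same length")
    max_gamma_rel = 0.0
    max_omega_abs = 0.0
    for (g_ref, w_ref), (g_new, w_new) in zip(reference, candidate):
        scale = max(abs(g_ref), 1e-30)  # guard against a zero reference growth rate
        max_gamma_rel = max(max_gamma_rel, abs(g_new - g_ref) / scale)
        max_omega_abs = max(max_omega_abs, abs(w_new - w_ref))
    return {
        "max_gamma_rel_err": max_gamma_rel,
        "max_omega_abs_err": max_omega_abs,
        "admitted": max_gamma_rel <= gamma_rtol and max_omega_abs <= omega_atol,
    }


# Made-up spectra for illustration; these are not repository results.
native = [(0.2, -0.5), (0.1, -0.3)]
jax_backend = [(0.2000002, -0.5000001), (0.1, -0.3)]
print(parity_metrics(native, jax_backend, gamma_rtol=1e-5, omega_atol=1e-6))
```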

Python CLI benchmark¶

Run:

python -m pip install -e .
scpn-control benchmark --n-bench 5000

JSON output:

scpn-control benchmark --n-bench 5000 --json-out

Current outputs include:

pid_us_per_step
snn_us_per_step
speedup_ratio

Runtime admission benchmark¶

Run this benchmark after changes to scpn_control.core.runtime_admission, NeuroCyberneticEngine.execute_hardware_loop(...), run-hardware-campaign, or the PyO3 runtime_admission_snapshot() counterpart:

taskset -c 4,5,6,7 env PYTHONPATH=src python benchmarks/bench_runtime_admission.py \
  --iterations 500 \
  --warmup 50 \
  --core-snn 4 \
  --core-z3 5 \
  --core-net 6 \
  --core-hb 7 \
  --json-out validation/reports/runtime_admission_release_20260605T000000Z.json \
  --md-out validation/reports/runtime_admission_release_20260605T000000Z.md

This measures launch-time admission overhead only. It is not a control-loop hot-path benchmark and does not qualify hard real-time PCS timing by itself. A production timing claim still requires --runtime-admission-policy require, PREEMPT_RT or realtime sysfs evidence, SCHED_FIFO/SCHED_RR execution, requested cores inside the process affinity mask, performance CPU governors, adequate memory-lock limits, heartbeat configuration, and hard-isolated benchmark context.

Current local regression evidence:

Evidence	Samples	Warmup	Median	p95	p99	Admission result
`validation/reports/runtime_admission_release_20260605T000000Z.md`	500	50	140.395 us	182.384 us	216.253 us	failed strict production admission: no PREEMPT_RT, no SCHED_FIFO/SCHED_RR, non-performance governors
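
The Samples, Warmup, Median, p95, and p99 columns follow a common timing pattern, sketched below with the standard library. The nearest-rank percentile definition and the helper names are assumptions; the repository benchmark may use a different estimator.

```python
import math
import time


def percentile(sorted_samples, q):
    """Nearest-rank percentile of an ascending list, q in (0, 100]."""
    rank = max(1, math.ceil(q * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def time_call(fn, iterations, warmup):
    """Discard warmup calls, then time each call in microseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    samples.sort()
    return {
        "samples": len(samples),
        "median_us": percentile(samples, 50),
        "p95_us": percentile(samples, 95),
        "p99_us": percentile(samples, 99),
    }


# Stand-in workload; the real benchmark times the admission snapshot instead.
print(time_call(lambda: sum(range(100)), iterations=500, warmup=50))
```

Numbers produced this way on a shared workstation carry the same local-regression boundary as the table above.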

Pulsed-shot MPC adapter regression¶

Use this benchmark after changes to the pulsed MPC admission boundary:

src/scpn_control/control/fusion_neural_mpc.py
src/scpn_control/control/pulsed_scenario_scheduler_v2.py
src/scpn_control/control/capacitor_bank_state.py
scpn-control-rs/crates/control-control/src/mpc.rs
scpn-control-rs/crates/control-python/src/lib.rs

Run the local regression harness with explicit output paths:

PYTHONPATH=src python benchmarks/bench_pulsed_mpc_adapter.py \
  --steps 2000 \
  --warmup 200 \
  --json-out validation/reports/pulsed_mpc_adapter_local_regression.json \
  --md-out validation/reports/pulsed_mpc_adapter_local_regression.md

If the optional PyO3 extension was rebuilt for the current Rust source, the Python report includes pyo3_non_burn_mask and pyo3_burn_infeasible_safe rows. On this workstation, build the editable PyO3 extension with a target directory on /tmp; the repository checkout is on a fuseblk volume, and maturin's rpath patching path can fail against generated shared objects in the repository target directory.

cd scpn-control-rs/crates/control-python
../../../.venv/bin/python -m maturin develop \
  --release \
  --features io-uring \
  --target-dir /tmp/scpn_control_rs_maturin_target

If patchelf is available on PATH and maturin reports an ELF parse error for libscpn_control_rs.so, remove the optional patchelf Python package from the virtual environment and rerun the command above. The extension does not require committing generated shared objects.

For soft core affinity on a developer workstation:

taskset -c 4,5 env PYTHONPATH=src python benchmarks/bench_pulsed_mpc_adapter.py \
  --steps 2000 \
  --warmup 200 \
  --evidence-class local_regression \
  --json-out validation/reports/pulsed_mpc_adapter_soft_isolated.json \
  --md-out validation/reports/pulsed_mpc_adapter_soft_isolated.md

The v1.1 report records Python adapter timing for non-burn masking, feasible burn admission, and infeasible-bank safe-action replacement. Each case also preserves the latest scpn-control.pulsed-mpc-decision-evidence.v1 admission digest, action digest, safe-action digest, and burn-mask digest. If the optional PyO3 extension is installed and rebuilt with PyMpcController.plan_pulsed(), the same report records Rust/PyO3 adapter timing and evidence fields. Reports generated on a loaded workstation or with soft affinity only must keep production_claim_allowed=false; they are local regression evidence, not target-hardware timing evidence.

Run the native Rust adapter benchmark when Rust control-surface timing changed or when the PyO3 extension is unavailable:

cargo run --manifest-path scpn-control-rs/Cargo.toml \
  -p control-control \
  --example bench_pulsed_mpc_adapter \
  --release \
  -- \
  --steps 2000 \
  --warmup 200 \
  --json-out validation/reports/pulsed_mpc_adapter_rust_local_regression.json \
  --md-out validation/reports/pulsed_mpc_adapter_rust_local_regression.md

This example times the Rust MPController.plan_pulsed() surface directly and writes a separate digest-bound JSON/Markdown report whose case payloads include the same pulsed-MPC decision evidence fields. Use the Python and Rust reports together as polyglot regression evidence.

Current PyO3-inclusive local regression evidence:

Evidence	Cases	Median range	p99 range	Claim boundary
`validation/reports/pulsed_mpc_adapter_pyo3_decision_evidence_python_20260604T171015Z.md`	Python + PyO3	40.112-873.7225 us	57.746-1264.718 us	local regression only
`validation/reports/pulsed_mpc_adapter_pyo3_decision_evidence_rust_20260604T171015Z.md`	native Rust	34.826-36.843 us	46.76-49.0 us	local regression only

Multi-shot campaign regression¶

Use this benchmark after changes to the multi-shot orchestration boundary:

src/scpn_control/control/multi_shot_campaign.py
src/scpn_control/control/pulsed_scenario_scheduler_v2.py
src/scpn_control/control/capacitor_bank_state.py
scpn-control-rs/crates/control-control/src/multi_shot_campaign.rs
scpn-control-rs/crates/control-python/src/lib.rs

The current harnesses exercise two complete shots with per-shot pulsed_mpc_admission_digest evidence so Python, Rust, and PyO3 surfaces are compared against the same digest-bound replay contract.

Python:

taskset -c 4,5 env PYTHONPATH=src python benchmarks/bench_multi_shot_campaign.py \
  --steps 2000 \
  --warmup 200 \
  --evidence-class local_regression \
  --json-out validation/reports/multi_shot_campaign_soft_isolated.json \
  --md-out validation/reports/multi_shot_campaign_soft_isolated.md

Rust:

taskset -c 4,5 cargo run --manifest-path scpn-control-rs/Cargo.toml \
  -p control-control \
  --example bench_multi_shot_campaign \
  --release \
  -- \
  --steps 2000 \
  --warmup 200 \
  --json-out validation/reports/multi_shot_campaign_rust_soft_isolated.json \
  --md-out validation/reports/multi_shot_campaign_rust_soft_isolated.md

These reports compare the Python campaign adapter, native Rust campaign kernel, and PyO3 table bridge evidence contract. Loaded workstation reports and soft-affinity reports are local regression evidence only.

Kuramoto Phase Sync — Python vs Rust Speedup¶

Single kuramoto_sakaguchi_step() with ζ=0.5, Ψ=0.3. Python: NumPy vectorised (AMD Ryzen, single-thread). Rust: Rayon par_chunks_mut(64) + criterion harness.

N	Python (ms)	Rust (ms)	Speedup
64	0.050	0.003	17.3×
256	0.029	0.033	0.9×
1 000	0.087	0.062	1.4×
4 096	0.328	0.180	1.8×
16 384	1.240	0.544	2.3×
65 536	5.010	1.980	2.5×

N=64: Rust wins on per-element throughput (no NumPy dispatch overhead).
N=256: parity; NumPy SIMD matches Rayon at this size.
N≥1000: Rust Rayon parallelism scales; the step stays sub-millisecond at N=16 384 (0.544 ms).

The Rust Criterion harness also includes the phase-lagged Sakaguchi case sakaguchi_alpha/alpha_0.37_zeta_0.5 for N=1000, 4096, 16 384, and 65 536. This keeps the alpha != 0 production path under the same regression benchmark surface as the baseline and global-driver kernels.
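
For orientation, one explicit-Euler step of a driven Kuramoto-Sakaguchi model can be sketched in pure Python. This assumes the textbook form with mean-field coupling K, phase lag α, and a global driver term ζ sin(Ψ − θ); it is not the repository kernel, and the function name, argument order, and integrator are assumptions.

```python
import math


def sketch_kuramoto_step(theta, omega, k, alpha, zeta, psi, dt):
    """One Euler step of dθ_i/dt = ω_i + K R sin(φ − θ_i − α) + ζ sin(Ψ − θ_i)."""
    n = len(theta)
    # The order parameter R exp(iφ) turns the O(N^2) pairwise sum into O(N) work.
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    r, phi = math.hypot(re, im), math.atan2(im, re)
    return [
        t + dt * (w + k * r * math.sin(phi - t - alpha) + zeta * math.sin(psi - t))
        for t, w in zip(theta, omega)
    ]


# Driver-only case (K=0, zero natural frequencies): phases relax toward Ψ.
theta = [0.0, 1.0, 2.0]
for _ in range(20000):
    theta = sketch_kuramoto_step(theta, [0.0, 0.0, 0.0], 0.0, 0.0, 0.5, 0.3, 1e-3)
print([round(t, 4) for t in theta])
```

The mean-field rewrite is what makes a single step cheap enough that dispatch overhead and thread scheduling, rather than arithmetic, can dominate at small N.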

Knm 16-Layer UPDE PAC Benchmark¶

Full 16-layer outer loop (16 × 256 oscillators, Paper 27 Knm, ζ=0.5). Criterion harness, AMD Ryzen.

Config	Median (µs)	95% CI
PAC γ=1.0	909	[860, 921]
No PAC γ=0	811	[807, 827]

PAC gate overhead: ~12% (98 µs per step). See docs/bench_pac_vs_nopac.vl.json for Vega-Lite breakdown.

Lyapunov Exponent vs ζ Strength¶

N=1000, 200 steps @ dt=1ms, Ψ=0.3 (exogenous driver).

ζ	λ (K=0)	λ (K=2)
0.0	+0.01	+0.04
0.1	−0.03	−0.02
0.5	−0.23	−0.24
1.0	−0.49	−0.53
3.0	−1.65	−1.83
5.0	−3.01	−3.35

λ < 0 ⟹ stable convergence toward Ψ. See docs/bench_lyapunov_vs_zeta.vl.json for Vega-Lite plot.

Benchmark source: benches/bench_fusion_snn_hook.py (Python, pytest-benchmark).

Interactive Visualization¶

All three benchmark datasets (speedup, λ-vs-ζ, PAC latency) in a single interactive Vega-Lite chart with legend-click filtering:

docs/bench_interactive.vl.json

Open in the Vega Editor or embed via <vega-embed> / vegaEmbed(). Click legend entries to isolate series.

Gyrokinetic Linear Benchmark (v0.17.0)¶

The native linear GK eigenvalue solver is benchmarked via validation/benchmark_gk_linear.py:

Case	Parameters	gamma_max	Dominant	Runtime
Cyclone Base Case	R/a=2.78, q=1.4, s_hat=0.78, R/L_Ti=6.9	>0	ITG	~2s (12 k_y, n_theta=32)
SPARC mid-radius	R0=1.85, B0=12.2, q=1.8	finite	—	~1s (6 k_y)
ITER mid-radius	R0=6.2, B0=5.3, q=1.5	finite	—	~1s (6 k_y)

Multi-code comparison (benchmark_gk_linear.run_multi_code_comparison()):

Model	gamma_max	chi_i	chi_e
Native GK eigenvalue	from solver	from quasilinear	from quasilinear
Quasilinear dispersion	from analytic	from mixing-length	from mixing-length

Hybrid accuracy (validation/benchmark_hybrid_accuracy.py) measures the correction layer convergence over 20 transport steps with periodic GK spot-checks.

Nonlinear Cyclone Base Case Evidence¶

validation/gk_nonlinear_cyclone.py publishes schema-versioned nonlinear CBC diagnostic and saturation-admission evidence. The report separates quick diagnostic checks from saturated chi_i admission, binds the payload with a canonical SHA-256 digest, and writes both JSON and Markdown summaries:

validation/reports/gk_nonlinear_cyclone.json
validation/reports/gk_nonlinear_cyclone.md

The current local benchmark passed the linear recovery, energy-conservation, and zonal-flow diagnostics. The saturated nonlinear CBC claim remains blocked: the V4 run used 200 steps, produced chi_i_gB=1.6568813509166032e-09, failed the 1.0..5.0 CBC reference band, and had tail relative drift 0.30041712853638713 above the configured 0.10 threshold. Use --require-saturation for publication or release gates that must fail unless a long enough saturated campaign is admitted.
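
The two blocking conditions can be expressed as a small check. The values and thresholds are the ones quoted above; the function name, return shape, and reason strings are illustrative rather than the script's actual interface.

```python
def saturation_admitted(chi_i_gb, tail_relative_drift, band=(1.0, 5.0), max_drift=0.10):
    """Return (admitted, reasons) for a saturated-CBC claim."""
    reasons = []
    if not band[0] <= chi_i_gb <= band[1]:
        reasons.append("chi_i_gB outside CBC reference band")
    if tail_relative_drift > max_drift:
        reasons.append("tail relative drift above threshold")
    return (not reasons, reasons)


# Values from the current local run quoted above.
admitted, reasons = saturation_admitted(1.6568813509166032e-09, 0.30041712853638713)
print(admitted, reasons)
```

Both conditions fail for the 200-step run, which is why the saturated claim stays blocked.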

RZIP Calibration Benchmark¶

validation/benchmark_rzip_calibration.py publishes bounded local regression evidence for the RZIP rigid-plasma vertical-stability plant. The generated report records the declared vertical inertia, wall time constant, growth rate, growth time, tamper-evident evidence payload SHA-256 digest, and explicit facility-claim boundary.

Report artefacts:

validation/reports/rzip_calibration.json
validation/reports/rzip_calibration.md

Facility vertical-control claims still require documented public, external-code, or measured-discharge RZIP reference evidence that passes the strict admission gate.

RWM Claim-Admission Benchmark¶

validation/benchmark_rwm_claims.py publishes bounded local regression evidence for the resistive-wall-mode feedback model. The generated report records beta limits, wall-gap correction, rotation, sensor/coil topology, controller latency, coil coupling, open-loop growth, closed-loop growth, and the explicit facility-claim boundary.

Report artefacts:

validation/reports/rwm_claims.json
validation/reports/rwm_claims.md

Facility RWM-control claims still require documented public, external MHD, or measured-shot evidence that passes the strict admission gate.

Free-boundary Tracking Claim-Admission Benchmark¶

validation/benchmark_free_boundary_tracking_claims.py publishes bounded repository-regression evidence for the direct free-boundary tracking claim boundary. The generated report records true objective residuals, response-rank health, actuator bounds, latency-compensation status, supervisor actions, and the explicit facility-claim boundary.

Report artefacts:

validation/reports/free_boundary_tracking_claims.json
validation/reports/free_boundary_tracking_claims.md

Facility free-boundary tracking claims still require documented public, measured-replay, or external equilibrium benchmark evidence that passes the strict admission gate.

EFIT-lite Claim-Admission Benchmark¶

validation/benchmark_efit_lite_claims.py publishes bounded synthetic regression evidence for the fixed-boundary EFIT-lite reconstruction path. The generated report records diagnostic provenance, grid shape, flux-loop and B-probe counts, Rogowski radius, reconstructed current, q95, beta_pol, li, and the explicit facility-claim boundary.

Report artefacts:

validation/reports/efit_lite_claims.json
validation/reports/efit_lite_claims.md

Facility equilibrium claims still require matched EFIT/P-EFIT, documented public, or measured-discharge evidence for psi, Ip, q95, beta_pol, and li that passes the strict admission gate.

Kinetic EFIT Claim-Admission Benchmark¶

validation/benchmark_kinetic_efit_claims.py publishes bounded synthetic regression evidence for kinetic pressure, q-profile, anisotropy, diagnostic provenance, profile provenance, fast-ion provenance, MSE calibration, and normalised elliptic-rho interpolation geometry.

Report artefacts:

validation/reports/kinetic_efit_claims.json
validation/reports/kinetic_efit_claims.md

Facility kinetic-EFIT claims still require matched EFIT/P-EFIT, documented public, or measured-discharge references for pressure, q-profile, and anisotropy that pass the strict admission gate.

Differentiable Transport Gradient-Latency Benchmark¶

The controller-tuning facade measures the audited admission path for JAX transport gradients via validation/benchmark_differentiable_transport_latency.py. The timed path includes gradients for transport coefficients and source schedules plus the sampled independent finite-difference audit used before controller-tuning admission. The same benchmark script also writes a separate multi-step source-rollout latency report. That path measures the JAX rollout source-gradient plus sampled NumPy finite-difference audit used before NMPC source-rollout admission.

Report artefacts:

validation/reports/differentiable_transport_latency.json
validation/reports/differentiable_transport_latency.md
validation/reports/differentiable_transport_rollout_latency.json
validation/reports/differentiable_transport_rollout_latency.md
validation/reports/differentiable_transport_full_fidelity_readiness.json
validation/reports/differentiable_transport_full_fidelity_readiness.md

Admission:

python validation/validate_differentiable_transport_latency.py --require-admitted --json-out

The report is local latency evidence for the audited gradient-admission path. It is not a real-time control-loop guarantee and does not replace external transport validation. Full-fidelity differentiable-transport promotion must also pass transport_full_fidelity_readiness_evidence() with bound one-step and rollout reports, controller proof digest, equilibrium-coupled campaign metadata, and an admitted external reference artefact.

Differentiable Scenario Readiness Evidence¶

The coupled differentiable scenario facade records bounded evidence for an analytic Solov'ev-form flux surface coupled to the four-channel differentiable transport rollout. The report audits gradients with respect to both equilibrium parameters and source schedules, records local non-isolated timing context, and keeps the claim blocked until the physics-traceability gate is satisfied.

Report artefacts:

validation/reports/differentiable_scenario_readiness.json
validation/reports/differentiable_scenario_readiness.md

Admission:

python validation/benchmark_differentiable_scenario.py
python validation/validate_differentiable_scenario.py --json-out

The timing field is local admission evidence only. It is not an isolated hardware benchmark, real-time guarantee, external integrated-modelling result, or facility-control claim.

TORAX Code-to-Code External-Reference Evidence¶

validation/code_to_code_benchmark.py runs the local transport stack on a declared ITER-like scenario and can optionally execute TORAX on the same scenario. The script now emits schema-versioned JSON and Markdown evidence with a canonical payload digest, scenario digest, external-reference status, blocked reasons, and finite comparison metrics when TORAX is available.

Report artefacts:

validation/reports/code_to_code_benchmark.json
validation/reports/code_to_code_benchmark.md

Admission commands:

python validation/code_to_code_benchmark.py --with-torax
python validation/code_to_code_benchmark.py --with-torax --require-external

--require-external exits non-zero unless TORAX actually runs and the report contains finite scpn-control and TORAX profile/comparison payloads. Reports without TORAX remain explicit blocked evidence and do not satisfy full-fidelity external-reference requirements. The current local evidence run executed the scpn-control scenario path with average Te=8.142 keV, average Ti=8.109 keV, energy-balance error 1.3548e-02, particle-balance error 7.4357e-03, and blocked TORAX admission because TORAX is not installed in this environment.

End-to-End Control Latency Evidence¶

benchmarks/e2e_control_latency.py records the full sensor, equilibrium, transport, controller, and actuator-clamp path. Use --output-json when publishing evidence, and always supply --target-hardware-id, --target-hardware-class, and --rt-kernel for Raspberry Pi, Jetson, industrial PC, or other qualified target-hardware runs. Reports without those operator-qualified fields remain local latency evidence only and do not support hardware-in-the-loop or sub-millisecond real-time claims.

Persisted reports use the scpn-control.e2e-latency.v1 schema and include a canonical payload_sha256 over the latency payload. Each persisted report must record the benchmark command, generated UTC timestamp, inherited CPU affinity, isolation method, host load before and after the run, CPU governor state when available, and whether concurrent heavy jobs were observed. The admission validator rejects digest tampering, non-positive iteration counts, unordered percentiles, non-finite timing values, mismatched E2E/kernel overhead factors, missing benchmark context, unqualified hardware metadata, and reports that alter the local-evidence claim boundary.
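A few of these rejection rules can be sketched in Python. The sketch assumes that "canonical" means JSON with sorted keys and compact separators, which the source does not specify, and the payload keys (iterations, p50_us, p95_us, p99_us) and rejection labels are hypothetical. Only payload_sha256 is a field name taken from the text.

```python
import hashlib
import json
import math


def canonical_sha256(payload):
    """Hash a payload as sorted-key, compact JSON (assumed canonical form)."""
    encoded = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), allow_nan=False
    )
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def check_latency_report(report):
    """Return rejection reasons; an empty list means these checks passed."""
    reasons = []
    payload = report.get("payload", {})
    timings = [payload.get(key) for key in ("p50_us", "p95_us", "p99_us")]
    if not all(isinstance(t, (int, float)) and math.isfinite(t) for t in timings):
        reasons.append("non_finite_timing")
    elif not timings[0] <= timings[1] <= timings[2]:
        reasons.append("unordered_percentiles")
    if not isinstance(payload.get("iterations"), int) or payload["iterations"] <= 0:
        reasons.append("non_positive_iterations")
    try:
        digest_ok = canonical_sha256(payload) == report.get("payload_sha256")
    except ValueError:  # raised by allow_nan=False on NaN or infinity
        digest_ok = False
    if not digest_ok:
        reasons.append("digest_mismatch")
    return reasons


payload = {"iterations": 2000, "p50_us": 120.0, "p95_us": 180.0, "p99_us": 240.0}
report = {"payload": payload, "payload_sha256": canonical_sha256(payload)}
print(check_latency_report(report))

report["payload"]["p95_us"] = 90.0  # edit the payload after the digest was sealed
print(check_latency_report(report))
```

Editing the payload after sealing trips two independent checks: the digest no longer matches, and the percentiles are no longer ordered.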

Before a report is cited as target-hardware evidence, run:

python validation/validate_e2e_latency_evidence.py validation/reports/e2e_control_latency.json \
  --max-e2e-p95-us 1000 --json-out

The validator rejects unqualified local-host metadata, missing RT-kernel evidence, non-finite percentile data, and missing claim-boundary text. When --max-e2e-p95-us is supplied, it also rejects reports whose E2E P95 latency exceeds that threshold (1000 us in the command above).

VMEC-lite Claim-Admission Benchmark¶

validation/benchmark_vmec_lite_claims.py publishes bounded synthetic regression evidence for the fixed-boundary VMEC-lite spectral facade. The generated report records Fourier truncation, field periods, pressure and rotational-transform profile provenance, current-assumption provenance, positive sampled major-radius bounds, force residual, q-domain, the local non-isolated benchmark context, and the sealed VMEC-lite geometry-validation payload SHA-256.

Report artefacts:

validation/reports/vmec_lite_geometry.json
validation/reports/vmec_lite_geometry.md
validation/reports/vmec_lite_claims.json
validation/reports/vmec_lite_claims.md

Full VMEC or 3D MHD equilibrium claims still require matched VMEC, documented public, external-MHD, or measured-stellarator references for R_mn, Z_mn, rotational transform, convergence, and residual tolerance.

Neural-equilibrium Claim-Admission Benchmark¶

validation/benchmark_neural_equilibrium_pretraining.py publishes bounded synthetic pretraining evidence for the neural-equilibrium surrogate and records claim-admission evidence around the generated weights. The generated report captures sample count, grid shape, PCA component count, explained variance, synthetic MSE, Grad-Shafranov residual, weight checksum, and the explicit predictive-claim boundary.

Generated artefacts:

validation/reports/neural_equilibrium_pretraining.json
validation/reports/neural_equilibrium_pretraining.md
validation/reports/neural_equilibrium_synthetic_pretrain.npz

Facility predictive claims remain blocked until a strict P-EFIT or documented public reference artefact validates the same weight checksum and declares psi, pressure, q-profile, boundary, and magnetic-axis errors inside stated tolerances.

MAST EFM full-output baseline training is prepared through validation/train_mast_efm_neural_equilibrium.py. The checked-in dry-run launch report records the expected supervised-dataset SHA-256, current workstation payload visibility, storage-host storage-only execution policy, and fail-closed pre-run admission status. The companion result-template report binds the launch digest and declares the holdout, latency, GPU-cost, and admission-certificate outputs that a future workstation or cloud execution must publish before strict predictive admission is requested.

Neural-transport Claim-Admission Benchmark¶

validation/benchmark_neural_transport_claims.py publishes bounded local regression evidence for the neural-transport claim boundary. The generated report records the deterministic analytic-fallback benchmark cases, local channel agreement, local diffusivity errors, feature-schema contract, and the explicit quantitative-claim admission status.

Generated artefacts:

validation/reports/neural_transport_claims.json
validation/reports/neural_transport_claims.md

Quantitative QuaLiKiz, QLKNN, or documented-reference neural-transport claims remain blocked until a strict reference artefact validates the same neural weight checksum and declares chi_i, chi_e, D_e, and unstable-branch metrics inside stated tolerances.

Neural-turbulence Claim-Admission Benchmark¶

validation/benchmark_neural_turbulence_claims.py publishes bounded local regression evidence for the neural-turbulence claim boundary. The generated report records the deterministic analytic-target sample count, gyro-Bohm Q_i/Q_e/Gamma_e errors, critical-gradient activity agreement, feature-schema contract, and explicit quantitative-claim admission status.

Generated artefacts:

validation/reports/neural_turbulence_claims.json
validation/reports/neural_turbulence_claims.md

Quantitative gyrokinetic, QuaLiKiz, or documented-reference turbulence claims remain blocked until a strict reference artefact validates the same neural weight checksum and declares Q_i, Q_e, Gamma_e, flux-relative error, and critical-gradient metrics inside stated tolerances.

Orbit-following Claim-Admission Benchmark¶

validation/benchmark_orbit_following_claims.py publishes bounded synthetic regression evidence for guiding-centre orbit-following claim admission. The generated report records geometry provenance, particle provenance, collision-model provenance, loss-boundary provenance, banana width, first-orbit loss, and ensemble classification counts.

Report artefacts:

validation/reports/orbit_following_claims.json
validation/reports/orbit_following_claims.md

External orbit-following claims still require matched external-code, documented-public, published-benchmark, or measured fast-ion diagnostic references for banana width and loss fraction.

UQ Claim-Admission Benchmark¶

validation/benchmark_uq_claims.py publishes bounded synthetic regression evidence for full-chain uncertainty quantification claim admission. The generated report records scenario provenance, prior provenance, propagation chain, seed, sample count, ordered percentile checks, finite outputs, D-T fuel dilution, and density/temperature sensitivity provenance.

Report artefacts:

validation/reports/uq_claims.json
validation/reports/uq_claims.md

Calibrated predictive-UQ claims still require matched measured scenario, documented-public, external-UQ, or facility validation references for central values and sigma statistics.

Density-control Claim-Admission Benchmark¶

validation/benchmark_density_control_claims.py publishes bounded synthetic regression evidence for density-control claim admission. The generated report records geometry provenance, transport provenance, actuator provenance, diagnostic provenance, CFL limiting, Greenwald fraction, source integral, particle inventory change, and actuator command bounds.

Report artefacts:

validation/reports/density_control_claims.json
validation/reports/density_control_claims.md

Facility-calibrated density-control claims still require matched measured discharge, documented-public, external particle-balance, or facility replay references for Greenwald fraction and particle inventory change.

Burn-control Claim-Admission Benchmark¶

validation/benchmark_burn_control_claims.py publishes bounded repository regression evidence for the DT burn-control and alpha-heating claim boundary. The generated report records alpha power, auxiliary power, Q, Lawson margin, burn fraction, reactivity exponent, thermal stability, controller limits, and the explicit reactor-claim boundary.

Report artefacts:

validation/reports/burn_control_claims.json
validation/reports/burn_control_claims.md

Reactor burn-control claims still require documented public, integrated transport benchmark, or measured burn replay references for alpha power, Q, Lawson margin, burn fraction, and reactivity-exponent agreement.

Volt-second Claim-Admission Benchmark¶

validation/benchmark_volt_second_claims.py publishes bounded repository regression evidence for the scenario volt-second accounting claim boundary. The generated report records ramp, flat-top, and ramp-down flux consumption, Ejima startup flux, bootstrap-current correction, remaining flat-top time, budget margin, and the explicit facility-claim boundary.

Report artefacts:

validation/reports/volt_second_claims.json
validation/reports/volt_second_claims.md

Pulse-duration or central-solenoid commissioning claims still require documented public, measured loop-voltage replay, or external scenario benchmark references for total flux, flat-top duration, Ejima flux, bootstrap current, and budget margin agreement.

Current-drive Claim-Admission Benchmark¶

validation/benchmark_current_drive_claims.py publishes bounded repository regression evidence for the ECCD, LHCD, and NBI current-drive claim boundary. The generated report records grid-normalised absorbed power, total driven current, peak current density, source powers, efficiency coefficients, NBI slowing-down metadata, and the explicit external-claim boundary.

Report artefacts:

validation/reports/current_drive_claims.json
validation/reports/current_drive_claims.md

Ray-traced, Fokker-Planck, or measured-deposition current-drive claims still require strict reference artifacts for total power, driven current, deposition centroid, peak current density, and NBI slowing-down agreement.

Mu-synthesis Claim-Admission Benchmark¶

validation/benchmark_mu_synthesis_claims.py publishes bounded repository regression evidence for the static D-scaled structured-singular-value analysis claim boundary. The generated report records plant dimensions, uncertainty blocks, mu upper bound, robustness margin, controller gain norm, D-scalings, closed-loop spectral abscissa, and the explicit validated-claim boundary.

Report artefacts:

validation/reports/mu_synthesis_claims.json
validation/reports/mu_synthesis_claims.md

Full frequency-dependent D-K synthesis claims still require documented public, external mu-toolbox, or measured control replay references for mu upper bound, robustness margin, controller gain, D-scaling, and closed-loop spectral-abscissa agreement.

Disruption-mitigation Claim-Admission Benchmark¶

validation/benchmark_disruption_mitigation_claims.py publishes deterministic bounded ensemble evidence for the halo-current and runaway-electron mitigation model. The generated report records ensemble seed, run count, prevention rate, P95 halo current, P95 runaway current, mean toroidal-peaking-factor product, ITER-limit summary, and the explicit mitigation-claim admission status.

Generated artefacts:

validation/reports/disruption_mitigation_claims.json
validation/reports/disruption_mitigation_claims.md

Measured disruption-mitigation claims remain blocked until strict measured, external-benchmark, or documented public reference artefacts validate warning lead time, mitigation outcome, halo-current envelope, runaway-beam envelope, and toroidal-peaking-factor metrics inside stated tolerances.

The phase-ordering side of the same tracker is covered by the bounded validation artefacts validation/reports/disruption_sequence.json and validation/reports/disruption_sequence.md, generated by validation/validate_disruption_sequence.py. Those artefacts are not performance benchmarks and keep facility claims blocked pending labelled disruption-window evidence.

Rust Criterion benchmarks¶

Run from the Rust workspace root:

cd scpn-control-rs
cargo bench --workspace

Current benchmark targets:

benches/bench_boris.rs
benches/bench_lif.rs
benches/bench_transport.rs
benches/bench_kuramoto.rs

Criterion artifacts are generated under:

scpn-control-rs/target/criterion/

CI benchmark jobs¶

Rust Criterion (Job 8)¶

cargo bench --workspace
Uploads bench-results from scpn-control-rs/target/criterion/

Python phase-sync benchmark — DIII-D scale (Job 9)¶

Runs kuramoto_sakaguchi_step at N=1000 and N=4096 (DIII-D PCS scale), plus a RealtimeMonitor.tick() benchmark (16 layers × 50 oscillators).

Gates:

Single-step P50 < 5 ms (N=4096)
RealtimeMonitor tick P50 < 50 ms
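A P50 gate of this kind can be sketched as below. This is an illustration under stated assumptions, not the CI job's code: stand_in_step is a placeholder workload, the gate names are hypothetical, and the real job would time kuramoto_sakaguchi_step and RealtimeMonitor.tick() instead. Only the two limits are taken from the gates above.

```python
import statistics
import time

# Gate limits in milliseconds, quoted from the CI gates above.
GATES_MS = {"single_step_n4096": 5.0, "realtime_monitor_tick": 50.0}


def measure_p50_ms(fn, n_bench=200, warmup=20):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # discard cold-start iterations
        fn()
    samples_ms = []
    for _ in range(n_bench):
        start = time.perf_counter_ns()
        fn()
        samples_ms.append((time.perf_counter_ns() - start) / 1e6)
    return statistics.median(samples_ms)


def passes_gate(name, p50_ms):
    """Strict comparison: the gate is P50 < limit, so equality fails."""
    return p50_ms < GATES_MS[name]


def stand_in_step():
    """Placeholder workload; replace with the real step function."""
    sum(i * i for i in range(1000))


p50 = measure_p50_ms(stand_in_step)
print(f"P50 = {p50:.4f} ms, gate passed: {passes_gate('single_step_n4096', p50)}")
```

Gating on the median rather than the maximum keeps the check tolerant of occasional scheduler noise on shared CI runners, which is also why a pass here says nothing about worst-case latency.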

Reproducibility notes¶

Run benchmarks on an idle machine.
Keep --n-bench fixed for comparable CLI timing runs.
Compare same Python/Rust versions and CPU class when evaluating trends.

Multi-Shot Campaign Local Regression Evidence (2026-06-04)¶

The CON-C.6 multi-shot campaign orchestrator was measured on the local workstation with soft CPU affinity on cores 4 and 5. These runs are regression evidence only; they are not production hard-real-time claims because the workstation was not booted with hard core isolation, IRQ shielding, or a PREEMPT_RT kernel.

Surface	Evidence	Samples	Warmup	Median	p95	p99	Max	Evidence class
Python	`validation/reports/multi_shot_campaign_soft_isolated_20260604T131105Z.md`	2000	200	136.581 us	166.287 us	193.620 us	1589.225 us	`local_regression`
Rust	`validation/reports/multi_shot_campaign_rust_soft_isolated_20260604T131112Z.md`	2000	200	2.558 us	3.030 us	4.666 us	15.440 us	`local_regression`

Digest-bound pulsed-MPC replay evidence was remeasured after adding per-shot pulsed_mpc_admission_digest propagation. Each run carried two admitted MPC decision digests through the campaign report.

Surface	Evidence	Samples	Warmup	Median	p95	p99	Max	Evidence class
Python	`validation/reports/multi_shot_campaign_pulsed_mpc_evidence_python_pyo3_20260604T172543Z.md`	2000	200	144.769 us	196.827 us	244.097 us	2333.763 us	`local_regression`
PyO3	`validation/reports/multi_shot_campaign_pulsed_mpc_evidence_python_pyo3_20260604T172543Z.md`	2000	200	10.6215 us	14.885 us	21.042 us	41.502 us	`local_regression`
Rust	`validation/reports/multi_shot_campaign_pulsed_mpc_evidence_rust_20260604T172604Z.md`	2000	200	2.794 us	3.573 us	4.536 us	20.459 us	`local_regression`
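One way to read the pulsed-MPC table is to compute median ratios between surfaces and the spread between median and worst case. The sketch below copies the table rows verbatim; the helper names are illustrative, and the resulting ratios inherit the same local_regression boundary as the measurements themselves.

```python
# Rows copied from the pulsed-MPC table above: (median, p95, p99, max) in microseconds.
ROWS_US = {
    "Python": (144.769, 196.827, 244.097, 2333.763),
    "PyO3": (10.6215, 14.885, 21.042, 41.502),
    "Rust": (2.794, 3.573, 4.536, 20.459),
}


def median_ratio(slow, fast):
    """Ratio of median latencies between two surfaces."""
    return ROWS_US[slow][0] / ROWS_US[fast][0]


def tail_amplification(surface):
    """Worst observed sample divided by the median for one surface."""
    median, _p95, _p99, worst = ROWS_US[surface]
    return worst / median


for surface in ROWS_US:
    print(f"{surface}: max/median = {tail_amplification(surface):.1f}")
print(f"Python/PyO3 median ratio = {median_ratio('Python', 'PyO3'):.1f}")
print(f"Python/Rust median ratio = {median_ratio('Python', 'Rust'):.1f}")
```

A large max/median value signals scheduling jitter on the non-isolated workstation, so the medians are the more stable figures to compare between runs.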

How to interpret benchmark evidence¶

Benchmark figures should be treated as a three-value statement: measured latency, reproducibility context, and admissibility.

Measured latency comes from the reported timing columns.
Reproducibility context comes from host, governor, and core-affinity context in the artifact.
Admissibility comes from whether the run appears in a validator-linked release note or validation page.

Do not merge local-regression and isolated-target reports without marking the context difference in your interpretation.

Evidence-first execution checklist¶

Before sharing any benchmark in planning material, check each report against:

Scope: the benchmark class matches the command surface you are discussing (native_formal_*, runtime_admission, multi_shot_campaign, etc.).
Context: host load, isolation, core assignment, and timing-mode metadata are present and readable.
Admission: claim level is mapped to local_regression, production_benchmark, reference_validated, or stronger where available.
Version lock: package version and artifact SHA are retained.

If all four checks pass, the benchmark can enter decision discussions. If any check fails, keep it in implementation-only iteration mode.
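The four checks can be expressed as a small predicate. This is a sketch only: the report keys (family, context, evidence_class, package_version, artifact_sha256) are hypothetical, and the set of evidence classes lists just the three named above, so a real implementation would need to include any stronger classes.

```python
KNOWN_EVIDENCE_CLASSES = {
    "local_regression",
    "production_benchmark",
    "reference_validated",
}
CONTEXT_KEYS = ("host_load", "isolation", "cores", "timing_mode")


def run_checklist(report, discussed_family):
    """Evaluate the scope, context, admission, and version-lock checks."""
    context = report.get("context", {})
    return {
        "scope": report.get("family") == discussed_family,
        "context": all(context.get(key) is not None for key in CONTEXT_KEYS),
        "admission": report.get("evidence_class") in KNOWN_EVIDENCE_CLASSES,
        "version_lock": bool(report.get("package_version"))
        and bool(report.get("artifact_sha256")),
    }


def may_enter_decision_discussion(report, discussed_family):
    """All four checks must pass; otherwise stay in implementation-only mode."""
    return all(run_checklist(report, discussed_family).values())


# Hypothetical report that lacks an artifact digest.
example = {
    "family": "multi_shot_campaign",
    "context": {
        "host_load": 0.4,
        "isolation": "soft_affinity",
        "cores": [4, 5],
        "timing_mode": "default",
    },
    "evidence_class": "local_regression",
    "package_version": "0.20.4",
}

print(run_checklist(example, "multi_shot_campaign"))
print(may_enter_decision_discussion(example, "multi_shot_campaign"))
```

Returning the per-check results alongside the overall verdict makes it clear which check kept a report in implementation-only mode.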

Benchmark regression gate¶

The polyglot suite runner (tools/run_benchmark_suite.py) executes the registered Python/Rust comparison benchmarks (currently the capacitor-bank discharge ledger) and emits a scpn-control.benchmark-regression.v1 report. The report contains per-benchmark, per-language p50/p95/p99 latency and throughput, plus provenance: CPU model, Rust release profile read from the workspace manifest, commit digest, CPU affinity, load average, peak RSS, and whether the Rust backend was built.

The gate (tools/benchmark_regression_gate.py) compares a fresh report against the tracked baseline (benchmarks/baselines/capacitor_bank.json) under an explicit threshold policy (benchmarks/regression_thresholds.toml). Latency and memory metrics are upper-bounded (current <= baseline * ratio); throughput is lower-bounded (current >= baseline * ratio). The gate fails closed, treating each of the following as a failure: a missing report or baseline, a tampered baseline (its baseline_sha256 no longer matches its metrics), a tampered report (payload_sha256 mismatch), a benchmark/language metric present in the baseline but absent from the report, and a metric with no threshold policy.
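The bound rules and two of the fail-closed cases can be sketched as follows. The metric names, the in-memory policy layout, the failure labels, and all numbers are invented for illustration; the real gate reads its policy from benchmarks/regression_thresholds.toml and also verifies digests, which this sketch omits.

```python
def evaluate_gate(report, baseline, policy):
    """Compare report metrics to baseline metrics; return (metric, reason) failures."""
    failures = []
    for name, base_value in baseline.items():
        if name not in report:
            failures.append((name, "missing_from_report"))
            continue
        if name not in policy:
            failures.append((name, "no_threshold_policy"))
            continue
        bound, ratio = policy[name]
        limit = base_value * ratio
        current = report[name]
        if bound == "upper":  # latency and memory: current <= baseline * ratio
            ok = current <= limit
        elif bound == "lower":  # throughput: current >= baseline * ratio
            ok = current >= limit
        else:
            ok = False  # unknown bound type: fail closed
        if not ok:
            failures.append((name, f"{bound}_bound_violated"))
    return failures


# Illustrative values only; none of these are measured results.
baseline = {
    "python.p95_us": 100.0,
    "python.throughput_hz": 1000.0,
    "python.peak_rss_mb": 50.0,
}
policy = {
    "python.p95_us": ("upper", 1.25),
    "python.throughput_hz": ("lower", 0.80),
}
report = {
    "python.p95_us": 120.0,
    "python.throughput_hz": 700.0,
    "python.peak_rss_mb": 51.0,
}

for name, reason in evaluate_gate(report, baseline, policy):
    print(f"FAIL {name}: {reason}")
```

In this example the p95 latency passes (120 <= 125), throughput fails (700 < 800), and the memory metric fails because it has no policy entry, even though its value barely moved.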

# Refresh the baseline on declared hardware (records the CPU it was measured on):
PYTHONPATH=src python tools/run_benchmark_suite.py \
  --steps 400 --warmup 40 --write-baseline benchmarks/baselines/capacitor_bank.json

# Gate a fresh run against it:
PYTHONPATH=src python tools/run_benchmark_suite.py --json-out artifacts/benchmarks/report.json
python tools/benchmark_regression_gate.py \
  --report artifacts/benchmarks/report.json \
  --baseline benchmarks/baselines/capacitor_bank.json

Absolute-latency comparison is only valid on the same CPU as the baseline: the gate emits a hardware_mismatch failure when the report and baseline CPU models differ. The Benchmark nightly workflow therefore runs the gate in --evidence-only mode on GitHub-hosted runners, where it collects and uploads the report and verdict without blocking. Real gating is intended for declared or self-hosted hardware whose baseline was captured on the same CPU. The committed baseline carries production_claim_allowed: false; it is a local regression guard, not a published performance claim.

Practical use and scope¶

Use this page for benchmark interpretation and replay evidence, not as a substitute for deployment qualification.

Record benchmark context when comparing results across Python, Rust, and benchmark modes.
Use this page with docs/validation.md for claim boundaries and with docs/physics_traceability.md for evidence lineage.
Treat every number as environment- and mode-dependent unless explicitly documented otherwise.