Skip to content

SPDX-License-Identifier: AGPL-3.0-or-later

Commercial license available

© Concepts 1996–2026 Miroslav Šotek. All rights reserved.

© Code 2020–2026 Miroslav Šotek. All rights reserved.

ORCID: 0009-0009-3560-0851

Contact: www.anulum.li | protoscience@anulum.li

scpn-quantum-control — Native dense-Hamiltonian speedup benchmark

Native Speedup Benchmark

This page documents the reproducible Rust-vs-Qiskit benchmark for dense XY-Hamiltonian construction — the operation the retired "5401× faster than Qiskit" headline referred to. It follows the unified GOTM benchmark standard (agentic-shared/BENCHMARK_STANDARD.md) and mirrors the SCPN-CONTROL regression apparatus.

What is measured, and what is claimed

The harness builds the dense XY Hamiltonian two ways from the same K/omega and parity-checks the results:

  • rust_pyo3scpn_quantum_engine.build_xy_hamiltonian_dense, the real float64 operator kernel.
  • qiskit_sparsepauliopknm_to_hamiltonian(K, omega).to_matrix().

Each backend is warmed up and then sampled with repeats; P50/P95/P99 and throughput are recorded with full provenance (CPU model, Rust release profile read from Cargo.toml, commit, CPU affinity, load average, peak RSS).

The numbers are environment-dependent. They swing with CPU pinning, BLAS threading, and host load, so the artefacts are marked production_claim_allowed: false — they are a reproducible local regression guard, not a published performance claim. The earlier "5401×" figure was a cold-start artefact: an un-warmed Qiskit first-call timed at ~20.9 ms. With warm-up the Rust kernel advantage is large for small systems and shrinks as the dense 2^n × 2^n fill dominates. The production knm_to_dense_matrix wrapper additionally casts float64 → complex128; that cast dominates at large L and is a downstream cost, excluded from this construction-kernel comparison.

Declared-hardware baseline (committed)

Committed at benchmarks/baselines/native_speedup.json, measured on the GOTM workstation (i5-11600K @ 3.90 GHz), pinned to one core for tight samples:

System Rust kernel p50 Qiskit p50 Speedup (p50) Parity
L=4 (16×16) 2.79 µs 269.5 µs 96.5×
L=8 (256×256) 23.1 µs 779.0 µs 33.7×
L=10 (1024×1024) 635.3 µs 2131.3 µs 3.35×
L=12 (4096×4096) 42.2 ms 93.0 ms 2.20×

CI evidence (side by side)

The Native Speedup Benchmark workflow regenerates the report on a fixed ubuntu-latest runner (nightly and on demand via workflow_dispatch) and uploads it as the native-speedup-benchmark artefact. Because a hosted runner's CPU differs from the declared-hardware baseline, the regression gate runs in evidence-only mode there: it collects the verdict (and flags hardware_mismatch) but does not block. Real gating is intended on declared or self-hosted hardware whose baseline was captured on the same CPU. Download the CI artefact to compare its numbers against the declared-hardware baseline above.

Reproduce it yourself — and share your results

pip install -e ".[accelerated]"        # builds + installs the Rust engine
python scripts/benchmark_native_speedup.py --pin-core <idle-core> \
    --json-out my_report.json
python tools/benchmark_native_speedup_gate.py \
    --report my_report.json \
    --baseline benchmarks/baselines/native_speedup.json

The gate validates payload and baseline tamper digests, applies the benchmarks/native_speedup_thresholds.toml policy direction-aware (latency upper-bounded, throughput lower-bounded), and fails closed on missing/tampered evidence. On a different CPU it reports hardware_mismatch rather than a misleading pass/fail. We welcome reproductions on other hardware — open an issue with your report JSON and we will add it.

Regenerating the committed baseline

python scripts/benchmark_native_speedup.py --pin-core <idle-core> \
    --write-baseline benchmarks/baselines/native_speedup.json

The baseline carries evidence_class: local_regression, production_claim_allowed: false, full provenance, and a baseline_sha256 tamper digest.