SPDX-License-Identifier: AGPL-3.0-or-later¶

Commercial license available¶

© Concepts 1996–2026 Miroslav Šotek. All rights reserved.¶

© Code 2020–2026 Miroslav Šotek. All rights reserved.¶

ORCID: 0009-0009-3560-0851¶

Contact: www.anulum.li | protoscience@anulum.li¶

scpn-quantum-control — Benchmarks API Reference¶

Benchmarks API Reference¶

The benchmarks package measures the computational frontier: at what system size does quantum hardware outperform the best classical methods for simulating Kuramoto-XY dynamics? Six modules answer this from different angles — documented classical baselines, exact diagonalisation, GPU statevector, MPS tensor networks, application-oriented metrics, and differentiable-programming conformance.

The package provides 6 modules, a set of differentiable-programming conformance rows, and 3 crossover estimates.

Rust kernel execution-mode evidence is tracked separately from benchmark timing. tools/audit_rust_kernel_execution.py writes static SIMD/threading inventory artefacts such as data/rust_kernel_execution/rust_kernel_execution_audit_2026-06-16.json; those rows classify PyO3 kernels as scalar/unknown, ndarray-dot, rayon-threaded, or explicit-SIMD evidence only. They do not make performance claims. Timing promotion still requires the isolated benchmark metadata described below.

Architecture¶

Classical Methods                    Comparison                     Quantum Methods
──────────────────                  ──────────                      ───────────────
SciPy ODE / QuTiP / MPS ← classical_baselines → provenance envelope
  Phase ODE, Lindblad, TEBD                           honest optional status

Exact diag + expm       ← quantum_advantage →  Trotter on statevector
  O(2^n × 2^n × 2^n)                              O(n^2 × reps × 2^n)
  Limit: n ≈ 14                                    Limit: n ≈ 23 (RAM)

GPU statevector (A100)  ← gpu_baseline →        QPU gate execution
  O(2^n × n_gates)                                 O(n_gates × gate_time)
  Memory: 2^n × 16B                                Unlimited
  Limit: n ≈ 33 (80 GB)                           Limit: decoherence

MPS tensor network      ← mps_baseline →        Quantum correlations
  O(chi^3 × n × gates)                            Native entanglement
  chi ~ 2^S (entropy)                              No truncation needed
  Fails: volume-law S

AppQSim protocol        ← appqsim_benchmark →   Application fidelity
  Exact ground truth                                VQE / Trotter output

Module Reference¶

1. `classical_baselines` — Documented Reference Backends¶

Provides explicit baseline runs and availability reporting:

scipy_ode_baseline — classical Kuramoto ODE via SciPy solve_ivp.
qutip_lindblad_baseline — optional density-matrix open-system baseline via QuTiP mesolve.
mps_tebd_baseline — optional tensor-network baseline via quimb TEBD.
run_documented_classical_baselines — runs the baseline suite for one K_nm/omega problem.

See Classical Baselines for the provenance contract and examples.

2. `quantum_advantage` — Classical vs Quantum Scaling¶

Measures wall-clock time for exact classical simulation (exact diagonalisation + matrix exponential) against Trotter evolution on the statevector simulator.

`classical_benchmark(n, t_max=1.0, dt=0.1)`¶

Times classical exact evolution of the XY Hamiltonian:

1. Build K_nm (Paper 27) and compile it to a dense matrix.
2. Compute the matrix exponential exp(-iHdt) for each time step.
3. Evolve the state by matrix-vector multiplication.
4. Perform exact diagonalisation to obtain the ground energy.

For n > 14, returns t_total_ms = inf. At n = 14 the state dimension is already 2^14 = 16,384, and a dense matrix exponential costs O(2^(3n)) operations, about 4.4 trillion at n = 14.

Returns: {t_total_ms, ground_energy, R_final}
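The cost figure quoted for the n = 14 cutoff can be checked with a few lines of arithmetic. This is a minimal sketch that only counts operations under the stated O(2^(3n)) model; the helper name `dense_expm_cost` is hypothetical and is not part of the package.

```python
def dense_expm_cost(n_qubits: int) -> tuple[int, int]:
    """Return (state dimension, rough operation count) for a dense expm.

    Assumes the cubic cost model quoted in the text: a dense
    2^n x 2^n matrix exponential needs on the order of (2^n)^3 operations.
    """
    dim = 2 ** n_qubits
    ops = dim ** 3
    return dim, ops


# Operation counts around the documented cutoff at n = 14.
for n in (8, 12, 14):
    dim, ops = dense_expm_cost(n)
    print(f"n={n}: dimension={dim:,}, operations~{ops:.2e}")
```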

`quantum_benchmark(n, t_max=1.0, dt=0.1, trotter_reps=5)`¶

Times Trotter evolution on the statevector simulator:

1. Build the Kuramoto initial state: Ry(omega_i)|0> per qubit.
2. Compile the Hamiltonian to a SparsePauliOp.
3. Construct a PauliEvolutionGate with LieTrotter synthesis.
4. Evolve for n_steps = t_max / dt steps, each with trotter_reps repetitions.

Returns: {t_total_ms, n_trotter_steps}

Statevector simulation is O(2^n × n_gates) — exponential in n but polynomial in circuit depth. For n=20, the statevector is 16 MB (feasible); for n=30 it is 16 GB (GPU territory).
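The memory figures above follow directly from 16 bytes per complex128 amplitude. The sketch below reproduces them; `statevector_bytes` is a hypothetical helper written for this illustration, not the package's own utility.

```python
BYTES_PER_AMPLITUDE = 16  # one complex128 value


def statevector_bytes(n_qubits: int) -> int:
    """Bytes needed to store a full statevector of n_qubits qubits."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE


# Sizes in MiB (2^20 bytes) for a few system sizes mentioned in the text.
for n in (20, 23, 30):
    mib = statevector_bytes(n) / 2 ** 20
    print(f"n={n}: {mib:,.0f} MiB")
```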

Benchmark result provenance records the current git commit through an admitted absolute-path git executable. Missing, non-executable, or failing git commands fail closed to unknown without blocking benchmark execution.

`estimate_crossover(results)`¶

Fits exponential scaling t = a * exp(b * n) to both classical and quantum timings. The crossover is where the curves intersect:

n_cross = log(a_q / a_c) / (b_c - b_q)

Returns int | None. The result is None if there are fewer than 3 data points, or if the fitted quantum exponent b_q is not smaller than the classical exponent b_c (unexpected).
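The crossover formula can be illustrated with a standalone sketch. It assumes a log-linear least-squares fit using only the standard library, which is a substitute for whatever fitting routine the package uses, and it returns an unrounded float because the rounding rule is not specified here. The function names and the timing data are hypothetical and synthetic.

```python
import math


def fit_exponential(sizes, times):
    """Fit t = a * exp(b * n) by least squares on log(t); returns (a, b)."""
    logs = [math.log(t) for t in times]
    n_mean = sum(sizes) / len(sizes)
    l_mean = sum(logs) / len(logs)
    sxx = sum((n - n_mean) ** 2 for n in sizes)
    sxy = sum((n - n_mean) * (l - l_mean) for n, l in zip(sizes, logs))
    b = sxy / sxx
    a = math.exp(l_mean - b * n_mean)
    return a, b


def crossover_size(sizes, t_classical, t_quantum):
    """Return n where the fitted curves intersect, or None if undefined."""
    # Drop infeasible points (the classical benchmark reports inf above n=14).
    pts = [(n, c, q) for n, c, q in zip(sizes, t_classical, t_quantum)
           if math.isfinite(c) and math.isfinite(q)]
    if len(pts) < 3:
        return None
    ns = [p[0] for p in pts]
    a_c, b_c = fit_exponential(ns, [p[1] for p in pts])
    a_q, b_q = fit_exponential(ns, [p[2] for p in pts])
    if b_c <= b_q:
        return None  # classical curve does not grow faster: no crossover
    return math.log(a_q / a_c) / (b_c - b_q)


# Synthetic timings generated from known exponentials (not measurements).
sizes = [4, 6, 8, 10]
t_c = [0.001 * math.exp(1.0 * n) for n in sizes]
t_q = [1.0 * math.exp(0.5 * n) for n in sizes]
n_cross = crossover_size(sizes, t_c, t_q)
print(f"fitted crossover at n = {n_cross:.2f}")
```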

`run_scaling_benchmark(sizes=None, t_max=1.0, dt=0.1)`¶

Full scaling benchmark across system sizes. Default: [4, 8, 12, 16, 20].

from scpn_quantum_control.benchmarks import run_scaling_benchmark

results = run_scaling_benchmark(sizes=[4, 8, 12])
for r in results:
    print(f"n={r.n_qubits}: classical={r.t_classical_ms:.1f}ms, "
          f"quantum={r.t_quantum_ms:.1f}ms")
if results[0].crossover_predicted:
    print(f"Predicted crossover: n={results[0].crossover_predicted}")

Warns if n > 23 (statevector memory > 128 MB).

`AdvantageResult`¶

Field	Type	Description
`n_qubits`	int	System size
`t_classical_ms`	float	Classical exact evolution time (inf if infeasible)
`t_quantum_ms`	float	Trotter statevector evolution time
`errors`	dict	Error metrics (optional)
`crossover_predicted`	int or None	Extrapolated crossover qubit count

3. `gpu_baseline` — GPU vs QPU Comparison¶

Estimates GPU resources needed for statevector simulation and compares with QPU execution time.

GPU Model¶

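NVIDIA A100 80 GB:

- 312 TFLOPS assumed peak throughput (this is the A100's dense FP16/BF16 Tensor Core figure; its FP64 peak is 9.7 TFLOPS, or 19.5 TFLOPS with Tensor Cores, so the model is optimistic for complex128 arithmetic)
- 80 GB HBM2e

Statevector simulation:

- Memory: 2^n × 16 bytes (complex128)
- FLOPs: n_gates × 2^n × 10 (matrix-vector with constant factor)
- Time: FLOPs / (TFLOPS × 10^12) seconds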

n	Memory	GPU Time (A100)
16	1 MB	0.003 ms
24	256 MB	0.8 ms
30	16 GB	50 s
33	128 GB	OOM
40	16 TB	Infeasible

QPU Model¶

Conservative sequential gate execution:

- Gate time: 0.5 µs (Heron r2 CZ)
- Time: n_gates × 0.5 µs

For XY Trotter circuit: n_gates = reps × (n(n-1)/2 CZ + 2n RZ)
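The GPU and QPU cost models can be written out as plain functions. This is a sketch of the formulas as stated in this section, using the 312 TFLOPS and 0.5 µs figures as default parameters; the function names are hypothetical and the code is not taken from the package.

```python
def xy_trotter_gate_count(n: int, reps: int) -> int:
    """Gates per circuit: reps x (n(n-1)/2 two-qubit gates + 2n RZ gates)."""
    return reps * (n * (n - 1) // 2 + 2 * n)


def gpu_time_seconds(n: int, n_gates: int, tflops: float = 312.0) -> float:
    """Model GPU time: n_gates x 2^n x 10 FLOPs at the assumed throughput."""
    flops = n_gates * (2 ** n) * 10
    return flops / (tflops * 1e12)


def qpu_time_seconds(n_gates: int, gate_time_us: float = 0.5) -> float:
    """Model QPU time: every gate executed sequentially."""
    return n_gates * gate_time_us * 1e-6


n, reps = 16, 10
gates = xy_trotter_gate_count(n, reps)
print(f"n={n}, reps={reps}: {gates} gates")
print(f"GPU model: {gpu_time_seconds(n, gates):.2e} s")
print(f"QPU model: {qpu_time_seconds(gates):.2e} s")
```

Both functions are estimates only; neither runs a simulation nor talks to hardware.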

`gpu_baseline_comparison(n, trotter_reps=10)`¶

Returns GPUBaselineResult with GPU time, QPU time, and crossover.

from scpn_quantum_control.benchmarks.gpu_baseline import gpu_baseline_comparison

result = gpu_baseline_comparison(n=20, trotter_reps=10)
print(f"GPU: {result.estimated_gpu_time_s:.2e}s")
print(f"QPU: {result.qpu_time_s:.2e}s")
print(f"GPU faster: {result.gpu_faster}")
print(f"Crossover: n={result.crossover_n}")

`scaling_comparison(n_values=None)`¶

Batch comparison across system sizes. Default: [4, 8, 16, 24, 32, 40]. Returns dict with columns: n, gpu_time_s, qpu_time_s, memory_gb, gpu_faster.

Utility Functions¶

Function	Returns
`statevector_memory_gb(n)`	GPU memory in GB
`statevector_flops(n, n_gates)`	Total FLOPs
`estimate_gpu_time(n, n_gates, tflops)`	Wall time in seconds
`estimate_qpu_time(n, n_gates, gate_time_us)`	Wall time in seconds
`gate_count_xy_trotter(n, reps)`	Total gate count

`GPUBaselineResult`¶

Field	Type	Description
`n_qubits`	int	System size
`n_gates`	int	Total gate count
`statevector_memory_gb`	float	GPU memory requirement
`statevector_flops`	float	Computation cost
`estimated_gpu_time_s`	float	A100 wall time
`qpu_time_s`	float	QPU wall time
`gpu_faster`	bool	True if GPU is faster
`crossover_n`	int	n where QPU wins

4. `mps_baseline` — Tensor Network Comparison¶

Matrix Product State (MPS) resource estimation for the Kuramoto-XY system. MPS provides the classical baseline: if MPS at affordable bond dimension matches the quantum simulation, there is no quantum advantage.

MPS Theory¶

Bond dimension chi controls MPS expressibility:

- chi = 1: product states only (zero entanglement)
- chi = 2^(n/2): exact representation (full Hilbert space)
- chi ~ poly(n): efficient classical simulation

The required chi is set by the half-chain entanglement entropy S:

chi >= 2^S

For the Kuramoto-XY system at different coupling regimes:

- Below BKT (weak coupling): S ~ log(n), chi ~ poly(n); MPS is efficient
- At the BKT critical point: S ~ (c/3) log(n) with c=1, chi ~ n^(1/3); MPS is efficient
- Above BKT (strong coupling): S ~ n/2 (volume law), chi ~ 2^(n/2); MPS fails

The quantum advantage boundary is where MPS fails: when the half-chain entropy implies chi > chi_max (limited by available RAM).

`required_bond_dimension(entropy)`¶

Minimum chi from entanglement entropy: chi = ceil(2^S).

`mps_memory(n, chi)`¶

Memory for MPS: n × 2 × chi^2 × 16 bytes (n tensors of shape (chi, 2, chi) in complex128).
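The two formulas above combine into a quick resource comparison against the full statevector. The sketch below re-implements them under the stated assumptions (entropy in bits, complex128 storage); the helper names and the example entropy value are hypothetical.

```python
import math


def required_chi(entropy_bits: float) -> int:
    """Minimum bond dimension from half-chain entropy: ceil(2^S)."""
    return math.ceil(2.0 ** entropy_bits)


def mps_bytes(n: int, chi: int) -> int:
    """n tensors of shape (chi, 2, chi) stored as complex128."""
    return n * 2 * chi * chi * 16


def exact_bytes(n: int) -> int:
    """Full statevector of 2^n complex128 amplitudes."""
    return (2 ** n) * 16


# Illustrative input: 24 qubits with an assumed half-chain entropy of 3 bits.
n, entropy = 24, 3.0
chi = required_chi(entropy)
ratio = exact_bytes(n) / mps_bytes(n, chi)
print(f"chi={chi}, MPS={mps_bytes(n, chi)} B, exact={exact_bytes(n)} B")
print(f"compression ratio ~ {ratio:,.0f}x")
```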

`quantum_advantage_n(chi_max=1024, entropy_per_qubit=0.5)`¶

Estimates system size where MPS fails under volume-law entanglement:

S = entropy_per_qubit × n/2
chi = 2^S > chi_max  ⟹  n > 2 × log2(chi_max) / entropy_per_qubit

For chi_max=1024, entropy_per_qubit=0.5: n > 40. For chi_max=256: n > 32.
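The threshold arithmetic can be verified directly. This is a minimal sketch of the inequality above; `mps_failure_threshold` is a hypothetical name and returns the unrounded boundary value.

```python
import math


def mps_failure_threshold(chi_max: int = 1024,
                          entropy_per_qubit: float = 0.5) -> float:
    """Boundary n above which 2^(entropy_per_qubit * n / 2) exceeds chi_max."""
    return 2.0 * math.log2(chi_max) / entropy_per_qubit


for chi_max in (256, 1024):
    n_star = mps_failure_threshold(chi_max)
    # At the boundary the required chi equals chi_max exactly.
    chi_at_boundary = 2.0 ** (0.5 * n_star / 2.0)
    print(f"chi_max={chi_max}: n > {n_star:.0f} "
          f"(chi at boundary = {chi_at_boundary:.0f})")
```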

`mps_baseline_comparison(K, omega, chi_max=256)`¶

Full comparison for a specific system:

from scpn_quantum_control.benchmarks.mps_baseline import mps_baseline_comparison
from scpn_quantum_control.bridge.knm_hamiltonian import OMEGA_N_16, build_knm_paper27

K = build_knm_paper27(L=8)
omega = OMEGA_N_16[:8]
result = mps_baseline_comparison(K, omega)
print(f"Entropy S = {result.half_chain_entropy:.3f}")
print(f"Required chi = {result.required_bond_dim}")
print(f"MPS memory: {result.mps_memory_bytes / 1e6:.1f} MB")
print(f"Exact memory: {result.exact_memory_bytes / 1e6:.1f} MB")
print(f"Compression: {result.compression_ratio:.1f}x")
print(f"MPS tractable: {result.mps_tractable}")

`MPSBaselineResult`¶

Field	Type	Description
`n_qubits`	int	System size
`half_chain_entropy`	float	S(n/2) from exact ground state
`required_bond_dim`	int	chi = ceil(2^S)
`mps_memory_bytes`	int	MPS storage requirement
`exact_memory_bytes`	int	Full statevector storage
`compression_ratio`	float	exact/MPS memory ratio
`quantum_advantage_threshold`	int	n where MPS at chi_max fails
`mps_tractable`	bool	chi_required <= chi_max

5. `appqsim_protocol` — Application-Oriented Metrics¶

Reference: Lubinski et al., QST 8, 024003 (2023).

Measures simulation quality via application-relevant metrics, not just circuit fidelity. For the Kuramoto-XY system:

Order parameter accuracy: |R_quantum - R_exact|
Energy accuracy: |E_q - E_exact| / |E_exact| × 100%
Correlation fidelity: 1 - ||C_q - C_exact||_F / ||C_exact||_F

These answer the physics question: does the quantum simulation correctly reproduce the synchronisation transition? A VQE that gets the energy right but the correlators wrong is not useful for studying phase transitions.
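The three metrics are simple to state in code. The sketch below implements the formulas listed above on plain Python lists; the function names mirror the metric names for readability but are standalone illustrations, and the input numbers are made up for the example.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def order_parameter_error(r_quantum: float, r_exact: float) -> float:
    return abs(r_quantum - r_exact)


def energy_relative_error_pct(e_quantum: float, e_exact: float) -> float:
    return abs(e_quantum - e_exact) / abs(e_exact) * 100.0


def correlation_fidelity(c_quantum, c_exact) -> float:
    """1 - ||C_q - C_exact||_F / ||C_exact||_F."""
    diff = [[q - e for q, e in zip(row_q, row_e)]
            for row_q, row_e in zip(c_quantum, c_exact)]
    return 1.0 - frobenius_norm(diff) / frobenius_norm(c_exact)


# Made-up two-qubit correlator matrices, for illustration only.
c_exact = [[0.0, 1.0], [1.0, 0.0]]
c_quantum = [[0.0, 0.9], [0.9, 0.0]]
print(f"R error: {order_parameter_error(0.71, 0.75):.4f}")
print(f"Energy error: {energy_relative_error_pct(-1.98, -2.0):.2f}%")
print(f"Correlation fidelity: {correlation_fidelity(c_quantum, c_exact):.4f}")
```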

`appqsim_benchmark(K, omega, circuit_sv=None, n_gates=0, circuit_depth=0)`¶

Full AppQSim evaluation:

from scpn_quantum_control.benchmarks.appqsim_protocol import appqsim_benchmark
from scpn_quantum_control.bridge.knm_hamiltonian import OMEGA_N_16, build_knm_paper27

K = build_knm_paper27(L=4)
omega = OMEGA_N_16[:4]
metrics = appqsim_benchmark(K, omega)
print(f"R error: {metrics.order_parameter_error:.4f}")
print(f"Energy error: {metrics.energy_relative_error_pct:.2f}%")
print(f"Correlation fidelity: {metrics.correlation_fidelity:.4f}")

If circuit_sv is not provided, the function runs VQE internally (ansatz_reps=2, maxiter=100) to generate the quantum state.

The correlation fidelity is derived from the Frobenius-norm distance between the quantum and exact <X_i X_j + Y_i Y_j> correlator matrices over all qubit pairs, normalised by the norm of the exact matrix.

`AppQSimMetrics`¶

Field	Type	Description
`order_parameter_error`	float	|R_quantum - R_exact|
`energy_relative_error_pct`	float	|E_q - E_exact| / |E_exact| × 100%
`correlation_fidelity`	float	1 - ||C_q - C_exact||_F / ||C_exact||_F
`n_qubits`	int	System size
`n_gates`	int	Circuit gate count
`circuit_depth`	int	Circuit depth

6. `differentiable_programming` — Program AD Conformance¶

Provides deterministic correctness rows for the native differentiable programming surface. These rows compare implemented program AD gradients against analytic references and explicitly avoid wall-clock, compiler, LLVM, Rust, JIT, or hardware performance claims.

`run_differentiable_programming_benchmark_suite()`¶

Runs the committed conformance rows:

Case	Category	Contract
`loop_heavy_scalar`	loop-heavy	Executed Python loops with scalar ufuncs
`program_ad_ir_roundtrip_contracts`	ir-roundtrip	Bounded `program_ad_effect_ir.v1` parser and stable serialization round-trip conformance for emitted Program AD SSA/effect/control/phi metadata; bytecode/source compiler frontend, full alias lattice, non-executed branch semantics, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`program_ad_rust_scalar_interpreter_contracts`	rust-interpreter	Optional native Rust scalar forward/value-gradient replay of opcode-bearing `program_ad_effect_ir.v1` rows with parity against Python whole-program and analytic references when `scpn_quantum_engine` is built; Python-only environments report explicit blocked reasons and do not promote Rust execution, reverse-mode Rust AD, general Program AD execution, LLVM/JIT, provider, hardware, or performance evidence
`program_ad_control_phi_metadata_contracts`	control-phi	Program AD control-join metadata conformance for supported executed runtime and source control regions, with `ProgramADPhiNode` parser round-trip, analytic gradient parity, and adjoint replay parity; non-executed branch adjoints, full compiler phi lowering, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`program_ad_registry_dispatch_contracts`	registry-dispatch	Registry-dispatched coverage for 118 declared Program AD primitives across array, shape, reduction, stencil, interpolation, assembly, signal, elementwise, selection, product, cumulative, and linalg families; the row validates derivative, batching, lowering metadata, shape, dtype, static-argument, nondifferentiability, and effect contracts only, while executable Rust/LLVM/JIT, provider, hardware, and performance evidence remain blocked
`program_adjoint_replay_provenance_contracts`	reverse-adjoint	Program AD reverse adjoint generation over supported executed scalar IR, with `ProgramADAdjointResult` gradient parity, generated `ProgramADAdjointStep` rows, finite local pullback scales, cotangent-flow rows, reverse effect-order rows, replay node/effect/runtime control/phi row bindings, and blocked non-executed phi inputs bound to `program_ad_effect_ir.v1`; full reverse-mode compiler AD, non-executed branch adjoints, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`elementwise_boundary_contracts`	elementwise-boundary	Registry-gated builtin `abs`, NumPy absolute value, positive-domain, nonzero-denominator, and inverse-trig boundary contracts with analytic gradient and adjoint parity checks; unsupported domain boundaries, derivative-losing `sign`/`heaviside` kernels, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`matrix_heavy_linear_algebra`	matrix-heavy	Dot, inner, outer, trace, tensordot, and einsum semantics
`selection_piecewise_contracts`	selection-heavy	Registry-gated `where`/`clip` branch and boundary contracts, strict no-tie `sort`, static selection folds with `np.select`, callable `np.piecewise`, static-selector `np.choose`, static-mask `np.compress`, and same-size static-mask `np.extract`, plus fail-closed integer-output selector contracts exposed through the dashboard selection primitive row; dynamic masks, dynamic selectors, ties, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`structured_numeric_primitive_contracts`	structured-numeric	Registry-gated product, interpolation, signal, and stencil contracts for `inner`, `outer`, `matmul`, `tensordot`, `einsum`, `interp`, `convolve`, `correlate`, and `gradient`
`cumulative_primitive_contracts`	cumulative-primitive	Registry-gated bounded `cumsum`, `cumprod`, and `diff` trace contracts with analytic gradient and adjoint parity checks; dynamic axis promotion, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`assembly_primitive_contracts`	assembly-primitive	Registry-gated like-constructor and stack assembly contracts for `zeros_like`, `ones_like`, `full_like`, `hstack`, `vstack`, `column_stack`, and `dstack`; dynamic shape assembly, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`reduction_primitive_contracts`	reduction-primitive	Registry-gated bounded `sum`, `prod`, `mean`, `var`, `std`, `trapezoid`, unique `max`/`min`, `median`, scalar-`q` `quantile`, and scalar-`q` `percentile` contracts with analytic gradient and adjoint parity checks; dynamic axes, dynamic q, tie boundaries, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`shape_primitive_contracts`	shape-primitive	Registry-gated bounded reshape, ravel, transpose, expand/squeeze, swap/move axis, repeat, rank-promotion, tile, roll, rot90, flip, flipud, and fliplr contracts with analytic gradient and adjoint parity checks; dynamic shape arguments, invalid axes, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`broadcast_primitive_contracts`	broadcast-primitive	Registry-gated bounded `broadcast_to`, `broadcast_arrays`, and binary elementwise rank-broadcasting contracts with analytic gradient and adjoint parity checks; dynamic output shapes, incompatible shapes, subclass propagation, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`linalg_primitive_contracts`	linalg-primitive	Registry-gated determinant, inverse, solve, trace, diagonal, flattened diagonal, matrix-power, and multi-dot contracts
`indexing_static_gather_contracts`	indexing-heavy	Static slicing, static-axis concatenate/stack assembly, `np.hstack`/`np.vstack`/`np.column_stack`/`np.dstack` assembly conveniences, nested `np.block` assembly, static `np.split`/`np.array_split`/`np.hsplit`/`np.vsplit`/`np.dsplit` gather assembly, static `np.tril`/`np.triu` triangular masks, static `np.diagonal` offset/axis gather assembly, static `np.broadcast_arrays` broadcast assembly, static integer/boolean advanced getitem, `np.take` raise/wrap/clip modes, `np.take_along_axis`, static `np.delete`, static constant `np.pad`, static constant `np.insert`, `np.append`, strict finite no-tie `np.sort` adjoint routing, static-grid `np.trapezoid` adjoint routing, static scalar and coordinate `np.gradient` finite-difference adjoint routing, static-grid `np.interp` piecewise-linear adjoint routing, one-dimensional `np.convolve` signal/kernel adjoint routing, one-dimensional `np.correlate` signal/reference adjoint routing, and repeated adjoint accumulation
`mutation_heavy_forward_only`	mutation-heavy	Static array mutation dataflow
`shape_view_alias_metadata_contracts`	alias-effect	Program AD alias metadata conformance for supported executed shape/view transformations; full static alias lattice, non-executed view/control paths, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`slice_mutation_alias_metadata_contracts`	alias-effect	Program AD alias metadata conformance for static rank-1 slice mutation; broader object aliases, non-executed view/control paths, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`loop_carried_state_alias_metadata_contracts`	alias-effect	Program AD source metadata conformance for loop-carried derivative state; full loop checkpointing, non-executed paths, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`program_ad_static_alias_lattice_contracts`	alias-lattice	Static alias-lattice readiness over emitted `program_ad_effect_ir.v1` components with view-alias, bounded local object-attribute, expression-rebinding, explicit mutation-effect blockers, unsupported-Python frontend diagnostic blockers, captured/global object-attribute roots/details pinned to static object-model blockers, unknown alias-edge provenance pinned to fail-closed blockers, non-executed phi blockers, and control-path alias blocker reporting; captured/global object-attribute alias sets, unknown dynamic alias promotion, arbitrary dynamic-Python frontend lowering, mutation adjoints, non-executed branch adjoints, Rust/LLVM executable lowering, hardware, and performance promotion remain blocked
`transform_nesting_vmap_program_grad`	transform-nesting	`vmap` over program AD gradients plus whole-program `grad(vmap(f))` over trace-aware leaves
`transform_nesting_whole_program_higher_order`	transform-nesting	`jacfwd` and `jacrev` over whole-program `grad(vmap(f))` checked against analytic block-diagonal curvature

`run_differentiable_programming_external_reference_suite()`¶

Runs optional JAX reference comparisons when JAX is installed. When JAX is not available, it returns an empty tuple rather than weakening the base dependency contract.

`run_quantum_gradient_benchmark_suite()`¶

Runs deterministic parameter-shift correctness rows for small smooth quantum expectation objectives. Each row records the parameter-shift gradient, central finite-difference gradient, analytic reference gradient, finite-difference verification pass/fail flag, objective-evaluation count, and a claim boundary.

Case	Category	Contract
`single_rotation_parameter_shift`	quantum-gradient	One-parameter Pauli-rotation expectation with analytic `-sin(theta)` reference
`two_parameter_phase_expectation`	quantum-gradient	Two-parameter phase expectation with analytic mixed sine/cosine reference
`sparse_ising_chain_six_qubit_expectation`	quantum-gradient	Six-parameter nearest-neighbour sparse Ising-chain expectation with analytic field/coupling gradient reference
`jax_registered_phase_qnode_aot_export_lowering`	quantum-gradient	Optional installed-JAX AOT/export value-route diagnostic for deterministic registered local Phase-QNode circuits; gradient fields remain parameter-shift and finite-difference references while exported VJP, persistent cross-platform execution, hardware, and performance promotion remain blocked
`torch_registered_phase_qnode_compile_boundary_diagnostic`	quantum-gradient	Optional installed-PyTorch compile-boundary diagnostic for deterministic registered local Phase-QNode circuits; non-fullgraph correctness is compared with parameter-shift and finite-difference references while dynamic-shape, fullgraph compiled-frame, AOTAutograd/export, CUDA, provider, hardware, isolated-benchmark, and performance promotion remain blocked

These rows are correctness/conformance benchmarks only. They do not claim hardware execution, provider integration, framework-native autodiff, or wall-clock performance.
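The kind of check a single-rotation row performs can be sketched without any quantum framework. The example below assumes the one-qubit circuit RY(theta)|0> with observable Z, whose expectation is cos(theta), and compares the parameter-shift gradient and a central finite difference with the analytic -sin(theta) reference. All function names are hypothetical; this is not the suite's implementation.

```python
import math


def expectation_z(theta: float) -> float:
    """<Z> for RY(theta)|0>, computed from the two statevector amplitudes."""
    amp0 = math.cos(theta / 2.0)
    amp1 = math.sin(theta / 2.0)
    return amp0 * amp0 - amp1 * amp1


def parameter_shift_grad(f, theta: float) -> float:
    """Two-term parameter-shift rule with shifts of +/- pi/2."""
    shift = math.pi / 2.0
    return (f(theta + shift) - f(theta - shift)) / 2.0


def central_difference_grad(f, theta: float, h: float = 1e-6) -> float:
    """Central finite difference used as an independent numerical check."""
    return (f(theta + h) - f(theta - h)) / (2.0 * h)


theta = 0.4
ps = parameter_shift_grad(expectation_z, theta)
fd = central_difference_grad(expectation_z, theta)
ref = -math.sin(theta)
print(f"parameter-shift={ps:.8f}, finite-diff={fd:.8f}, analytic={ref:.8f}")
# Each parameter-shift gradient costs two objective evaluations here.
```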

`run_differentiable_external_comparison_suite()`¶

Runs optional external comparison rows for JAX, PyTorch, TensorFlow, PennyLane, LLVM/Enzyme runner evidence, and Catalyst qjit/MLIR/QIR runner evidence. The SCPN analytic reference remains the source of truth. Missing optional dependencies are emitted as hard_gap rows instead of being omitted.

Every row carries dependency-version metadata for the backend being classified. The suite also emits explicit unsupported-route rows for promotion-blocking cases: unsupported batching, unsupported nested transforms, unsupported complex dtype routes, and unsupported hardware-device routes. Those rows are hard gaps, not skipped tests or degraded successes.

`write_differentiable_external_comparison()`¶

Writes the external comparison rows to a JSON artefact with schema scpn_qc_differentiable_external_comparison_v1. The artefact records the row payloads, dependency versions, toolchain metadata where available, failure classes, Python/platform metadata, and the fixed functional_non_isolated classification. It is a reproducibility and correctness artefact only: production_eligible and promotion_ready are false until the isolated benchmark gate supplies artefact IDs and the claim ledger is updated. The writer publishes a row_schema.required_fields list and rejects rows that do not carry value error, gradient error, runtime, memory, batching support, transform support, failure class, dependency versions, toolchain slot, and claim-boundary fields. scripts/run_differentiable_benchmark_evidence.py writes this companion JSON file as diff-qnode-external-comparison.json and inserts the real external-comparison artefact ID into the benchmark evidence bundle's evidence_artifact_ids list.

`run_identical_circuit_gradient_comparison_suite()`¶

Runs a stricter exact-state competitor-gradient comparison for the same registered Phase-QNode circuit across SCPN, Qiskit, and PennyLane:

Circuit: one-qubit RY(theta).
Parameters: [0.4].
Observable: Z0.
Shot policy: exact-state mode, shots=None.
Failure classes: dependency missing or runtime error per backend.

Both Qiskit and PennyLane rows must carry the same circuit fingerprint and pass value and gradient agreement before the artefact reports identical_circuit_ready=True.
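The readiness rule can be sketched as a small predicate. Everything below is an assumption made for illustration: the fingerprint scheme (SHA-256 over canonical JSON), the row field names, and the function names are hypothetical and may differ from the artefact schema.

```python
import hashlib
import json


def circuit_fingerprint(spec: dict) -> str:
    """Hypothetical fingerprint: SHA-256 of a canonical JSON encoding."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def identical_circuit_ready(rows, reference_fingerprint: str) -> bool:
    """True only if both competitor rows match the circuit and agree."""
    backends = {row["backend"] for row in rows}
    if not {"qiskit", "pennylane"} <= backends:
        return False
    return all(
        row["fingerprint"] == reference_fingerprint
        and row["value_ok"]
        and row["gradient_ok"]
        for row in rows
    )


# The circuit described above: one-qubit RY(theta), Z0, exact-state mode.
spec = {"gates": [["RY", 0, "theta"]], "parameters": [0.4],
        "observable": "Z0", "shots": None}
fp = circuit_fingerprint(spec)
rows = [
    {"backend": "qiskit", "fingerprint": fp, "value_ok": True, "gradient_ok": True},
    {"backend": "pennylane", "fingerprint": fp, "value_ok": True, "gradient_ok": True},
]
print(identical_circuit_ready(rows, fp))
```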

`write_identical_circuit_gradient_comparison()`¶

Writes the identical-circuit rows to a JSON artefact with schema scpn_qc_identical_circuit_gradient_comparison_v1. The committed local artefact data/differentiable_phase_qnode/identical_circuit_gradient_comparison_20260616.json records two success rows, one for Qiskit and one for PennyLane, with the same circuit, same parameters, same observable, and exact-state shot policy. It is a correctness artefact only: promotion_ready remains false until separate isolated benchmark evidence and claim-ledger promotion metadata exist.

For LLVM/Enzyme, set SCPN_ENZYME_RUNNER to an absolute path for an executable file that reads a JSON request on stdin and writes JSON with:

{
  "value": 0.0,
  "gradient": [0.0, 0.0],
  "toolchain": {"enzyme": "version", "llvm": "version"}
}

The runner row rejects relative paths, missing files, and non-executable files before subprocess execution. It enforces a timeout (SCPN_ENZYME_RUNNER_TIMEOUT_SECONDS, default 10), validates finite scalar/vector outputs, records toolchain metadata, and reports correctness_mismatch unless value and gradient match the SCPN reference. These rows are comparison evidence only; they do not claim provider execution, QPU execution, GPU execution, arbitrary-program AD, or production performance. Use the same absolute-executable rule for SCPN_CATALYST_RUNNER.

When Enzyme is supplied through the Enzyme-JAX package rather than a standalone enzyme executable, set ENZYME_LLVM_PLUGIN to the installed native extension path. The benchmark metadata records the enzyme_ad package version plus the runner and plugin paths. If the package is installed but the runner fails during lowering or execution, the row is a runtime_error hard gap rather than a dependency_missing hard gap.
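The runner checks described above (absolute executable path, timeout, finite outputs) can be sketched with the standard library. The function names are hypothetical, the request payload is left to the caller because its schema is not given here, and treating `toolchain` as a required object is an assumption.

```python
import json
import math
import os
import subprocess


def check_runner_path(path: str) -> str:
    """Reject relative, missing, or non-executable runner paths."""
    if not os.path.isabs(path):
        raise ValueError("runner path must be absolute")
    if not os.path.isfile(path):
        raise ValueError("runner file does not exist")
    if not os.access(path, os.X_OK):
        raise ValueError("runner file is not executable")
    return path


def _is_finite_number(x) -> bool:
    return (isinstance(x, (int, float)) and not isinstance(x, bool)
            and math.isfinite(x))


def parse_runner_output(text: str) -> dict:
    """Validate the JSON reply: finite value, finite gradient, toolchain."""
    payload = json.loads(text)
    if not isinstance(payload, dict):
        raise ValueError("runner output must be a JSON object")
    if not _is_finite_number(payload.get("value")):
        raise ValueError("value must be a finite scalar")
    gradient = payload.get("gradient")
    if not isinstance(gradient, list) or not all(map(_is_finite_number, gradient)):
        raise ValueError("gradient must be a list of finite numbers")
    if not isinstance(payload.get("toolchain"), dict):
        raise ValueError("toolchain metadata must be an object")  # assumption
    return payload


def run_runner(path: str, request: dict, timeout_s: float = 10.0) -> dict:
    """Send the request on stdin and validate what comes back on stdout."""
    check_runner_path(path)
    completed = subprocess.run(
        [path], input=json.dumps(request), capture_output=True,
        text=True, timeout=timeout_s, check=True,
    )
    return parse_runner_output(completed.stdout)


sample = ('{"value": 0.0, "gradient": [0.0, 0.0], '
          '"toolchain": {"enzyme": "version", "llvm": "version"}}')
print(parse_runner_output(sample)["gradient"])
```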

`run_differentiable_hardening_slice_gate()`¶

Returns a JSON-ready DifferentiableHardeningSliceGateResult for the focused closeout checks required by every differentiable hardening slice. Callers pass changed source paths and module-specific pytest files; the gate records the expected Ruff, mypy, pytest, test-quality audit, and claim-ledger validation commands and rejects bucket-wide pytest targets such as tests.

The result also replays benchmark-evidence classification smoke cases: GitHub-hosted runners remain functional_non_isolated, incomplete self-hosted isolated metadata remains a hard_gap, complete isolated-runner metadata is the only isolated_affinity path, and requested accelerator execution without visible device evidence remains silent_accelerator_fallback. This API does not execute the listed commands and does not promote any benchmark row to production evidence.

`run_differentiable_isolated_benchmark_plan()`¶

Returns a JSON-ready DifferentiableIsolatedBenchmarkPlan for the current differentiable benchmark and evidence artefacts that are not yet promotion-grade. The plan covers the committed local benchmark bundle, Phase-QNode affinity row, identical-circuit gradient comparison, domain dataset closure, PyTorch maturity audit, and Enzyme/MLIR maturity audit. Each row records source artefact paths, source classifications, the required self-hosted, linux, and isolated-benchmark runner labels, a taskset plus chrt rerun command, required host context, expected output paths, and blockers.

The committed artefact data/differentiable_phase_qnode/differentiable_isolated_benchmark_plan_20260627.json is a batch plan, not a benchmark result. promotion_ready remains false until every row has validated isolated_affinity output artefacts and no host or source-classification blockers. The companion validator checks paths, rerun commands, labels, output locations, and source classifications without executing benchmarks or changing claim-ledger promotion status.

The CI benchmark evidence writer records accelerator metadata in every bundle. The default is explicit CPU-only evidence. To request accelerator evidence, set SCPN_BENCH_ACCELERATOR_BACKEND=cuda or rocm and provide visible-device metadata through SCPN_BENCH_ACCELERATOR_DEVICE_IDS, CUDA_VISIBLE_DEVICES, ROCR_VISIBLE_DEVICES, or HIP_VISIBLE_DEVICES. CUDA requests can also use JAX CUDA device discovery when the CUDA-enabled jaxlib plugin is installed. Optional names and runtime versions can be recorded with SCPN_BENCH_ACCELERATOR_DEVICE_NAMES and SCPN_BENCH_ACCELERATOR_RUNTIME (cuda=12.4,cudnn=9.1). Requested accelerator execution without matching visible devices is classified as hard_gap / silent_accelerator_fallback, so a CPU fallback cannot be reused as GPU benchmark evidence.
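The fallback rule can be sketched as a pure function over an environment mapping. The environment variable names come from the paragraph above; the function name, the per-backend grouping of the visible-device variables, and the `cpu_only` and `accelerator_visible` labels are assumptions made for this illustration, and JAX device discovery is not modelled.

```python
# Assumed mapping from requested backend to its visible-device variables.
VISIBLE_DEVICE_VARS = {
    "cuda": ("CUDA_VISIBLE_DEVICES",),
    "rocm": ("ROCR_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES"),
}


def classify_accelerator_request(env: dict) -> dict:
    """Classify a benchmark environment as CPU-only, accelerator, or gap."""
    backend = env.get("SCPN_BENCH_ACCELERATOR_BACKEND", "").strip().lower()
    if not backend:
        # Default: explicit CPU-only evidence.
        return {"backend": "cpu", "classification": "cpu_only",
                "failure_class": None, "device_ids": []}
    if backend not in VISIBLE_DEVICE_VARS:
        raise ValueError(f"unsupported accelerator backend: {backend}")

    device_ids = []
    candidates = ("SCPN_BENCH_ACCELERATOR_DEVICE_IDS",) + VISIBLE_DEVICE_VARS[backend]
    for var in candidates:
        ids = [tok.strip() for tok in env.get(var, "").split(",") if tok.strip()]
        if ids:
            device_ids = ids
            break

    if not device_ids:
        # Requested an accelerator but nothing is visible: never reuse
        # a CPU fallback as GPU evidence.
        return {"backend": backend, "classification": "hard_gap",
                "failure_class": "silent_accelerator_fallback",
                "device_ids": []}
    return {"backend": backend, "classification": "accelerator_visible",
            "failure_class": None, "device_ids": device_ids}


print(classify_accelerator_request({}))
print(classify_accelerator_request({"SCPN_BENCH_ACCELERATOR_BACKEND": "cuda"}))
print(classify_accelerator_request({"SCPN_BENCH_ACCELERATOR_BACKEND": "cuda",
                                    "CUDA_VISIBLE_DEVICES": "0,1"}))
```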

Crossover Summary¶

The three crossover estimates address different classical methods:

Method	Classical Cost	Crossover n	Bottleneck
Exact diag + expm	O(8^n)	~14	RAM + FLOPs
GPU statevector	O(2^n × gates)	~33 (A100)	GPU memory
MPS tensor network	O(chi^3 × n × gates)	32-40 (chi_max=256-1024)	Entanglement

Below n=14, classical exact methods win for the benchmarked workloads. Between 14 and 33, GPU statevector estimates set the relevant local memory boundary. Above n=33-40, exact statevector methods exceed the assumed GPU memory envelope and MPS estimates become entanglement- and bond-cap dependent. This is a resource-boundary diagnostic only; no broad quantum-advantage claim follows without a committed classical baseline, observable tolerance, and hardware dataset for the specific workload.

Dependencies¶

Module	Internal	External
`quantum_advantage`	bridge.knm_hamiltonian, hardware.classical	scipy (curve_fit)
`gpu_baseline`	—	— (pure estimates)
`mps_baseline`	analysis.entanglement_spectrum	numpy
`appqsim_protocol`	bridge.*, hardware.classical, analysis.*	qiskit
`differentiable_programming`	differentiable	numpy; optional jax reference rows

The core benchmark suite runs with the base installation. Optional external reference rows declare their backend availability instead of fabricating comparisons.

Testing¶

35 tests across 4 test files:

test_quantum_advantage.py — Scaling correctness, crossover estimation, edge cases
test_gpu_baseline.py — Memory estimates, time estimates, comparison logic
test_mps_baseline.py — Bond dimension, memory, tractability threshold
test_appqsim_protocol.py — Metric ranges, VQE fallback, correlator fidelity

Pipeline Performance¶

Measured on ML350 Gen8 (128 GB RAM, Xeon E5-2620v2):

Operation	System	Wall Time
`classical_benchmark`	4 qubits	8 ms
`classical_benchmark`	8 qubits	120 ms
`classical_benchmark`	12 qubits	8,500 ms
`classical_benchmark`	14 qubits	~45,000 ms
`quantum_benchmark`	4 qubits	15 ms
`quantum_benchmark`	8 qubits	45 ms
`quantum_benchmark`	12 qubits	350 ms
`quantum_benchmark`	16 qubits	3,200 ms
`gpu_baseline_comparison`	any n	0.01 ms (pure estimate)
`mps_baseline_comparison`	8 qubits	25 ms
`appqsim_benchmark`	4 qubits	350 ms

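The classical benchmark hits a wall at n=14 (about 45 seconds). The quantum benchmark is still exponential in n, but its O(2^n × n_gates) statevector cost grows far more slowly than O(2^(3n)), reaching 3.2 seconds at n=16. The GPU baseline is a pure estimate (no simulation is run); the MPS baseline computes the half-chain entropy from the exact ground state and estimates MPS resources from it without running a tensor-network simulation.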
