Evaluation Protocol
Metrics
| Metric | Computation | Target |
|---|---|---|
| R_good convergence | mean R over good_layers at final step | > 0.8 |
| R_bad suppression | mean R over bad_layers at final step | < 0.3 |
| Boundary compliance | fraction of steps with zero hard violations | 1.0 |
| Action count | total ControlActions issued during run | minimise |
| Convergence speed | steps until R_good first exceeds 0.7 | minimise |
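The metrics in the table above can be derived from per-step records. The following is a minimal sketch, not the project's implementation: the record schema (`R_good`, `R_bad`, `hard_violations`, `actions`) and the function name `compute_metrics` are assumptions made for illustration, and `R_good` / `R_bad` are assumed to be already averaged over their layer groups.

```python
from typing import Optional


def compute_metrics(records: list[dict], speed_threshold: float = 0.7) -> dict:
    """Summarise a run from per-step records (hypothetical schema).

    Each record is assumed to hold "R_good" and "R_bad" (floats),
    "hard_violations" (int) and "actions" (list of issued ControlActions).
    """
    if not records:
        raise ValueError("no step records to evaluate")

    final = records[-1]
    clean_steps = sum(1 for r in records if r["hard_violations"] == 0)

    # First 0-based step index at which R_good exceeds the threshold; None if never.
    convergence_step: Optional[int] = next(
        (i for i, r in enumerate(records) if r["R_good"] > speed_threshold),
        None,
    )

    return {
        "R_good_final": final["R_good"],
        "R_bad_final": final["R_bad"],
        "boundary_compliance": clean_steps / len(records),
        "action_count": sum(len(r["actions"]) for r in records),
        "convergence_step": convergence_step,
        # Targets taken from the table: > 0.8, < 0.3, and compliance of 1.0.
        "meets_targets": (
            final["R_good"] > 0.8
            and final["R_bad"] < 0.3
            and clean_steps == len(records)
        ),
    }
```

Action count and convergence speed have no fixed target, so they are reported as raw values for comparison between runs.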
Procedure
- Load binding spec.
- Initialise phases from a fixed seed (rng(42)).
- Run N steps (default 100).
- Record R_good, R_bad, boundary state, and actions at every step.
- Compute final metrics from the audit log.
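The steps above can be sketched as a small driver loop. This is an illustration under stated assumptions: `random.Random(42)` stands in for the document's `rng(42)`, and `step_fn` is a hypothetical callable that advances the phases by one step and returns the quantities to record. Neither name comes from the project.

```python
import math
import random


def init_phases(n_oscillators: int, seed: int = 42) -> list[float]:
    """Draw initial phases between 0 and 2*pi from a private, seeded generator."""
    rng = random.Random(seed)  # assumption: stand-in for the document's rng(42)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_oscillators)]


def run_eval(step_fn, n_oscillators: int = 8, n_steps: int = 100, seed: int = 42) -> list[dict]:
    """Run n_steps and return one record per step.

    step_fn is hypothetical: it takes the current phases and returns
    (new_phases, record), where record holds R_good, R_bad, boundary
    state and actions for that step.
    """
    phases = init_phases(n_oscillators, seed)
    log = []
    for step in range(n_steps):
        phases, record = step_fn(phases)
        log.append({"step": step, **record})
    return log
```

Using a private generator instead of the module-level `random` functions keeps the initial phases independent of any other code that consumes random numbers.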
Deterministic Replay
Replay from an audit log must reproduce the same R_good / R_bad trajectory. Requirements:
- Fixed seed for initial phases.
- Audit log records all ControlActions with exact values.
- No stochastic components in the default supervisor policy.
ReplayEngine loads the JSONL audit log and reconstructs step-by-step state.
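A replay check can be expressed as a comparison between the logged trajectory and the replayed one. The sketch below does not use ReplayEngine, whose interface is not described here. It assumes each JSONL line is an object with `R_good` and `R_bad` fields, and the helper names are hypothetical.

```python
import json


def load_trajectory(jsonl_text: str) -> list[tuple[float, float]]:
    """Parse JSONL audit text into (R_good, R_bad) pairs, one per step."""
    trajectory = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        trajectory.append((entry["R_good"], entry["R_bad"]))
    return trajectory


def first_divergence(logged, replayed, tol: float = 0.0):
    """Return the first step index where the trajectories differ, or None."""
    for step, ((g1, b1), (g2, b2)) in enumerate(zip(logged, replayed)):
        if abs(g1 - g2) > tol or abs(b1 - b2) > tol:
            return step
    if len(logged) != len(replayed):
        # The common prefix matches but one run is shorter than the other.
        return min(len(logged), len(replayed))
    return None
```

The default tolerance of zero demands an exact match. Python's `json` module writes floats in a form that parses back to the same value, so a log written and read this way does not itself introduce drift.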
Ablation Protocol
To isolate the contribution of each subsystem, run with components disabled:
| Ablation | Modification |
|---|---|
| No coupling | Set K = 0 (zero matrix). Oscillators free-run. |
| No driver | Set zeta = 0. No external entrainment. |
| No supervisor | Disable SupervisorPolicy. No ControlActions. |
| No imprint | Disable ImprintModel. Static coupling. |
Compare R_good convergence across ablations. The full system should cross the 0.7 threshold in fewer steps, and settle at a higher final R_good, than any ablation.
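To make the "No coupling" row concrete, here is a toy comparison. It is not the project's engine and rests on several assumptions: R is the Kuramoto order parameter, the oscillators are identical and coupled all-to-all with a scalar K instead of a matrix, and there is no driver, supervisor, or imprint model. All names are hypothetical.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Kuramoto order parameter R = |mean(exp(i * theta))|, in [0, 1]."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def simulate(coupling, n=8, steps=2000, dt=0.05, omega=1.0, seed=42):
    """Euler-integrate identical all-to-all Kuramoto oscillators.

    Returns (initial R, final R). coupling=0.0 mimics the "No coupling" ablation.
    """
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    r_initial = order_parameter(phases)
    for _ in range(steps):
        mean_field = sum(cmath.exp(1j * p) for p in phases) / n
        r, psi = abs(mean_field), cmath.phase(mean_field)
        # Mean-field form: d(theta_i)/dt = omega + K * R * sin(psi - theta_i)
        phases = [p + dt * (omega + coupling * r * math.sin(psi - p)) for p in phases]
    return r_initial, order_parameter(phases)


r0_free, r_free = simulate(coupling=0.0)  # ablated: oscillators free-run
_, r_full = simulate(coupling=2.0)        # coupled toy system
```

With K = 0 and identical frequencies every phase advances by the same amount, so R stays at its initial value. With K > 0 the phases pull toward the mean field and R rises. A real ablation run would compare the metrics from the table in the Metrics section, using the same seed for every variant.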
Benchmark Domains
Run eval on all domainpacks in the domainpacks/ directory:
- minimal_domain -- baseline sanity check
- queuewaves -- R_bad suppression scenario
- geometry_walk -- symbolic channel test
- bio_stub -- multi-channel, multi-layer stress test