Zynq UltraScale+ target contract
SC-NeuroCore now has an explicit Zynq UltraScale+ MPSoC target contract for the SystemVerilog emitter and Vivado project generator. The contract is deliberately fail-closed: it emits target metadata, estimates conservative resources, and rejects unsupported SKU names instead of fabricating board-level timing or pin claims.
Supported SKU baseline
The public target baseline covers the currently planned ZU3EG and ZU9EG surfaces. Device resources are recorded as compiler budgets, not as timing closure evidence. Timing closure still requires Vivado on a board-specific self-hosted runner with verified XDC pin assignments.
| SKU | Part | DSP primitive | LUT budget | FF budget | DSP budget | 36K BRAM budget | URAM budget |
|---|---|---|---|---|---|---|---|
| ZU3EG | xczu3eg-sbva484-1-e | DSP48E2 | 70,560 | 141,120 | 360 | 216 | 0 |
| ZU9EG | xczu9eg-ffvb1156-2-e | DSP48E2 | 274,080 | 548,160 | 2,520 | 912 | 0 |
The primitive is DSP48E2 for Zynq UltraScale+ MPSoC. SC-NeuroCore must not
claim a newer-family DSP primitive for this target.
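As an illustration of the fail-closed lookup described above, the following is a minimal Python sketch, not the project's actual implementation. The names `SkuBudget`, `SKU_BUDGETS`, and `lookup_sku` are hypothetical; only the numbers are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuBudget:
    """Compiler budget for one SKU (hypothetical container, not the real API)."""
    part: str
    dsp_primitive: str
    lut: int
    ff: int
    dsp: int
    bram36: int
    uram: int


# Values copied from the baseline table; these are budgets, not timing evidence.
SKU_BUDGETS = {
    "zu3eg": SkuBudget("xczu3eg-sbva484-1-e", "DSP48E2", 70_560, 141_120, 360, 216, 0),
    "zu9eg": SkuBudget("xczu9eg-ffvb1156-2-e", "DSP48E2", 274_080, 548_160, 2_520, 912, 0),
}


def lookup_sku(name: str) -> SkuBudget:
    """Return the budget for a supported SKU, or raise instead of guessing."""
    key = name.strip().lower()
    if key not in SKU_BUDGETS:
        raise ValueError(f"unsupported SKU {name!r}; supported: {sorted(SKU_BUDGETS)}")
    return SKU_BUDGETS[key]
```

Raising on an unknown name, rather than falling back to a default part, mirrors the contract's rule that unsupported SKUs are rejected.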
Compiler behaviour
The Rust IR target surface is SvTarget::zynq_ultrascale_plus(SkuKind, clock_mhz).
When this target is selected, emit_systemverilog_with_target returns both the
generated SystemVerilog and a ResourceReport.
The target-aware emitter currently:
- adds a target header comment containing the SKU, part, clock, and DSP primitive;
- annotates arithmetic kernels with use_dsp=yes and sc_target_dsp=DSP48E2;
- adds RAM-style metadata for constant vectors so larger tables can be steered toward block memory when appropriate;
- reports conservative LUT, DSP, BRAM, URAM, and critical-path estimates;
- reports boolean fit decisions against the selected SKU budgets.
The legacy emit(graph) API remains target-neutral and emits the generic
SystemVerilog contract.
Vivado project generator
tools/gen_vivado_project.py converts a JSON manifest into a deterministic
Vivado batch Tcl project. A manifest must define:
- top: top-level module name;
- sku: zu3eg or zu9eg;
- clock_mhz: positive integer target clock;
- sources: one or more existing SystemVerilog source files;
- xdc: one or more existing constraint files;
- output_dir: optional project output directory.
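The manifest rules above can be sketched as a small validator. This is an assumption-laden illustration: `validate_manifest` is a hypothetical helper, and the real checks in tools/gen_vivado_project.py may differ in detail and in error wording.

```python
from pathlib import Path

SUPPORTED_SKUS = {"zu3eg", "zu9eg"}


def validate_manifest(manifest: dict, base_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the manifest passes.

    Hypothetical helper: it only encodes the field rules listed in the text.
    """
    problems = []
    top = manifest.get("top")
    if not isinstance(top, str) or not top:
        problems.append("top must be a non-empty module name")
    if manifest.get("sku") not in SUPPORTED_SKUS:
        problems.append("sku must be zu3eg or zu9eg")
    clock = manifest.get("clock_mhz")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(clock, int) or isinstance(clock, bool) or clock <= 0:
        problems.append("clock_mhz must be a positive integer")
    for key in ("sources", "xdc"):
        paths = manifest.get(key)
        if not isinstance(paths, list) or not paths:
            problems.append(f"{key} must list at least one file")
            continue
        for rel in paths:
            # Paths are resolved relative to base_dir (an assumption of this sketch).
            if not (base_dir / str(rel)).is_file():
                problems.append(f"{key} entry does not exist: {rel}")
    out = manifest.get("output_dir")
    if out is not None and not isinstance(out, str):
        problems.append("output_dir must be a string when present")
    return problems
```

Collecting every problem before failing lets a caller report all manifest errors in one pass; a generator following the fail-closed contract would refuse to emit Tcl whenever the list is non-empty.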
The checked-in XDC files under hdl/targets/ultrascale_plus/ intentionally
contain only clock/timing constraints. They do not contain PACKAGE_PIN or
LOC placement constraints because no board-revision pin manifest has been
verified in this repository. Adding fabricated pins would create false hardware
evidence.
Resource finding from the 2026-06-04 benchmark
The isolated 2026-06-04 comparison benchmark measured the target contract across Python/Vivado-Tcl and Rust surfaces.
| Surface | Artefact | Median | Isolation evidence | Key contract result |
|---|---|---|---|---|
| Python + Vivado Tcl | benchmarks/results/local_python_2026-06-04_ultrascale_plus_target.json | 122.678 us/manifest | cgroup_effective_cpuset=10-11, runtime_cpuset_shield_claimed=true | deterministic ZU3EG/ZU9EG Tcl generation with DSP48E2 baseline |
| Rust | benchmarks/results/local_rust_2026-06-04_ultrascale_plus_target.json | 130.836 us/emit | cpu_affinity=10-11, cgroup_effective_cpuset=10-11, runtime_cpuset_shield_claimed=true | 64x32 dense graph estimates 2,048 DSPs, exceeding the ZU3EG budget of 360 |
The 64x32 dense result is not a failure of the benchmark. It is the intended fail-closed hardware conclusion: a one-DSP-per-MAC implementation of that graph cannot be honestly claimed to fit ZU3EG. ZU3EG deployment for that workload requires a folded or time-multiplexed dense implementation, a smaller layer, or a larger target.
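The arithmetic behind that conclusion is short enough to spell out. The sketch below assumes one DSP per MAC, as stated above, and checks the DSP budget only; the function names are hypothetical and the result says nothing about LUT, BRAM, or timing fit.

```python
# DSP budgets from the SKU baseline table.
DSP_BUDGETS = {"zu3eg": 360, "zu9eg": 2_520}


def unrolled_dsp_estimate(inputs: int, outputs: int) -> int:
    """One DSP per multiply-accumulate in a fully unrolled dense layer."""
    return inputs * outputs


def fits_dsp_budget(inputs: int, outputs: int, sku: str) -> bool:
    """Boolean fit decision against the DSP budget only (hypothetical helper)."""
    return unrolled_dsp_estimate(inputs, outputs) <= DSP_BUDGETS[sku]


if __name__ == "__main__":
    estimate = unrolled_dsp_estimate(64, 32)
    print(estimate, fits_dsp_budget(64, 32, "zu3eg"))
```

For the 64x32 graph the estimate is 64 × 32 = 2,048 DSPs, which is more than five times the ZU3EG budget of 360.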
Dense folding contract
The follow-up dense-folding contract provides that missing resource-safe path.
SvTarget::dense_fold_plan(64, 32) and tools/ultrascale_dense_folding.py
both compute the same ZU3EG plan:
| Field | Value |
|---|---|
| Unfurled MACs | 2,048 |
| ZU3EG DSP budget | 360 |
| Output rows per cycle | 5 |
| Input lanes per output row | 64 |
| DSPs per compute cycle | 320 |
| Output fold factor | 7 |
| Input fold factor | 1 |
| Compute cycles | 7 |
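The table values can be reproduced with a few lines of integer arithmetic. The following is a sketch under stated assumptions, not the code in tools/ultrascale_dense_folding.py: the name `sketch_dense_fold_plan`, the field names, and the handling of layers whose input width exceeds the DSP budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenseFoldPlan:
    unrolled_macs: int
    dsp_budget: int
    rows_per_cycle: int
    lanes_per_row: int
    dsps_per_cycle: int
    output_fold: int
    input_fold: int
    compute_cycles: int


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def sketch_dense_fold_plan(inputs: int, outputs: int, dsp_budget: int = 360) -> DenseFoldPlan:
    """Fold an inputs x outputs dense layer so each cycle stays within dsp_budget."""
    if inputs <= 0 or outputs <= 0 or dsp_budget <= 0:
        raise ValueError("inputs, outputs and dsp_budget must be positive")
    if inputs <= dsp_budget:
        # A full input row fits: compute as many whole output rows as the budget allows.
        lanes = inputs
        rows = min(outputs, dsp_budget // inputs)
    else:
        # Assumption of this sketch: split one row's inputs across several cycles.
        lanes = dsp_budget
        rows = 1
    input_fold = _ceil_div(inputs, lanes)
    output_fold = _ceil_div(outputs, rows)
    return DenseFoldPlan(
        unrolled_macs=inputs * outputs,
        dsp_budget=dsp_budget,
        rows_per_cycle=rows,
        lanes_per_row=lanes,
        dsps_per_cycle=rows * lanes,
        output_fold=output_fold,
        input_fold=input_fold,
        compute_cycles=output_fold * input_fold,
    )
```

For `sketch_dense_fold_plan(64, 32)` this gives 360 // 64 = 5 output rows per cycle, 5 × 64 = 320 DSPs per cycle, and ceil(32 / 5) = 7 compute cycles, matching the table.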
The SystemVerilog emitter now annotates over-budget UltraScale+ dense
instances with this fold plan so generated RTL carries the resource remedy next
to the unfurled dense instance. The standalone
hdl/sc_dense_folded_q88_core.v implements the folded Q8.8-weight/Q16.16-MAC
execution contract and is covered by Icarus simulation. The benchmark also runs
bounded Yosys elaboration on an 8x8 parameterisation, reporting 240 generic
cells. That Yosys number validates HDL elaboration only; it is not a ZU3EG
Vivado timing or utilisation report.
The folded core is deterministic fixed-point dense logic. It does not silently replace the existing stochastic dense layer in the generic emitter path. Deployment code must select it deliberately when the target resource contract requires folding.
Vivado gate
tests/test_ultrascale_plus_flow.py includes an opt-in Vivado CI gate. It runs
only when MIF_VIVADO_CI=1 and vivado is available. Until that self-hosted
runner and a verified board-specific pin map exist, SC-NeuroCore claims target
contract evidence and synthesis-flow readiness, not board-level timing closure.
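The opt-in condition can be expressed as a single predicate. This is a minimal sketch of the gating logic described above; `vivado_gate_enabled` is a hypothetical name, and the actual test module may structure the check differently.

```python
import os
import shutil


def vivado_gate_enabled(env=None, which=shutil.which) -> bool:
    """True only when MIF_VIVADO_CI=1 and a vivado executable is on PATH.

    env and which are injectable so the predicate can be exercised
    without a Vivado installation.
    """
    env = os.environ if env is None else env
    return env.get("MIF_VIVADO_CI") == "1" and which("vivado") is not None
```

With pytest, a predicate like this would typically feed `pytest.mark.skipif(not vivado_gate_enabled(), reason=...)`, so the gate is skipped rather than failed on machines without Vivado.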
Reference: AMD publishes the Zynq UltraScale+ MPSoC device-family overview and product tables in DS891 at https://docs.amd.com/go/en-US/ds891-zynq-ultrascale-plus-overview.