Zynq UltraScale+ target contract

SC-NeuroCore now has an explicit Zynq UltraScale+ MPSoC target contract for the SystemVerilog emitter and Vivado project generator. The contract is deliberately fail-closed: it emits target metadata, reports conservative resource estimates, and rejects unsupported SKU names instead of fabricating board-level timing or pin claims.

Supported SKU baseline

The public target baseline covers the currently planned ZU3EG and ZU9EG surfaces. Device resources are recorded as compiler budgets, not as timing closure evidence. Timing closure still requires Vivado on a board-specific self-hosted runner with verified XDC pin assignments.

SKU	Part	DSP primitive	LUT budget	FF budget	DSP budget	36K BRAM budget	URAM budget
ZU3EG	`xczu3eg-sbva484-1-e`	`DSP48E2`	70,560	141,120	360	216	0
ZU9EG	`xczu9eg-ffvb1156-2-e`	`DSP48E2`	274,080	548,160	2,520	912	0

Zynq UltraScale+ MPSoC devices use the DSP48E2 primitive. SC-NeuroCore must not claim a newer-family DSP primitive, such as the Versal DSP58, for this target.
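The fail-closed SKU handling described above can be sketched as a lookup that raises on unknown names rather than guessing. This is a minimal illustration, not the project's implementation: `SkuBudget`, `SKU_BUDGETS`, and `lookup_sku` are hypothetical names, and the case-insensitive match is an assumption. The numbers are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuBudget:
    """Compiler budget for one SKU (hypothetical type, for illustration only)."""
    part: str
    dsp_primitive: str
    lut: int
    ff: int
    dsp: int
    bram36k: int
    uram: int


# Values copied from the SKU table in this document.
SKU_BUDGETS = {
    "zu3eg": SkuBudget("xczu3eg-sbva484-1-e", "DSP48E2", 70_560, 141_120, 360, 216, 0),
    "zu9eg": SkuBudget("xczu9eg-ffvb1156-2-e", "DSP48E2", 274_080, 548_160, 2_520, 912, 0),
}


def lookup_sku(name: str) -> SkuBudget:
    """Return the budget for a supported SKU, or raise instead of guessing."""
    key = name.strip().lower()  # assumption: names are matched case-insensitively
    if key not in SKU_BUDGETS:
        supported = ", ".join(sorted(SKU_BUDGETS))
        raise ValueError(f"unsupported SKU {name!r}; supported: {supported}")
    return SKU_BUDGETS[key]
```

Raising on an unknown SKU keeps a typo or an unplanned device from silently inheriting another part's budgets.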

Compiler behaviour

The Rust IR target surface is SvTarget::zynq_ultrascale_plus(SkuKind, clock_mhz). When this target is selected, emit_systemverilog_with_target returns both the generated SystemVerilog and a ResourceReport.

The target-aware emitter currently:

- adds a target header comment containing the SKU, part, clock, and DSP primitive;
- annotates arithmetic kernels with use_dsp=yes and sc_target_dsp=DSP48E2;
- adds RAM-style metadata for constant vectors so larger tables can be steered toward block memory when appropriate;
- reports conservative LUT, DSP, BRAM, URAM, and critical-path estimates;
- reports boolean fit decisions against the selected SKU budgets.
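The boolean fit decisions in the list above amount to a per-resource comparison of an estimate against the SKU budget. The sketch below is written in Python for illustration and does not reproduce the Rust ResourceReport. The function names and dictionary keys are hypothetical. The budgets come from the SKU table, and the 2,048-DSP estimate is the 64x32 dense figure reported later in this document.

```python
# Budgets copied from the SKU table in this document.
ZU3EG = {"lut": 70_560, "ff": 141_120, "dsp": 360, "bram36k": 216, "uram": 0}
ZU9EG = {"lut": 274_080, "ff": 548_160, "dsp": 2_520, "bram36k": 912, "uram": 0}


def fit_decisions(estimate: dict[str, int], budget: dict[str, int]) -> dict[str, bool]:
    """Return one boolean per estimated resource: True when it fits the budget."""
    unknown = set(estimate) - set(budget)
    if unknown:
        # Fail closed: an estimate for a resource with no recorded budget is an error.
        raise KeyError(f"no budget recorded for: {sorted(unknown)}")
    return {name: estimate[name] <= budget[name] for name in sorted(estimate)}


def fits(estimate: dict[str, int], budget: dict[str, int]) -> bool:
    """A design fits only if every estimated resource fits."""
    return all(fit_decisions(estimate, budget).values())


# One DSP per MAC for a 64x32 dense graph, as reported by the benchmark below.
dense_64x32 = {"dsp": 2_048}
print(fit_decisions(dense_64x32, ZU3EG))  # {'dsp': False}
print(fit_decisions(dense_64x32, ZU9EG))  # {'dsp': True}
```

Because URAM budgets are zero on both SKUs, any nonzero URAM estimate would be reported as not fitting.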

The legacy emit(graph) API remains target-neutral and emits the generic SystemVerilog contract.

Vivado project generator

tools/gen_vivado_project.py converts a JSON manifest into a deterministic Vivado batch Tcl project. A manifest must define:

- top: top-level module name;
- sku: zu3eg or zu9eg;
- clock_mhz: positive integer target clock;
- sources: one or more existing SystemVerilog source files;
- xdc: one or more existing constraint files;
- output_dir: optional project output directory.
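As an illustration of the manifest rules above, the following sketch validates a manifest dictionary against them. It is not the code in tools/gen_vivado_project.py: `validate_manifest` is a hypothetical helper, the file paths in the example manifest are invented, and resolving paths relative to a base directory is an assumption.

```python
import json
from pathlib import Path

SUPPORTED_SKUS = {"zu3eg", "zu9eg"}

# Example manifest; the module name and file paths are hypothetical.
EXAMPLE_MANIFEST = json.loads("""
{
  "top": "sc_top",
  "sku": "zu3eg",
  "clock_mhz": 200,
  "sources": ["rtl/sc_top.sv"],
  "xdc": ["constraints/clock.xdc"],
  "output_dir": "build/vivado"
}
""")


def validate_manifest(manifest: dict, base_dir: Path) -> dict:
    """Check the documented fields; raise ValueError on the first problem."""
    top = manifest.get("top")
    if not isinstance(top, str) or not top:
        raise ValueError("top must be a non-empty module name")

    sku = manifest.get("sku")
    if sku not in SUPPORTED_SKUS:
        raise ValueError(f"unsupported sku {sku!r}")

    clock = manifest.get("clock_mhz")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(clock, int) or isinstance(clock, bool) or clock <= 0:
        raise ValueError("clock_mhz must be a positive integer")

    for field in ("sources", "xdc"):
        paths = manifest.get(field)
        if not isinstance(paths, list) or not paths:
            raise ValueError(f"{field} must list at least one file")
        for rel in paths:
            if not isinstance(rel, str) or not (base_dir / rel).is_file():
                raise ValueError(f"{field} entry does not exist: {rel!r}")

    output_dir = manifest.get("output_dir")  # optional field
    if output_dir is not None and not isinstance(output_dir, str):
        raise ValueError("output_dir must be a string when present")
    return manifest
```

Checking that every listed file exists before emitting Tcl keeps a broken manifest from producing a project that only fails once Vivado runs.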

The checked-in XDC files under hdl/targets/ultrascale_plus/ intentionally contain only clock/timing constraints. They do not contain PACKAGE_PIN or LOC placement constraints because no board-revision pin manifest has been verified in this repository. Adding fabricated pins would create false hardware evidence.
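A repository check for this rule could scan constraint text for the two forbidden property names. The sketch below is one possible guard, not an existing project tool: `find_placement_constraints` is a hypothetical name, and treating everything after `#` as a comment is a simplification of Tcl comment rules. The port name and the pin `A1` in the example are deliberately made up to show a rejected file.

```python
import re

# PACKAGE_PIN and LOC are the property names this document says must not appear.
_FORBIDDEN = re.compile(r"\b(PACKAGE_PIN|LOC)\b")


def find_placement_constraints(xdc_text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line that sets a placement property."""
    hits = []
    for lineno, line in enumerate(xdc_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # simplification: ignore text after '#'
        if _FORBIDDEN.search(code):
            hits.append((lineno, line.strip()))
    return hits


# A clock-only file passes; the port name "clk" is hypothetical.
CLOCK_ONLY = "create_clock -period 5.000 -name sys_clk [get_ports clk]\n"
# A file with a made-up pin assignment is flagged.
WITH_PIN = CLOCK_ONLY + "set_property PACKAGE_PIN A1 [get_ports clk]\n"

print(find_placement_constraints(CLOCK_ONLY))  # []
print(find_placement_constraints(WITH_PIN))
```

The word-boundary match means names that merely contain the letters, such as a pblock identifier, are not flagged.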

Resource finding from the 2026-06-04 benchmark

The isolated 2026-06-04 comparison benchmark measured the target contract across Python/Vivado-Tcl and Rust surfaces.

Surface	Artefact	Median	Isolation evidence	Key contract result
Python + Vivado Tcl	`benchmarks/results/local_python_2026-06-04_ultrascale_plus_target.json`	122.678 us/manifest	`cgroup_effective_cpuset=10-11`, `runtime_cpuset_shield_claimed=true`	deterministic ZU3EG/ZU9EG Tcl generation with `DSP48E2` baseline
Rust	`benchmarks/results/local_rust_2026-06-04_ultrascale_plus_target.json`	130.836 us/emit	`cpu_affinity=10-11`, `cgroup_effective_cpuset=10-11`, `runtime_cpuset_shield_claimed=true`	64x32 dense graph estimates 2,048 DSPs, exceeding the ZU3EG budget of 360

The 64x32 dense result is not a failure of the benchmark. It is the intended fail-closed hardware conclusion: a one-DSP-per-MAC implementation of that graph cannot be honestly claimed to fit ZU3EG. ZU3EG deployment for that workload requires a folded or time-multiplexed dense implementation, a smaller layer, or a larger target.

Dense folding contract

The follow-up dense-folding contract provides that missing resource-safe path. SvTarget::dense_fold_plan(64, 32) and tools/ultrascale_dense_folding.py both compute the same ZU3EG plan:

Field	Value
Unfurled MACs	2,048
ZU3EG DSP budget	360
Output rows per cycle	5
Input lanes per output row	64
DSPs per compute cycle	320
Output fold factor	7
Input fold factor	1
Compute cycles	7
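The arithmetic behind the table can be reproduced with a short calculation. The sketch below is one way to arrive at the same numbers, assuming the planner packs as many whole output rows as the DSP budget allows and folds the input dimension only when a single row exceeds the budget. `FoldPlan` and `sketch_fold_plan` are hypothetical names. The algorithm actually used by SvTarget::dense_fold_plan and tools/ultrascale_dense_folding.py is not shown in this document and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldPlan:
    total_macs: int
    dsp_budget: int
    rows_per_cycle: int
    lanes_per_row: int
    dsps_per_cycle: int
    output_fold: int
    input_fold: int
    compute_cycles: int


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive operands."""
    return -(-a // b)


def sketch_fold_plan(inputs: int, outputs: int, dsp_budget: int) -> FoldPlan:
    if min(inputs, outputs, dsp_budget) <= 0:
        raise ValueError("inputs, outputs and dsp_budget must be positive")
    if inputs <= dsp_budget:
        # A whole row fits: use one lane per input and pack several rows.
        lanes = inputs
        rows = min(outputs, dsp_budget // inputs)
    else:
        # A single row is too wide: fold the input dimension instead.
        lanes = dsp_budget
        rows = 1
    input_fold = ceil_div(inputs, lanes)
    output_fold = ceil_div(outputs, rows)
    return FoldPlan(
        total_macs=inputs * outputs,
        dsp_budget=dsp_budget,
        rows_per_cycle=rows,
        lanes_per_row=lanes,
        dsps_per_cycle=rows * lanes,
        output_fold=output_fold,
        input_fold=input_fold,
        compute_cycles=output_fold * input_fold,
    )


plan = sketch_fold_plan(inputs=64, outputs=32, dsp_budget=360)
print(plan.rows_per_cycle, plan.dsps_per_cycle, plan.output_fold, plan.compute_cycles)
# prints: 5 320 7 7
```

With 64 lanes per row, 360 // 64 = 5 rows fit per cycle, which uses 320 DSPs, and 32 output rows then need ceil(32 / 5) = 7 cycles.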

The SystemVerilog emitter now annotates over-budget UltraScale+ dense instances with this fold plan so generated RTL carries the resource remedy next to the unfurled dense instance. The standalone hdl/sc_dense_folded_q88_core.v implements the folded Q8.8-weight/Q16.16-MAC execution contract and is covered by Icarus simulation. The benchmark also runs bounded Yosys elaboration on an 8x8 parameterisation, reporting 240 generic cells. That Yosys number validates HDL elaboration only; it is not a ZU3EG Vivado timing or utilisation report.

The folded core is deterministic fixed-point dense logic. It does not silently replace the existing stochastic dense layer in the generic emitter path. Deployment code must select it deliberately when the target resource contract requires folding.
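To make the determinism claim concrete, the sketch below models a folded fixed-point dense layer in Python and checks that it produces the same accumulators as a fully parallel evaluation. It does not model hdl/sc_dense_folded_q88_core.v. It assumes that inputs, like weights, are signed Q8.8 values, so each product is a Q16.16 quantity. Overflow and rounding behaviour of the real core are not represented, and all function names are hypothetical.

```python
FRAC_BITS = 8  # Q8.8: 8 integer bits, 8 fractional bits


def to_q88(x: float) -> int:
    """Quantise to signed Q8.8, saturating at the 16-bit range."""
    raw = round(x * (1 << FRAC_BITS))
    return max(-(1 << 15), min((1 << 15) - 1, raw))


def q1616_to_float(acc: int) -> float:
    """Interpret an accumulator of Q8.8 x Q8.8 products as Q16.16."""
    return acc / (1 << (2 * FRAC_BITS))


def dense_parallel(weights: list[list[int]], inputs: list[int]) -> list[int]:
    """Reference: every output row computed at once (one multiplier per MAC)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def dense_folded(weights: list[list[int]], inputs: list[int], rows_per_cycle: int):
    """Compute rows_per_cycle output rows per step; return (outputs, cycles)."""
    outputs: list[int] = []
    cycles = 0
    for start in range(0, len(weights), rows_per_cycle):
        group = weights[start:start + rows_per_cycle]
        outputs.extend(sum(w * x for w, x in zip(row, inputs)) for row in group)
        cycles += 1
    return outputs, cycles


# Deterministic 32-row by 64-input test data (arbitrary values for illustration).
W = [[to_q88(((r * 7 + c * 3) % 17 - 8) / 16) for c in range(64)] for r in range(32)]
X = [to_q88(((c * 5) % 13 - 6) / 8) for c in range(64)]

folded, cycles = dense_folded(W, X, rows_per_cycle=5)
print(folded == dense_parallel(W, X), cycles)  # True 7
```

Folding changes only the schedule, so integer accumulators match the parallel reference exactly; the cost is seven steps instead of one.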

Vivado gate

tests/test_ultrascale_plus_flow.py includes an opt-in Vivado CI gate. It runs only when MIF_VIVADO_CI=1 and vivado is available. Until a self-hosted Vivado runner and a verified board-specific pin map exist, SC-NeuroCore claims only target-contract evidence and synthesis-flow readiness, not board-level timing closure.
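The opt-in condition can be expressed as a small predicate. The sketch below is not the code in tests/test_ultrascale_plus_flow.py: `vivado_gate_enabled` is a hypothetical helper, and it interprets "vivado is available" as the executable being resolvable on PATH, which is an assumption.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def vivado_gate_enabled(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """True only when MIF_VIVADO_CI is exactly "1" and a vivado executable is found."""
    env = os.environ if env is None else env
    return env.get("MIF_VIVADO_CI") == "1" and which("vivado") is not None


# Injecting env and which keeps the predicate testable without Vivado installed.
print(vivado_gate_enabled({"MIF_VIVADO_CI": "1"}, lambda name: None))  # False
print(vivado_gate_enabled({}, lambda name: "/opt/tools/vivado"))       # False
```

In a pytest module, the negation of such a predicate could be passed as the condition of `pytest.mark.skipif`, so the gate is skipped rather than failed on machines without Vivado.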

Reference: AMD publishes the Zynq UltraScale+ MPSoC device-family overview and product tables in DS891 at https://docs.amd.com/go/en-US/ds891-zynq-ultrascale-plus-overview.