release: iDMA 0.7.0 by DanielKellerM · Pull Request #129 · pulp-platform/iDMA

DanielKellerM · 2026-06-16T11:36:08Z

Draft release PR for iDMA 0.7.0 (devel → master). It consolidates the 28 commits on devel. 0.7.0 is a breaking minor release (in the 0.x series, the minor number is the breaking axis): the register interface and the idma_pkg vocabulary change.

Highlights / breaking changes

Register blocks regenerated with SystemRDL (PeakRDL) over APB, replacing the reggen/reg_bus flow — the register frontend interface changes (#73).
Multi-head DMA — multiple read/write AXI channels selected per transfer via src_head/dst_head; adds idma_pkg types (#85, #123).
Build/toolchain: morty → Bender slang pickle, pickle output at target/pickle/, and Python deps managed by uv (pyproject.toml/uv.lock) instead of pip (#110, #114, #111, #121).

Added

Multi-head DMA capabilities + directed verification testbenches (#85, #123).
Starlight documentation site with generated architecture/hierarchy diagrams (#103, #117).
Streamlined Snitch inst64 integration via the native rw_axi_rw_init_rw_obi variant (#88).
BurstLen parameter on the legalizer page splitter to configure the default burst size (#109).
Self-bootstrapping uv environment so make idma_hw_all runs with no manual venv setup (#128).
Mixed-traffic testbench for the rt_midend (#108); per-top trimmed vsim compile scripts (#116).

Changed

Register generation moved to SystemRDL/PeakRDL (APB) (#73).
morty replaced by the Bender pickle; output dir renamed to target/pickle/ (#110, #114).
Python tooling on uv/pyproject.toml; dropped requirements.txt and the setuptools pin (#111, #121, #122).
Bumped all bender dependencies to their latest releases (axi 0.39.9, common_cells 1.39.0, common_verification 0.2.5, register_interface 0.4.7, obi 0.1.7) (#130).
Reg frontend exposes a native APB4 slave (PeakRDL-native, matching desc64); the reg_to_apb shim is removed and register_interface is dropped as a dependency (#131). Breaking for reg_bus integrators.
CI: uv init, interruptible mirror jobs, bender git-db caching, push-CI on mainlines only, stale-runner-lock cleanup, branch-policy enforcement (#118, #120, #124, #125).

Fixed

Break a combinational loop in the desc64 speculation FIFO (#91, closes #71).
Multi-head datapath bugs (#123).
Error-handler valid/ready protocol violations.
Reg-block APB address-width match in the wrapper (clears port-width mismatches + spurious external-ack assertions introduced by the SystemRDL conversion); removed a duplicate tie-off and a dead hjson template.
Defensive default arm in the writes_in_flight TB case (#107); rt_midend choice-FIFO sync (#108).

Release checklist — must land on `devel` before this merges

deps: Bump all bender dependencies to latest #130. Merged to devel.
frontend: Expose the reg frontend as a native APB slave #131: the reg frontend becomes a native APB slave and register_interface is dropped (breaking). Open, targets devel.
docs: Align Starlight site with generated hierarchy graphs #119: quickstart/docs updated for the bare make idma_hw_all (self-provisioning uv) flow. Open.
Bump VERSION 0.6.5 → 0.7.0 and add the ## 0.7.0 CHANGELOG.md entry (mirrors the sections above). Not yet committed; done once the scope below is final.
Remove the stale tracked Bender.local axi pin: folded into #130 (deps: Bump all bender dependencies to latest).

Out of scope / slips to a follow-up (NOT in 0.7.0)

common_cells v2: does not exist upstream (latest is 1.39.0, picked up by #130, "deps: Bump all bender dependencies to latest"); #99 ("Bump common_cells to v2") stays quarantined as ecosystem-unresolvable.
On-the-fly compute / transpose engine: the work in #112 ("Add on-the-fly compute support with a transpose engine") and #113 ("inst64: Drive on-the-fly transpose; add the snitch integration harness") is not merged to devel; it ships in a later release.

When all checklist items are on devel, flip this PR out of draft for the release review.

@v2

The @v2 tag resolves to a 2+ year-old release. Pin to v2.5.0 which bumps gitlab-ci Python to 3.12, switches to uv for deps, fixes riscv-gcc-install asset detection, and improves gitlab-ci logging.

Guard FSM state transitions on rsp_ready_i so rsp_valid_o is not deasserted before the handshake completes. Add missing eh_valid_i check in WAIT_LAST_W to prevent sampling garbage eh_i data.
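
The handshake rule behind this fix can be illustrated with a small trace checker. This is a minimal sketch, not the error handler's RTL: it only assumes the usual valid/ready convention that a sender, once it asserts valid, must hold it until a cycle in which ready is also high. The function name and trace format are hypothetical.

```python
def find_valid_drop(trace):
    """Return the first cycle where valid fell before a handshake, else None.

    `trace` is a list of (valid, ready) pairs, one per clock cycle.
    """
    pending = False  # True while a beat was offered but not yet accepted
    for cycle, (valid, ready) in enumerate(trace):
        if pending and not valid:
            return cycle  # valid was withdrawn without a completed handshake
        pending = bool(valid) and not bool(ready)
    return None


# Well-behaved sender: holds valid for three cycles until ready arrives.
ok_trace = [(1, 0), (1, 0), (1, 1), (0, 0)]
# Misbehaving sender: drops valid after one cycle with no handshake.
bad_trace = [(1, 0), (0, 0)]
```

Running the checker on `bad_trace` reports cycle 1, the kind of early deassertion the rsp_ready_i guard is meant to rule out.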

* ci: Add branch policy enforcement and PR template

  Add devel as the staging branch for all external contributions:
  - retarget-to-devel.yml auto-retargets external PRs from master to devel via author_association check (OWNER/MEMBER/COLLABORATOR may still target master directly for promotion PRs).
  - promote-to-master.yml opens or extends a rolling devel->master PR when a maintainer applies the verified-internal label to a merged devel PR.
  - pull_request_template.md surfaces the policy in every new PR.
  - CONTRIBUTING.md documents the policy with rationale.

  Maintainers need to create the verified-internal label once (any color, description optional) before promote-to-master.yml fires.

* ci: Tighten branch-policy workflows
  - retarget-to-devel: also fire on `edited` so a PR re-targeted to master post-open is still caught.
  - promote-to-master: swallow the 422 from `pulls.create` when two concurrent gitlab-ci runs race to open the promotion PR.
  - CONTRIBUTING: use ASCII `->` for greppability.

* ci: Fix workflow_run trigger and tighten edge cases
  - promote-to-master: trigger on the parent `ci` workflow_run, not `gitlab-ci` (which is workflow_call-only and never produces its own workflow_run event). `ci` aggregates lint+build+gitlab-ci, so its success is a strictly stronger gate.
  - promote-to-master: narrow the 422 swallow to only the 'pull request already exists' validation error; rethrow other 422s (missing base, invalid head, no commits between, ...).
  - promote-to-master: add concurrency group to serialise rapid runs.
  - retarget-to-devel: add concurrency group keyed on PR number with cancel-in-progress, so rapid edited events don't spawn duplicate jobs.
  - retarget-to-devel: build the CONTRIBUTING.md link from `context.serverUrl` instead of a relative path, so it renders correctly in PR comments regardless of GitHub's markdown context.

* ci: Dedupe retarget comment and survive comment failure
  - retarget-to-devel: embed an HTML marker in the comment body and check for it via paginated listComments before posting. Rapid `edited` events that cancel and re-fire the job no longer post duplicate comments.
  - retarget-to-devel: wrap createComment in try/catch with core.warning so a transient comment failure (after the retarget already succeeded) does not fail the workflow.

* ci: Tighten promote-to-master permissions and fix org typo
  - promote-to-master: downgrade `contents: write` to `contents: read`. The script only calls `repos.compareCommits` (read) and PR APIs already covered by `pull-requests: write`.
  - CONTRIBUTING: fix pre-existing `pulp_platform` (underscore) to `pulp-platform` (hyphen), the actual GitHub org slug.

* ci: Tighten gitlint regex to forbid extra colons in subject

  Matches the single-colon rule enforced by util/lint-commits.py in CI. Now gitlint (run locally via pre-commit) catches the same failure mode that broke this PR's commit-msg lint.
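
The comment de-duplication described above follows a common idempotency pattern, sketched here in Python rather than in the workflow's own script. Everything named below is hypothetical: the `MARKER` text is a placeholder (not the string the workflow uses), and `post` stands in for whatever call creates the comment.

```python
MARKER = "<!-- retarget-notice -->"  # placeholder marker, assumed for illustration


def post_once(existing_comments, body, post):
    """Post `body` unless a marked comment already exists.

    Returns True if a comment was posted, False if it was skipped or failed.
    A failure in `post` is downgraded to a warning, mirroring the idea that a
    comment error should not fail a job whose main action already succeeded.
    """
    if any(MARKER in comment for comment in existing_comments):
        return False  # an earlier run already left the notice
    try:
        post(f"{MARKER}\n{body}")
    except Exception as err:  # broad on purpose: any comment failure is non-fatal
        print(f"warning: could not post comment: {err}")
        return False
    return True


posted = []
first = post_once(posted, "Retargeted to devel.", posted.append)
second = post_once(posted, "Retargeted to devel.", posted.append)
```

The second call is a no-op because the first comment carries the marker, which is the behaviour wanted when rapid `edited` events re-fire the job.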

) Track every arbiter handshake in the choice FIFO so the response demux stays in sync when internal and external events interleave, and add a mixed-traffic testbench (vsim + vcs, blocking) that catches the bug. Original fix: #96. Co-authored-by: Flavien Solt <flavien.solt97@gmail.com>
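
The idea of recording every arbiter handshake so that responses can be routed back in order can be modelled in a few lines. This is a behavioural sketch under the assumption that responses return in request order; the class and method names are hypothetical and do not mirror the rt_midend RTL.

```python
from collections import deque


class ChoiceFifo:
    """Remembers which source won each arbitration, in handshake order."""

    def __init__(self):
        self._choices = deque()

    def on_request_handshake(self, source):
        # Push on EVERY accepted request, whatever its origin; skipping one
        # source is what would let the demux drift out of sync.
        self._choices.append(source)

    def route_response(self, response):
        # Responses are assumed to come back in order, so the oldest
        # recorded choice names the destination of this response.
        return self._choices.popleft(), response


fifo = ChoiceFifo()
for src in ["internal", "external", "external", "internal"]:
    fifo.on_request_handshake(src)
routed = [fifo.route_response(f"rsp{i}") for i in range(4)]
```

With interleaved internal and external requests, each response lands at the source that issued the matching request.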

Use 'bender pickle' (slang frontend, bender >= 0.32.0) for all pickling: drops the sources.json indirection and the cf_math_pkg concat hack, and retires the morty HTML/DOT doc graphs and their CI installs. Pickle output stays at target/morty for downstream compatibility. Commits target/rtl/include/.gitkeep so bender 0.32.0 (which errors on a missing export_include_dir) resolves on a fresh tree.

Replace actions/setup-python + pip with astral-sh/setup-uv@v7 + 'uv pip install -r requirements.txt' across the analyze, build, deploy, docs and lint workflows, matching the nonfree GitLab CI which already uses uv. lint-python/lint-yaml keep setup-python (no requirements install).

Add a Starlight-based documentation site under doc/site with rendered architecture, backend, legalizer and system-integration diagrams.

Generate per-top trimmed vsim compile scripts so each testbench compiles only the sources it needs, speeding up targeted simulation.

Set interruptible at the default level of .gitlab-ci.yml so the project's auto_cancel_pending_pipelines cancels the in-flight devel pipeline (init + the idma trigger, which cascades to the nonfree child) when a newer commit lands. The nonfree CI side was marked interruptible in lockstep.

Rename the pickle output dir target/morty -> target/pickle now that morty is gone. The nonfree EDA scripts and CI were updated to read target/pickle in lockstep.

Regenerate the module hierarchy graphs from the bender-pickle syntax tree via util/ast2dot.py, replacing the retired morty DOT graphs. Restores the graph figures in the Sphinx docs and wires Graphviz back into the build/docs CI.

Replace requirements.txt with a pyproject.toml manifest + committed uv.lock, and run CI generation via 'uv run --locked' / 'uv sync --locked' on Python 3.11. Keeps setuptools<79 for regtool's pkg_resources. No RTL changes.

Replace the hjson/regtool register description with parameterized SystemRDL sources rendered by PeakRDL (regblock apb4-flat + raw-header svpkg). Migrates the build to peakrdl, drops the regtool reg-gen path. Co-authored-by: Michael Rogenmoser <michael@rogenmoser.us> Co-authored-by: Tim Fischer <fischeti@iis.ee.ethz.ch>

regtool's pkg_resources import forced setuptools<79; the SystemRDL/PeakRDL migration removed regtool, so the pin and the unused IDMA_REGTOOL/IDMA_REG_DIR make variables are dead weight.

* ci: Cache the bender git database keyed on the lockfile

  A composite action sets BENDER_DB_DIR to relocate the bare-repo database to a cacheable path; warm checkouts then need no network. The project-local .bender is never cached: its checkouts hold absolute-path alternates into the database and are rebuilt from it in under 0.1s.

* ci: Add author header and drop verbose comment in bender-db-cache
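
Keying a cache on the lockfile usually means deriving the cache key from a hash of the lockfile contents, so the cache is reused until the locked dependencies change. The sketch below shows that idea only; the key format, prefix and function name are assumptions, not what the composite action actually emits.

```python
import hashlib


def bender_db_cache_key(lockfile_bytes: bytes, prefix: str = "bender-db") -> str:
    """Derive a cache key that changes exactly when the lockfile changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}-{digest}"


# Illustrative lockfile contents, not a real Bender.lock.
key_a = bender_db_cache_key(b"axi: 0.39.9\n")
key_b = bender_db_cache_key(b"axi: 0.39.9\n")
key_c = bender_db_cache_key(b"axi: 0.39.8\n")
```

Identical lockfiles map to the same key (a warm cache hit), while any dependency bump produces a new key and a fresh database.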

Add optional multi-head (multi-channel) support to the iDMA backend. The read/write managers can be instantiated N-fold via a numeric prefix in the backend variant ID; a per-transfer src_head/dst_head selects the channel. Additive and opt-in: existing single-head variants regenerate byte-identical. Co-authored-by: Lud1ma <luedde.mahr@web.de> Co-authored-by: Thomas Benz <tbenz@iis.ee.ethz.ch>
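
To make the numeric-prefix convention concrete, here is a small parser for backend variant IDs. The grammar is an assumption inferred only from the IDs that appear in this PR (rw_axi, 2r_axi_w_axi, 2rw_axi, rw_axi_rw_init_rw_obi): underscore-separated pairs of an optional head count plus direction, followed by a protocol name. The real generator may accept more than this sketch does.

```python
import re

# Optional decimal head count, then a direction: r, w, or rw.
_DIRECTION = re.compile(r"(\d*)(rw|r|w)")


def parse_variant(variant_id):
    """Split a variant ID into (direction, protocol, head_count) tuples."""
    tokens = variant_id.split("_")
    if len(tokens) % 2:
        raise ValueError(f"expected direction/protocol pairs in {variant_id!r}")
    heads = []
    for direction, protocol in zip(tokens[0::2], tokens[1::2]):
        match = _DIRECTION.fullmatch(direction)
        if match is None:
            raise ValueError(f"bad direction token {direction!r}")
        count = int(match.group(1) or 1)  # no prefix means a single head
        heads.append((match.group(2), protocol, count))
    return heads
```

Under this assumed grammar, `parse_variant("2r_axi_w_axi")` yields two AXI read heads and one AXI write head, and an unprefixed ID such as `rw_axi` parses as a single head, consistent with single-head variants being unaffected.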

The IIS shared runners intermittently wedge a build slot with a stale git lock under ${CI_PROJECT_DIR}.tmp (git-template/config or .gitlab-runner.ext.conf.lock), failing the public init job at git checkout before any job script runs. Clear those stale locks in a pre_get_sources_script, scoped to *.lock and git-template so the CA bundle in .tmp is preserved.

An open same-repo PR fires both push and pull_request, so the gitlab-ci job mirrors the same commit twice and (with auto_cancel_pending_pipelines) cancels the PR's child pipeline. Scope push to devel/master; feature branches validate via their PR (pull_request synchronize still reruns on every pushed commit). Also stops the deploy job creating __deploy__ refs on feature pushes.

#123) Fix the multi-head read/write datapath: gate the AR/AW meta channels on the address-channel head (ar_req_i/aw_req_i) rather than the datapath head, route the write datapath response by the FIFO-tracked dst_head, and emit the per-head tagged path when a protocol has more than one head. Add directed backend testbenches for the 2r_axi_w_axi (two read heads) and 2rw_axi (cross-head write) configurations, wired into nonfree CI.

Add a default arm to the writes_in_flight case in the backend testbench template that $fatal()s on an unhandled destination protocol, instead of silently falling through when a generated variant adds a new protocol.

…nitch (#88) Add a native rw_axi_rw_init_rw_obi backend variant for snitch_cluster and switch the inst64 frontend to drive it directly (OBI/INIT write path with address-map steering), replacing the plain rw_axi instantiation. Add the variant's directed job stimulus and the snitch_cluster gitignore entries.

Disable fall-through on i_speculation_fifo to break the combinational loop flush -> speculation_correct -> push -> flush. The fall-through path only exposed the same-cycle empty-FIFO output, which the speculation check never uses (a guess is confirmed only when its descriptor read returns), so registering the head is the correct behavior with no throughput impact. Verified by a new blocking Spyglass combinational-loop gate (spyglass-lint-desc64). Closes #71.

Expose the page-splitter's non-reduced burst cap as a BurstLen parameter (default 8 = the AXI 256-beat max, preserving existing behavior) threaded through the backend/synth/legalizer wrappers, so a system like pulp_cluster can configure smaller default bursts without touching the core.
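
The effect of a burst cap can be sketched with a simplified software model of page splitting. This is not the legalizer's algorithm: it assumes BurstLen is the log2 of the maximum beat count (so 8 gives 256 beats), that bursts are cut at boundaries aligned to the smaller of 4 KiB and the capped burst size, and that `beat_bytes` is the data-bus width in bytes. All names are hypothetical.

```python
def split_bursts(addr, length, beat_bytes=8, burst_len=8):
    """Split a byte transfer into (address, byte_count) bursts.

    Each burst stays inside one aligned page, where a page is the smaller of
    4 KiB and (2**burst_len beats * beat_bytes).
    """
    page = min(4096, (1 << burst_len) * beat_bytes)
    bursts = []
    while length > 0:
        # Bytes left before the next page boundary, capped by what remains.
        chunk = min(length, page - addr % page)
        bursts.append((addr, chunk))
        addr += chunk
        length -= chunk
    return bursts


default_cap = split_bursts(0x0FF0, 0x30)           # 2 KiB pages on a 64-bit bus
small_cap = split_bursts(0, 100, burst_len=2)      # 4-beat bursts: 32-byte pages
```

Lowering the cap from 8 to 2 turns a 100-byte transfer into four short bursts, which is the kind of system-level tuning the parameter is meant to allow without touching the core.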

PR #109's rebased NumStreams commit resurrected idma_reg.hjson.tpl (removed by the SystemRDL conversion in #73) and re-added a second gen_hw2reg_unused generate block in idma_reg.sv.tpl. The duplicate label breaks elaboration of the reg block on every variant (DC VER-288, VCS IPD, Questa vlog-2388), failing the devel pipeline. Keep only #109's BurstLen change.

The second gen_hw2reg_unused block (#109's resurrected NumStreams churn) duplicates the SystemRDL one in the same scope, breaking elaboration on every variant (DC VER-288, VCS IPD, Questa vlog-2388). Completes the devel-build fix started in 6f8130a (which only removed the dead hjson).

PeakRDL sizes idma_reg*_reg_top's s_apb_paddr to the regmap's minimal address width (8 for reg32_3d, 9 for reg64_*), but the wrapper connected the full 32-bit internal APB paddr, tripping vsim-3015 port-width mismatches in every gen_core_regs instance. Slice paddr to the generated IDMA_*_REG_TOP_MIN_ADDR_WIDTH (via a RegAddrWidth localparam).
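
The width mismatch and its fix come down to simple arithmetic, sketched below. The assumption here is that the minimal address width is the ceiling log2 of the register map's byte span; the 256- and 512-byte spans are illustrative values chosen to reproduce the 8- and 9-bit widths quoted above, not figures taken from the generated register maps.

```python
def min_addr_width(regmap_bytes: int) -> int:
    """Bits needed to address every byte of a register map of this size."""
    return max(1, (regmap_bytes - 1).bit_length())


def slice_paddr(paddr: int, width: int) -> int:
    """Keep only the low `width` bits, like paddr[width-1:0] in the wrapper."""
    return paddr & ((1 << width) - 1)


width_32 = min_addr_width(256)   # assumed span giving the 8-bit case
width_64 = min_addr_width(512)   # assumed span giving the 9-bit case
local = slice_paddr(0x1000_0104, width_64)
```

Slicing keeps the register offset (0x104 here) and drops the upper bits that select the block, so the port widths on both sides of the connection agree.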

Provision a local uv .venv (uv sync --locked) and prepend it to PATH when the generator deps are not already importable, so 'make idma_hw_all' works with no pre-activated venv. An activated venv or 'uv run make' (CI) is detected and left untouched. Lets a consumer call 'make -C $(bender path idma) idma_hw_all' directly. Requires uv on PATH.

Raise Bender.yml version floors to the latest releases (axi 0.39.9, common_cells 1.39.0, common_verification 0.2.5, register_interface 0.4.7, obi 0.1.7) and drop the stale Bender.local axi pin. No Bender.lock or generated-RTL drift.

DanielKellerM · 2026-06-17T12:30:13Z

Downstream iDMA 0.7.0 adoption follow-ups

Heads-up notices opened so consumers can plan for the breaking 0.7.0 bump. Follow up with dependency PRs once v0.7.0 is tagged:

Upcoming iDMA 0.7.0 snitch_cluster#323 — direct dep (idma: 0.6.5)
Upcoming iDMA 0.7.0 cheshire#278 — direct dep (iDMA 0.6.4; relevant on the gwaihir branch)
Upcoming iDMA 0.7.0 gwaihir#26 — transitive via snitch_cluster + cheshire

…eam (#112)

…133)

…136)

The generator-environment bootstrap in idma.mk tried three paths in order (system python with mako, a pre-existing .venv, then uv sync) and only the uv branch installs anything. On runners without a global mako or uv on PATH the ladder fell through to a hard error even though pyproject.toml + uv.lock fully describe the environment. Collapse it to a single approach: PYTHON, PEAKRDL and SPHINXBUILD all run via 'uv run --locked --project $(IDMA_ROOT)', matching the CI convention. uv provisions the locked environment on first use, so generation behaves identically locally, in CI and for downstream integrators. Also fixes the latent ordering bug where the old ladder referenced IDMA_ROOT before it was defined.

PR #112 added opt.compute to idma_req_t, but the desc64 stimulus class randomizes idma_req_t and zeroes every opt sub-field the descriptor format cannot express except compute. The golden model thus carried random compute values while the DUT (descriptors have no compute encoding) emitted zero, firing a Burst mismatch on every descriptor and turning the non-allow_failure desc64 vcs-sim / vsim-sim-cov jobs red. Constrain compute to zero, matching the existing beo/axi-param zeroing.

Two generation defects in the compute (#112) / multi-head (#136) tracks:
- The idma_otf_compute .ComputeEnable parameter rendered a bare assignment pattern '{...}; Questa infers the type but DC Presto (VER-294) and Spyglass reject it. Type-prefix it with idma_pkg::compute_enable_t.
- w_beat_done was a single scalar net bound by every write instance, so a backend with >1 write head drove it multiply (vsim-3839, multihead_rw). Vectorize it per write head like the other write-port nets; keep the scalar for the single-write-port case the compute engine consumes.

Three always_ff blocks gated their async-reset branch on a compound condition (!rst_ni || clear_i || exec_done), mixing the asynchronous reset with synchronous clears. Questa and Verilator tolerate it, but DC (ELAB-303, ELAB-300) and Spyglass reject it, failing dc-synth-compute. Test only !rst_ni in the async branch and move the synchronous clears to an else-if, preserving behavior (verified: transpose DPI regression still all-PASS).

Local L1-to-L1 cluster transfers are handled via a single OBI interface, where read and write transactions must be arbitrated. The previous implementation prioritized reads, potentially leading to a deadlock due to write starvation given the limited internal request buffer depth (hardcoded). Prioritizing writes instead enforces a strict read-write interleaving, since the DMA always reads from a source before writing to a destination.
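
The interleaving claim can be checked with a toy arbitration model. It is a zero-latency sketch with a hypothetical buffer depth and does not reproduce the deadlock itself; it only shows that, when writes win arbitration and data must be read before it can be written, grants alternate strictly between reads and writes.

```python
def simulate_write_priority(n_words, depth=2):
    """Return the grant sequence on a shared port with write-over-read priority.

    `depth` is the assumed size of the internal buffer between read and write.
    """
    buffered = 0          # words read but not yet written
    to_read = n_words     # words still to fetch from the source
    grants = []
    while to_read or buffered:
        write_req = buffered > 0
        read_req = to_read > 0 and buffered < depth
        if write_req:                 # writes always win when pending
            buffered -= 1
            grants.append("W")
        elif read_req:                # reads proceed only when no write waits
            to_read -= 1
            buffered += 1
            grants.append("R")
    return "".join(grants)


sequence = simulate_write_priority(3)
```

Because every read is drained by a write before the next read is granted, the buffer never holds more than one word in this model, regardless of its depth.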

DanielKellerM and others added 28 commits May 20, 2026 21:45

ci: Bump pulp-platform/pulp-actions from v2 to v2.5.0

1cb384d

ci: Install Python deps via per-job venv in init stage

fedb3be

backend: Fix error handler valid/ready protocol violations

690adef

docs: Add Starlight documentation site with architecture diagrams (#103)

f2fe499

build: Add per-top trimmed vsim compile scripts (#116)

6e1aaa0

build: Rename the pickle output directory to target/pickle (#114)

acc384d

build: Drop setuptools pin and dead regtool make vars (#122)

3f063f7

DanielKellerM mentioned this pull request Jun 16, 2026

frontend: Expose the reg frontend as a native APB slave #131

Open

deps: Bump all bender dependencies to latest (#130)

5808a55

This was referenced Jun 17, 2026

Upcoming iDMA 0.7.0 pulp-platform/gwaihir#26

Open

Upcoming iDMA 0.7.0 pulp-platform/snitch_cluster#323

Open

Upcoming iDMA 0.7.0 pulp-platform/cheshire#278

Open

Add on-the-fly compute support with a transpose engine #112

Merged

DanielKellerM and others added 9 commits June 17, 2026 17:16

Add on-the-fly compute support with a transpose engine at the write s…

2435af6

…eam (#112)

test: Harden transpose testbenches (fatal-on-fail + midend coverage) (#…

77b6a47

…133)

frontend: Gate external register read-ack (#134)

6be43a7

frontend: Emit synth-wrapper head ports only for multi-head backends (#…

210982a

…136)

DanielKellerM force-pushed the devel branch from 33baaa3 to 9ddc2d2 on June 22, 2026 16:51

release: iDMA 0.7.0 #129
DanielKellerM wants to merge 38 commits into master from devel


5 participants
