
release: iDMA 0.7.0 #129

Draft
DanielKellerM wants to merge 38 commits into master from devel



DanielKellerM commented Jun 16, 2026

Collaborator

Draft release PR for iDMA 0.7.0 (devel → master). Consolidates the 28 commits on devel. 0.7.0 is a breaking minor release (in the 0.x series, the minor version is the breaking axis): the register interface and the idma_pkg vocabulary change.

Highlights / breaking changes

  • Register blocks regenerated with SystemRDL (PeakRDL) over APB, replacing the reggen/reg_bus flow — the register frontend interface changes (#73).
  • Multi-head DMA — multiple read/write AXI channels selected per transfer via src_head/dst_head; adds idma_pkg types (#85, #123).
  • Build/toolchain: morty → Bender slang pickle, pickle output at target/pickle/, and Python deps managed by uv (pyproject.toml/uv.lock) instead of pip (#110, #114, #111, #121).

Added

  • Multi-head DMA capabilities + directed verification testbenches (#85, #123).
  • Starlight documentation site with generated architecture/hierarchy diagrams (#103, #117).
  • Streamlined Snitch inst64 integration via the native rw_axi_rw_init_rw_obi variant (#88).
  • BurstLen parameter on the legalizer page splitter to configure the default burst size (#109).
  • Self-bootstrapping uv environment so make idma_hw_all runs with no manual venv setup (#128).
  • Mixed-traffic testbench for the rt_midend (#108); per-top trimmed vsim compile scripts (#116).

Changed

  • Register generation moved to SystemRDL/PeakRDL (APB) (#73).
  • morty replaced by the Bender pickle; output dir renamed to target/pickle/ (#110, #114).
  • Python tooling on uv/pyproject.toml; dropped requirements.txt and the setuptools pin (#111, #121, #122).
  • Bumped all bender dependencies to their latest releases (axi 0.39.9, common_cells 1.39.0, common_verification 0.2.5, register_interface 0.4.7, obi 0.1.7) (#130).
  • Reg frontend exposes a native APB4 slave (PeakRDL-native, matching desc64); the reg_to_apb shim is removed and register_interface is dropped as a dependency (#131). Breaking for reg_bus integrators.
  • CI: uv init, interruptible mirror jobs, bender git-db caching, push-CI on mainlines only, stale-runner-lock cleanup, branch-policy enforcement (#118, #120, #124, #125).

Fixed

  • Break a combinational loop in the desc64 speculation FIFO (#91, closes #71).
  • Multi-head datapath bugs (#123).
  • Error-handler valid/ready protocol violations.
  • Reg-block APB address-width match in the wrapper (clears port-width mismatches + spurious external-ack assertions introduced by the SystemRDL conversion); removed a duplicate tie-off and a dead hjson template.
  • Defensive default arm in the writes_in_flight TB case (#107); rt_midend choice-FIFO sync (#108).

Release checklist — must land on devel before this merges

Out of scope / slips to a follow-up (NOT in 0.7.0)

When all checklist items are on devel, flip this PR out of draft for the release review.

DanielKellerM and others added 28 commits May 20, 2026 21:45
The @v2 tag resolves to a 2+ year-old release. Pin to v2.5.0 which
bumps gitlab-ci Python to 3.12, switches to uv for deps, fixes
riscv-gcc-install asset detection, and improves gitlab-ci logging.
Guard FSM state transitions on rsp_ready_i so rsp_valid_o is not
deasserted before the handshake completes. Add missing eh_valid_i
check in WAIT_LAST_W to prevent sampling garbage eh_i data.
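
The handshake rule behind this fix can be illustrated with a small behavioral model. This is only a sketch in Python, not the RTL: the `RspPort` class and its methods are hypothetical names, and the model shows just the general valid/ready contract (valid stays asserted until ready is seen), not the actual error-handler FSM.

```python
from dataclasses import dataclass


@dataclass
class RspPort:
    """Toy responder (hypothetical): holds valid until the consumer is ready."""
    valid: bool = False
    data: int = 0

    def offer(self, data: int) -> None:
        # Present a new response; only legal when nothing is pending.
        if self.valid:
            raise RuntimeError("previous response not yet accepted")
        self.valid, self.data = True, data

    def step(self, ready: bool):
        """Advance one clock edge; return the data if a handshake occurred."""
        if self.valid and ready:
            self.valid = False  # deassert only after the handshake completes
            return self.data
        return None  # no handshake: valid and data stay unchanged


port = RspPort()
port.offer(0xAB)
# The consumer stalls for two cycles, then accepts.
trace = [port.step(ready) for ready in (False, False, True)]
```

In this model the response survives the two stalled cycles and is delivered exactly once, which is the property the guarded state transitions are meant to preserve.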
* ci: Add branch policy enforcement and PR template

Add devel as the staging branch for all external contributions:

- retarget-to-devel.yml auto-retargets external PRs from master to
  devel via author_association check (OWNER/MEMBER/COLLABORATOR may
  still target master directly for promotion PRs).
- promote-to-master.yml opens or extends a rolling devel->master PR
  when a maintainer applies the verified-internal label to a merged
  devel PR.
- pull_request_template.md surfaces the policy in every new PR.
- CONTRIBUTING.md documents the policy with rationale.

Maintainers need to create the verified-internal label once
(any color, description optional) before promote-to-master.yml fires.

* ci: Tighten branch-policy workflows

- retarget-to-devel: also fire on `edited` so a PR re-targeted to
  master post-open is still caught.
- promote-to-master: swallow the 422 from `pulls.create` when two
  concurrent gitlab-ci runs race to open the promotion PR.
- CONTRIBUTING: use ASCII `->` for greppability.

* ci: Fix workflow_run trigger and tighten edge cases

- promote-to-master: trigger on the parent `ci` workflow_run, not
  `gitlab-ci` (which is workflow_call-only and never produces its own
  workflow_run event). `ci` aggregates lint+build+gitlab-ci, so its
  success is a strictly stronger gate.
- promote-to-master: narrow the 422 swallow to only the 'pull request
  already exists' validation error; rethrow other 422s (missing base,
  invalid head, no commits between, ...).
- promote-to-master: add concurrency group to serialise rapid runs.
- retarget-to-devel: add concurrency group keyed on PR number with
  cancel-in-progress, so rapid edited events don't spawn duplicate
  jobs.
- retarget-to-devel: build the CONTRIBUTING.md link from
  `context.serverUrl` instead of a relative path, so it renders
  correctly in PR comments regardless of GitHub's markdown context.
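
The narrowed 422 handling described in this commit can be sketched as follows. This is a language-neutral illustration in Python rather than the workflow's own script: `ApiError`, `open_promotion_pr`, and the message strings are assumptions made for the example, not the real client API.

```python
class ApiError(Exception):
    """Stand-in for an HTTP API error (hypothetical, not a real client class)."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status
        self.message = message


def open_promotion_pr(create_pr):
    """Run create_pr(); swallow only the duplicate-PR 422, rethrow the rest."""
    try:
        return create_pr()
    except ApiError as err:
        is_duplicate = err.status == 422 and "already exists" in err.message.lower()
        if is_duplicate:
            return None  # another run won the race; nothing left to do
        raise


def _raise(status, message):
    raise ApiError(status, message)


# A duplicate is tolerated; any other 422 still surfaces.
tolerated = open_promotion_pr(lambda: _raise(422, "A pull request already exists"))
try:
    open_promotion_pr(lambda: _raise(422, "No commits between master and devel"))
    rethrown = False
except ApiError:
    rethrown = True
```

Matching on the specific validation message rather than on the status code alone keeps unrelated failures (missing base, invalid head) visible.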

* ci: Dedupe retarget comment and survive comment failure

- retarget-to-devel: embed a `<!-- retarget-to-devel -->` HTML marker
  in the comment body and check for it via paginated listComments
  before posting. Rapid `edited` events that cancel and re-fire the
  job no longer post duplicate comments.
- retarget-to-devel: wrap createComment in try/catch with core.warning
  so a transient comment failure (after the retarget already
  succeeded) does not fail the workflow.
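
A minimal sketch of the marker-based deduplication, written in Python for illustration. The marker string is taken from the commit message; the helper names `needs_comment` and `build_comment` are hypothetical, and the list of existing comment bodies is assumed to have been fetched (with pagination) beforehand.

```python
MARKER = "<!-- retarget-to-devel -->"  # invisible in rendered Markdown


def build_comment(text: str) -> str:
    """Prefix the visible text with the hidden marker."""
    return f"{MARKER}\n{text}"


def needs_comment(existing_bodies) -> bool:
    """True only if no earlier comment already carries the marker."""
    return not any(MARKER in body for body in existing_bodies)


first_run = needs_comment(["LGTM", "please rebase"])
second_run = needs_comment(["LGTM", build_comment("Retargeted to devel.")])
```

Because the check keys on the marker and not on the visible wording, the comment text can change later without breaking deduplication.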

* ci: Tighten promote-to-master permissions and fix org typo

- promote-to-master: downgrade `contents: write` to `contents: read`.
  The script only calls `repos.compareCommits` (read) and PR APIs
  already covered by `pull-requests: write`.
- CONTRIBUTING: fix pre-existing `pulp_platform` (underscore) to
  `pulp-platform` (hyphen) — the actual GitHub org slug.

* ci: Tighten gitlint regex to forbid extra colons in subject

Matches the single-colon rule enforced by util/lint-commits.py in CI.
Now gitlint (run locally via pre-commit) catches the same failure
mode that broke this PR's commit-msg lint.
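
For illustration, a single-colon subject rule could look like the sketch below. The pattern is an assumption written for this example; it is not the regex actually used by gitlint or util/lint-commits.py.

```python
import re

# Hypothetical pattern: "<scope>: <summary>" with exactly one colon overall.
SUBJECT_RE = re.compile(r"[^:\s][^:]*: [^:]+")


def subject_ok(subject: str) -> bool:
    """Accept subjects with one 'scope: summary' colon and no others."""
    return SUBJECT_RE.fullmatch(subject) is not None


good = subject_ok("ci: Tighten gitlint regex")
extra_colon = subject_ok("ci: Fix: workflow_run trigger")
no_scope = subject_ok("Tighten gitlint regex")
```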
)

Track every arbiter handshake in the choice FIFO so the response demux
stays in sync when internal and external events interleave, and add a
mixed-traffic testbench (vsim + vcs, blocking) that catches the bug.
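
The idea of tracking every arbiter handshake can be sketched with a small Python model. This is not the rt_midend RTL; `ChoiceFifo` and the `"int"`/`"ext"` labels are hypothetical names, and the model assumes responses return in request order.

```python
from collections import deque


class ChoiceFifo:
    """Remembers which source won each arbitration, in grant order."""

    def __init__(self):
        self._choices = deque()

    def on_grant(self, source: str) -> None:
        # Record every handshake, internal or external; skipping one
        # would shift all later responses to the wrong source.
        self._choices.append(source)

    def route_response(self, payload):
        # Responses come back in request order, so the oldest recorded
        # choice names the destination of this response.
        return self._choices.popleft(), payload


fifo = ChoiceFifo()
for src in ("int", "ext", "ext", "int"):
    fifo.on_grant(src)
routed = [fifo.route_response(i) for i in range(4)]
```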

Original fix: #96.

Co-authored-by: Flavien Solt <flavien.solt97@gmail.com>
Use 'bender pickle' (slang frontend, bender >= 0.32.0) for all pickling:
drops the sources.json indirection and the cf_math_pkg concat hack, and
retires the morty HTML/DOT doc graphs and their CI installs. Pickle output
stays at target/morty for downstream compatibility. Commits target/rtl/include/.gitkeep
so bender 0.32.0 (which errors on a missing export_include_dir) resolves on a fresh tree.
Replace actions/setup-python + pip with astral-sh/setup-uv@v7 +
'uv pip install -r requirements.txt' across the analyze, build, deploy,
docs and lint workflows, matching the nonfree GitLab CI which already
uses uv. lint-python/lint-yaml keep setup-python (no requirements install).
Add a Starlight-based documentation site under doc/site with rendered
architecture, backend, legalizer and system-integration diagrams.
Generate per-top trimmed vsim compile scripts so each testbench compiles only the sources it needs, speeding up targeted simulation.
Set interruptible at the default level of .gitlab-ci.yml so the project's auto_cancel_pending_pipelines cancels the in-flight devel pipeline (init + the idma trigger, which cascades to the nonfree child) when a newer commit lands. The nonfree CI side was marked interruptible in lockstep.
Rename the pickle output dir target/morty -> target/pickle now that morty is gone. The nonfree EDA scripts and CI were updated to read target/pickle in lockstep.
Regenerate the module hierarchy graphs from the bender-pickle syntax tree via util/ast2dot.py, replacing the retired morty DOT graphs. Restores the graph figures in the Sphinx docs and wires Graphviz back into the build/docs CI.
Replace requirements.txt with a pyproject.toml manifest + committed uv.lock, and run CI generation via 'uv run --locked' / 'uv sync --locked' on Python 3.11. Keeps setuptools<79 for regtool's pkg_resources. No RTL changes.
Replace the hjson/regtool register description with parameterized
SystemRDL sources rendered by PeakRDL (regblock apb4-flat + raw-header
svpkg). Migrates the build to peakrdl, drops the regtool reg-gen path.

Co-authored-by: Michael Rogenmoser <michael@rogenmoser.us>
Co-authored-by: Tim Fischer <fischeti@iis.ee.ethz.ch>
regtool's pkg_resources import forced setuptools<79; the SystemRDL/PeakRDL migration removed regtool, so the pin and the unused IDMA_REGTOOL/IDMA_REG_DIR make variables are dead weight.
* ci: Cache the bender git database keyed on the lockfile

A composite action sets BENDER_DB_DIR to relocate the bare-repo database
to a cacheable path; warm checkouts then need no network. The project-
local .bender is never cached: its checkouts hold absolute-path
alternates into the database and are rebuilt from it in under 0.1s.
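
Keying a cache on the lockfile amounts to deriving the key from the lockfile contents. A minimal sketch, assuming a SHA-256 digest and a hypothetical `bender-db` prefix (the actual composite action may compute its key differently):

```python
import hashlib


def cache_key(lock_bytes: bytes, prefix: str = "bender-db") -> str:
    """Derive a cache key that changes whenever the lockfile changes."""
    digest = hashlib.sha256(lock_bytes).hexdigest()
    return f"{prefix}-{digest[:16]}"


# Same lockfile contents give the same key; any edit gives a new one.
key_a = cache_key(b"packages:\n  axi: 0.39.9\n")
key_b = cache_key(b"packages:\n  axi: 0.39.9\n")
key_c = cache_key(b"packages:\n  axi: 0.39.8\n")
```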

* ci: Add author header and drop verbose comment in bender-db-cache
Add optional multi-head (multi-channel) support to the iDMA backend. The
read/write managers can be instantiated N-fold via a numeric prefix in the
backend variant ID; a per-transfer src_head/dst_head selects the channel.
Additive and opt-in: existing single-head variants regenerate byte-identical.
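
To make the numeric-prefix convention concrete, the sketch below parses variant IDs such as 2r_axi_w_axi and 2rw_axi into per-head entries. The grammar is an assumption inferred from the variant names in this PR (alternating direction and protocol tokens, with no underscores inside protocol names); it is not the generator's actual parser.

```python
import re

# Hypothetical token grammar: optional head count, then r / w / rw.
_HEAD = re.compile(r"(\d*)(rw|r|w)")


def parse_variant(variant_id: str):
    """Return a list of (head_count, direction, protocol) tuples."""
    tokens = variant_id.split("_")
    if len(tokens) % 2:
        raise ValueError(f"expected direction/protocol pairs: {variant_id!r}")
    heads = []
    for direction, protocol in zip(tokens[0::2], tokens[1::2]):
        match = _HEAD.fullmatch(direction)
        if match is None:
            raise ValueError(f"bad direction token: {direction!r}")
        count = int(match.group(1)) if match.group(1) else 1  # no prefix = 1 head
        heads.append((count, match.group(2), protocol))
    return heads


two_read_heads = parse_variant("2r_axi_w_axi")
single_head = parse_variant("rw_axi")
```

A missing prefix maps to one head, which mirrors the statement that existing single-head variants are unaffected.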

Co-authored-by: Lud1ma <luedde.mahr@web.de>
Co-authored-by: Thomas Benz <tbenz@iis.ee.ethz.ch>
The IIS shared runners intermittently wedge a build slot with a stale git lock under ${CI_PROJECT_DIR}.tmp (git-template/config or .gitlab-runner.ext.conf.lock), failing the public init job at git checkout before any job script runs. Clear those stale locks in a pre_get_sources_script, scoped to *.lock and git-template so the CA bundle in .tmp is preserved.
An open same-repo PR fires both push and pull_request, so the gitlab-ci job mirrors the same commit twice and (with auto_cancel_pending_pipelines) cancels the PR's child pipeline. Scope push to devel/master; feature branches validate via their PR (pull_request synchronize still reruns on every pushed commit). Also stops the deploy job creating __deploy__ refs on feature pushes.
#123)

Fix the multi-head read/write datapath: gate the AR/AW meta channels on the
address-channel head (ar_req_i/aw_req_i) rather than the datapath head, route
the write datapath response by the FIFO-tracked dst_head, and emit the per-head
tagged path when a protocol has more than one head.

Add directed backend testbenches for the 2r_axi_w_axi (two read heads) and
2rw_axi (cross-head write) configurations, wired into nonfree CI.
Add a default arm to the writes_in_flight case in the backend testbench
template that $fatal()s on an unhandled destination protocol, instead of
silently falling through when a generated variant adds a new protocol.
…nitch (#88)

Add a native rw_axi_rw_init_rw_obi backend variant for snitch_cluster and
switch the inst64 frontend to drive it directly (OBI/INIT write path with
address-map steering), replacing the plain rw_axi instantiation. Add the
variant's directed job stimulus and the snitch_cluster gitignore entries.
Disable fall-through on i_speculation_fifo to break the combinational
loop flush -> speculation_correct -> push -> flush. The fall-through path
only exposed the same-cycle empty-FIFO output, which the speculation check
never uses (a guess is confirmed only when its descriptor read returns), so
registering the head is the correct behavior with no throughput impact.
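
The difference between a fall-through and a registered FIFO head can be shown with a toy cycle model. This is a Python sketch with a hypothetical `Fifo` class, not the common_cells FIFO; it only models what the head shows in the cycle a push arrives at an empty FIFO.

```python
from collections import deque


class Fifo:
    """Toy FIFO: reports the head visible in a cycle, then commits the push."""

    def __init__(self, fall_through: bool):
        self.fall_through = fall_through
        self._q = deque()

    def cycle(self, push=None):
        if self._q:
            head = self._q[0]
        elif self.fall_through and push is not None:
            head = push  # same-cycle bypass: output depends on this cycle's input
        else:
            head = None  # registered head: nothing visible until the next cycle
        if push is not None:
            self._q.append(push)
        return head


ft = Fifo(fall_through=True)
ft_same_cycle = ft.cycle(push=5)

reg = Fifo(fall_through=False)
reg_same_cycle = reg.cycle(push=5)
reg_next_cycle = reg.cycle()
```

The bypass in the fall-through case is the combinational input-to-output path; when the push itself depends on logic driven by the head, that path can close a loop.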

Verified by a new blocking Spyglass combinational-loop gate (spyglass-lint-desc64).

Closes #71.
Expose the page-splitter's non-reduced burst cap as a BurstLen parameter
(default 8 = the AXI 256-beat max, preserving existing behavior) threaded
through the backend/synth/legalizer wrappers, so a system like pulp_cluster
can configure smaller default bursts without touching the core.
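
As a rough illustration of what a smaller burst cap does, the sketch below splits a transfer at page boundaries. It assumes BurstLen is the base-2 logarithm of the maximum beats per burst (so the default 8 gives 256 beats); that reading, the function name, and its parameters are assumptions for this example, not the legalizer's implementation.

```python
def split_at_pages(addr: int, num_bytes: int, bytes_per_beat: int, burst_len: int = 8):
    """Split [addr, addr + num_bytes) into chunks that never cross a page.

    A page is (2 ** burst_len) beats of bytes_per_beat bytes each.
    """
    page = (1 << burst_len) * bytes_per_beat
    chunks = []
    while num_bytes > 0:
        room = page - (addr % page)  # bytes left before the next boundary
        size = min(room, num_bytes)
        chunks.append((addr, size))
        addr += size
        num_bytes -= size
    return chunks


# 100 bytes from 0x10 on an 8-byte bus with a 4-beat (32-byte) page.
small_pages = split_at_pages(0x10, 100, bytes_per_beat=8, burst_len=2)
# Default cap: 256 beats * 8 bytes = 2048-byte pages.
default_pages = split_at_pages(0, 4096, bytes_per_beat=8)
```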
PR #109's rebased NumStreams commit resurrected idma_reg.hjson.tpl
(removed by the SystemRDL conversion in #73) and re-added a second
gen_hw2reg_unused generate block in idma_reg.sv.tpl. The duplicate label
breaks elaboration of the reg block on every variant (DC VER-288, VCS
IPD, Questa vlog-2388), failing the devel pipeline. Keep only #109's
BurstLen change.
The second gen_hw2reg_unused block (#109's resurrected NumStreams churn)
duplicates the SystemRDL one in the same scope, breaking elaboration on
every variant (DC VER-288, VCS IPD, Questa vlog-2388). Completes the
devel-build fix started in 6f8130a (which only removed the dead hjson).
PeakRDL sizes idma_reg*_reg_top's s_apb_paddr to the regmap's minimal
address width (8 for reg32_3d, 9 for reg64_*), but the wrapper connected
the full 32-bit internal APB paddr, tripping vsim-3015 port-width
mismatches in every gen_core_regs instance. Slice paddr to the generated
IDMA_*_REG_TOP_MIN_ADDR_WIDTH (via a RegAddrWidth localparam).
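
The width arithmetic behind this fix can be sketched in Python. The register-map sizes used here (256 and 512 bytes) are hypothetical values chosen because they yield the 8-bit and 9-bit widths mentioned above; the helper names are not from the code base.

```python
def min_addr_width(regmap_bytes: int) -> int:
    """Smallest address width that can index every byte of the map."""
    return max(1, (regmap_bytes - 1).bit_length())


def slice_paddr(paddr: int, width: int) -> int:
    """Keep only the low `width` bits, like paddr[width-1:0] in the wrapper."""
    return paddr & ((1 << width) - 1)


width_small = min_addr_width(256)  # hypothetical 256-byte map
width_large = min_addr_width(512)  # hypothetical 512-byte map
local_addr = slice_paddr(0x1000_0044, width_small)
```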
Provision a local uv .venv (uv sync --locked) and prepend it to PATH when
the generator deps are not already importable, so 'make idma_hw_all' works
with no pre-activated venv. An activated venv or 'uv run make' (CI) is
detected and left untouched. Lets a consumer call
'make -C $(bender path idma) idma_hw_all' directly. Requires uv on PATH.
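
The "already importable" check can be pictured as below. The real logic lives in the makefile; this Python sketch only illustrates the decision, and the function names are hypothetical. It handles top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def needs_bootstrap(names) -> bool:
    """True when at least one generator dependency is not importable."""
    return bool(missing_modules(names))


# "json" ships with Python; the second name is deliberately nonexistent.
present = needs_bootstrap(["json"])
absent = missing_modules(["json", "surely_not_a_real_module_xyz"])
```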
Raise Bender.yml version floors to the latest releases (axi 0.39.9, common_cells 1.39.0, common_verification 0.2.5, register_interface 0.4.7, obi 0.1.7) and drop the stale Bender.local axi pin. No Bender.lock or generated-RTL drift.
@DanielKellerM

Collaborator Author

Downstream iDMA 0.7.0 adoption follow-ups

Heads-up notices opened so consumers can plan for the breaking 0.7.0 bump. Follow up with dependency PRs once v0.7.0 is tagged:

DanielKellerM and others added 9 commits June 17, 2026 17:16
The generator-environment bootstrap in idma.mk tried three paths in order
(system python with mako, a pre-existing .venv, then uv sync) and only the
uv branch installs anything. On runners without a global mako or uv on PATH
the ladder fell through to a hard error even though pyproject.toml + uv.lock
fully describe the environment.

Collapse it to a single approach: PYTHON, PEAKRDL and SPHINXBUILD all run via
'uv run --locked --project $(IDMA_ROOT)', matching the CI convention. uv
provisions the locked environment on first use, so generation behaves
identically locally, in CI and for downstream integrators. Also fixes the
latent ordering bug where the old ladder referenced IDMA_ROOT before it was
defined.
PR #112 added opt.compute to idma_req_t, but the desc64 stimulus class
randomizes idma_req_t and zeroes every opt sub-field the descriptor format
cannot express except compute. The golden model thus carried random compute
values while the DUT (descriptors have no compute encoding) emitted zero,
firing a Burst mismatch on every descriptor and turning the non-allow_failure
desc64 vcs-sim / vsim-sim-cov jobs red. Constrain compute to zero, matching
the existing beo/axi-param zeroing.
Two generation defects in the compute (#112) / multi-head (#136) tracks:

- The idma_otf_compute .ComputeEnable parameter rendered a bare assignment
  pattern '{...}; Questa infers the type but DC Presto (VER-294) and Spyglass
  reject it. Type-prefix it with idma_pkg::compute_enable_t.

- w_beat_done was a single scalar net bound by every write instance, so a
  backend with >1 write head drove it multiply (vsim-3839, multihead_rw).
  Vectorize it per write head like the other write-port nets; keep the scalar
  for the single-write-port case the compute engine consumes.
Three always_ff blocks gated their async-reset branch on a compound
condition (!rst_ni || clear_i || exec_done), mixing the asynchronous reset
with synchronous clears. Questa and Verilator tolerate it, but DC (ELAB-303,
ELAB-300) and Spyglass reject it, failing dc-synth-compute. Test only !rst_ni
in the async branch and move the synchronous clears to an else-if, preserving
behavior (verified: transpose DPI regression still all-PASS).
Local L1-to-L1 cluster transfers are handled via a single OBI interface, where read and write transactions must be arbitrated.
The previous implementation prioritized reads, which could deadlock through write starvation because the internal request buffer depth is limited (and hardcoded).
Prioritizing writes instead enforces a strict read-write interleaving, since the DMA always reads from a source before writing to a destination.
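
The interleaving claim can be checked with a toy arbiter model. This is a Python sketch under simplifying assumptions (one grant per cycle, a write becomes pending as soon as one beat has been read, a hypothetical buffer depth of 2); it shows the grant order only and does not reproduce the deadlock itself.

```python
def run_transfer(beats: int, prefer_write: bool, buffer_depth: int = 2) -> str:
    """Return the grant sequence ('R'/'W') on a shared single port."""
    buffered = 0  # beats read but not yet written
    to_read = beats
    grants = []
    while to_read or buffered:
        read_req = to_read > 0 and buffered < buffer_depth
        write_req = buffered > 0
        if write_req and (prefer_write or not read_req):
            buffered -= 1
            grants.append("W")
        elif read_req:
            to_read -= 1
            buffered += 1
            grants.append("R")
    return "".join(grants)


write_first = run_transfer(3, prefer_write=True)
read_first = run_transfer(3, prefer_write=False)
```

With write priority, a write can only win once a read has produced data, so the grants alternate strictly; with read priority, reads run ahead until the buffer fills.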
Development

Successfully merging this pull request may close these issues.

possible combinatorial loop in idma_desc64_ar_gen_prefetch.sv

5 participants