bazel: bump bazel-orfs to 05bbceb for cacheable gnumake #10487

Open
openroad-ci wants to merge 1 commit into The-OpenROAD-Project:master from The-OpenROAD-Project-staging:jl/bump-bazel-orfs-gnumake-cc_binary

@openroad-ci
Collaborator

@openroad-ci openroad-ci commented May 21, 2026

Summary

  • Bump BAZEL_ORFS_COMMIT from 78f19f25 (bazel-orfs PR #699 era) to 05bbceb (bazel-orfs PR #717 final).
  • Picks up upstream's refactor of @gnumake from a Zig + configure bootstrap inside repository_ctx.execute to a plain cc_binary action built with the registered @llvm_toolchain clang.
  • Removes the last non-cacheable repo-rule execution in our Bazel graph.
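
As a rough sketch of what such a pin change looks like in MODULE.bazel: only the BAZEL_ORFS_COMMIT name and the abbreviated hashes come from this PR. The use of git_override and the remote URL are assumptions made for illustration, and the real file carries full commit hashes.

```starlark
# MODULE.bazel (illustrative sketch, not the actual OpenROAD file).
# Hashes are abbreviated exactly as in this PR; the real pin is a full hash.
BAZEL_ORFS_COMMIT = "05bbceb..."  # previously "78f19f25..."

bazel_dep(name = "bazel-orfs")

# Assumption: the dependency is pinned by commit through git_override.
git_override(
    module_name = "bazel-orfs",
    commit = BAZEL_ORFS_COMMIT,
    remote = "https://github.com/The-OpenROAD-Project/bazel-orfs.git",  # assumed URL
)
```

Changing the pin also changes the digests recorded in MODULE.bazel.lock, which is why the lock file is updated in the same commit.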

Why now — CI benefits

The old pin built GNU Make inside the repo rule via zig cc, gated by a 300s configure timeout. Repo-rule work runs outside the Bazel action graph, so it never hits the disk/remote action cache and re-runs on every fresh output base (i.e. on every fresh CI pod).

Under load — 16 parallel jobs on the 16-CPU CI pod with a cold Zig cache — the configure step exceeded 300s and failed the build.

We hit this exact failure mode in Jenkins build PR-10338-merge #21 (Bazel Test stage):

INFO: repository @@bazel-orfs++orfs_repositories+gnumake used the following cache hits instead of downloading the corresponding file.
 * Hash 'dd16fb1d...' for https://ftp.gnu.org/gnu/make/make-4.4.1.tar.gz
 * Hash 'c7ae866b...' for https://ziglang.org/download/0.12.0/zig-linux-x86_64-0.12.0.tar.xz
ERROR: external/bazel-orfs+/gnumake.bzl:139:13: An error occurred during the fetch of repository 'bazel-orfs++orfs_repositories+gnumake':
  ...
Error in fail: GNU Make configure failed:
stderr: Timed out; also encountered an error while attempting to retrieve output

Note that the asset-cache --repository_cache downloads succeeded — the failure is in repository_ctx.execute() running ./configure on the extracted source, not in any network fetch. The retry with --keep_going passed because the partial Zig cache from the first attempt was warm.
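
To make the failure mode concrete, here is a minimal sketch of the general pattern: a repository rule that runs a build step at fetch time. It is not the actual gnumake.bzl. The rule name, the `sha256` attribute and the elided compile steps are hypothetical; the URL, the 300s timeout and the failure message are taken from the log above.

```starlark
# Hypothetical sketch of a repo rule that builds at fetch time.
def _gnumake_impl(repository_ctx):
    # The download is served from --repository_cache when the hash matches.
    repository_ctx.download_and_extract(
        url = "https://ftp.gnu.org/gnu/make/make-4.4.1.tar.gz",
        sha256 = repository_ctx.attr.sha256,
    )

    # This runs on the host during repository fetch, outside the action
    # graph, so its result is never stored in the disk or remote action
    # cache and it repeats on every fresh output base.
    result = repository_ctx.execute(
        ["./configure"],
        timeout = 300,  # seconds; exceeding it fails the fetch
        working_directory = "make-4.4.1",
    )
    if result.return_code != 0:
        fail("GNU Make configure failed:\nstderr: " + result.stderr)

    # Compile and install steps elided.
    repository_ctx.file("BUILD.bazel", "")

gnumake = repository_rule(
    implementation = _gnumake_impl,
    attrs = {"sha256": attr.string(mandatory = True)},
)
```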

After the bump, GNU Make builds as a normal cc_binary action. First build populates the remote action cache; every subsequent fresh pod hits the cache instead of re-running the upstream build. Deterministic on a fixed host via -Wl,--build-id=none -Wl,-s.
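
For comparison, a minimal sketch of the cc_binary shape described above. This is not the BUILD file that bazel-orfs ships: the source list and `copts` are placeholders, and only the target name `make` and the two linker flags come from this PR.

```starlark
# BUILD.bazel overlay for the extracted GNU Make sources (hypothetical).
cc_binary(
    name = "make",
    # Placeholder: a real source list must be curated for the target
    # platform, since the upstream tree also contains non-POSIX sources.
    srcs = glob(["src/*.h"]) + [
        "config.h",    # vendored config.h instead of running ./configure
        "src/main.c",  # remaining sources elided
    ],
    copts = ["-DHAVE_CONFIG_H"],  # assumption
    linkopts = [
        "-Wl,--build-id=none",  # omit the linker build-ID note
        "-Wl,-s",               # strip symbol information
    ],
    visibility = ["//visibility:public"],
)
```

Because this is an ordinary action, its outputs are keyed on the inputs and toolchain and can be served from the remote action cache on a fresh pod.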

What else the bump picks up (78f19f25..05bbceb)

| Commit | Subject | OpenROAD impact |
|---|---|---|
| b10bca4 | private/stages.bzl: SYNTH_NUM_PARTITIONS | private/: none |
| f37bae8 | synth_partition.sh: fix SYNTH_SKIP_KEEP | may shift mock-array synth |
| 1ac6af9 | synth_partition.sh: parse kept_modules.json | may shift mock-array synth |
| 5c89e35 | orfs_design: forward user_arguments | additive |
| 6f3719b | bazel-orfs internal ORFS/openroad pin bump | none: OpenROAD is the root module and overrides ORFS via archive_override |
| 3e1dc5e | archive_override support for ORFS | additive |
| f1e817f | bump_test fixture | test-only |
| 943cae5 | gnumake: drop zig, use cc_binary (PR #717) | the fix |
| 8e8ade8 | gnumake: hzeller review fold | the fix |
| 05bbceb | gnumake: --override_repository docs | docs |

Real risk surface = synth_partition.sh tweaks (could shift MockArray_4x4_flat_test gold). Watch that test in CI.

Test plan

  • Bazel Test stage completes without @@bazel-orfs++orfs_repositories+gnumake fetch timeout.
  • //test/orfs/mock-array:MockArray_4x4_flat_test and siblings still pass.
  • Subsequent CI builds show @gnumake//:make as a remote action cache hit.
  • Full ubuntu:24.04 unit + tcl flow matrix.

Pin moved from 78f19f25 (bazel-orfs PR #699 era) to 05bbceb (bazel-orfs PR #717 final).
At the old pin, @gnumake's repo rule downloaded the Zig toolchain
(~80 MB) and bootstrapped GNU Make 4.4.1 inside repository_ctx.execute
via `zig cc -static -target x86_64-linux-musl`, gated by a 300s
configure timeout.

That setup put GNU Make's build outside Bazel's action graph, so it
was not cacheable in the disk/remote action cache and re-ran on every
fresh output base. Under load -- 16 parallel jobs on a 16-CPU CI pod
with a cold Zig cache -- the configure step occasionally exceeded the
300s timeout. Jenkins build PR-10338-merge #21 hit exactly this
failure mode in the Bazel Test stage; the retry with --keep_going
passed because the partial Zig cache from the first attempt was warm.
https://jenkins.openroad.tools/job/OpenROAD-Public/job/PR-10338-merge/21/

bazel-orfs PR #717 (merged 2026-05-10) refactored the rule on hzeller's review:
download GNU Make sources, overlay a vendored config.h and a cc_binary
BUILD file, and let the registered @llvm_toolchain clang build make as
a normal cc_binary action. No repository_ctx.execute, no Zig, no
timeout. Determinism on a fixed host preserved via
`-Wl,--build-id=none -Wl,-s`.

Net for CI: the gmake binary becomes a cacheable Bazel action -- first
build populates the remote action cache, every subsequent fresh pod
hits the cache instead of re-running the upstream build. Closes the
last non-cacheable repo-rule execution in our Bazel graph (verified:
all other repo rules either download prebuilt blobs or push work into
actions, e.g. aspect_rules_js npm extracts).

Commits between old and new pin that the bump also picks up:
  b10bca4 private/stages.bzl: register SYNTH_NUM_PARTITIONS
  f37bae8 synth_partition.sh: fix SYNTH_SKIP_KEEP truthy check
  1ac6af9 synth_partition.sh: parse kept_modules.json without greedy sed
  5c89e35 orfs_design: forward user_arguments to orfs_flow
  6f3719b deps: bump orfs to 523bbbb4, openroad to 07427ed
  3e1dc5e feat: support archive_override for ORFS pin
  f1e817f fix: add missing self-archive fixture to bump_test data
  943cae5 gnumake: drop zig bootstrap, build with cc_binary
  8e8ade8 gnumake: address hzeller PR #717 review
  05bbceb gnumake: document --override_repository host-make escape hatch

ORFS and OpenROAD pin bumps inside bazel-orfs do not propagate to this
repo: OpenROAD is the root module and overrides ORFS via
archive_override (ORFS_COMMIT=10a2baea), so bazel-orfs's internal pins
are irrelevant. Real risk surface is the synth_partition.sh tweaks
(could shift mock-array gold). Watch CI for diffs in
MockArray_4x4_flat_test.

Signed-off-by: Joao Luis Sombrio <sombrio@sombrasoft.dev>
@openroad-ci openroad-ci requested a review from a team as a code owner May 21, 2026 23:55
@openroad-ci openroad-ci requested a review from precisionmoon May 21, 2026 23:55
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the BAZEL_ORFS_COMMIT hash in MODULE.bazel and synchronizes the MODULE.bazel.lock file by updating the bzlTransitiveDigest. I have no feedback to provide as there were no review comments.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@sombraSoft
Contributor

@mliberty FYI

@sombraSoft
Contributor

@oharboe @hzeller: flagging you both since this bumps over your gnumake cc_binary refactor (bazel-orfs#717). hzeller's prediction in the #699 review that zig was too heavy played out: we hit a 300s configure timeout on Jenkins because the repo rule's zig cc step ran on a cold cache under load. After this bump GNU Make builds as a normal cc_binary action, so it is cacheable: 540s single-shot vs 1334s with retry on the old pin.

Appreciate any sanity check on the picked-up synth_partition.sh changes against our mock-array tests.
