[SPARK-57708][4.1][INFRA] Backport CI precompile artifact sharing and Coursier cache unification#56798

Open
gaogaotiantian wants to merge 1 commit into apache:branch-4.1 from gaogaotiantian:precompile-backport-4.1

@gaogaotiantian

Contributor

What changes were proposed in this pull request?

This backports the CI build-time optimization series from branch-4.x/master to branch-4.1. A precompile job builds Spark once and publishes the compile output as an artifact that the downstream matrix jobs consume (falling back to a local build if the precompile job is absent or fails), the per-job Coursier caches are unified under a single key, and the shared compile artifacts use zstd compression. Squashed backport of:

  • [SPARK-56768] Share SBT compile artifact across pyspark CI jobs
  • [SPARK-56831] Share SBT precompile artifact with sparkr CI job
  • [SPARK-56943] Share SBT precompile artifact with JVM build matrix
  • [SPARK-56964] Share Maven precompile artifact across maven_test matrix
  • [SPARK-57069] Share SBT precompile artifact with docker/k8s integration test CI jobs
  • [SPARK-57075] Share precompile Coursier cache with host-runner SBT jobs
  • [SPARK-57142] Share SBT precompile artifact with tpcds-1g CI job
  • [SPARK-57144] Unify Coursier cache to a single key across all jobs
  • [SPARK-56830] Share SBT compile artifact with python hosted runner CI jobs
  • [SPARK-57330] Switch shared CI compile artifacts to zstd compression

Adaptations for branch-4.1: the Python toolchain stays on 3.11 with the branch's existing package pins, and the GitHub Actions are kept at the versions already pinned on branch-4.1 (actions/cache@v4, actions/cache/restore@v4, actions/checkout@v4, actions/setup-java@v4, actions/download-artifact@v4, actions/upload-artifact@v4) rather than pulling in the unrelated action version bumps. As on branch-4.x, the precompile job is the sole Coursier cache writer and all consumer jobs are restore-only.
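The fallback rule described above can be modeled in a few lines. This is only a sketch of the decision, not the workflow's actual expression: the function name `build_source` and the returned labels are hypothetical, and `None` is used here to stand for a run in which the precompile job is absent. The result strings mirror the job results GitHub Actions reports (`success`, `failure`, `cancelled`, `skipped`).

```python
from typing import Optional


def build_source(precompile_result: Optional[str]) -> str:
    """Decide where a consumer job gets its compile output.

    precompile_result is the precompile job's result as GitHub Actions
    reports it, or None when the job is not part of the run (assumption
    made for this sketch).
    """
    if precompile_result == "success":
        # The shared artifact exists, so download it instead of compiling.
        return "shared-artifact"
    # Absent, failed, cancelled, or skipped: compile locally as before.
    return "local-build"


for result in ("success", "failure", "skipped", None):
    print(result, "->", build_source(result))
```

The point of the design is that consumers never hard-depend on the precompile job: any outcome other than success degrades to the pre-backport behavior.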

Why are the changes needed?

To cut redundant Scala/Maven compilation and Coursier cache duplication on branch-4.1 CI, matching the optimization already present on the newer branches.

Does this PR introduce any user-facing change?

No. CI-only.

How was this patch tested?

CI on this PR. The three workflow files validate with python3 -c "import yaml; yaml.safe_load(...)".
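For reference, the one-liner above could be expanded into a small script along these lines. This is a sketch, not the exact command that was run: the helper names `validate_workflows` and `main` are hypothetical, and the file paths are left to the caller because the description elides them. The parser is passed in as an argument so the same loop works with `yaml.safe_load` from PyYAML. Note that this only checks that each file is syntactically valid YAML; it does not check GitHub Actions semantics.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def validate_workflows(paths: Iterable[str], load: Callable[[str], object]) -> Dict[str, str]:
    """Parse each file with `load` and return {path: error message} for failures."""
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            load(Path(path).read_text(encoding="utf-8"))
        except Exception as exc:  # the exception type depends on the parser
            errors[str(path)] = f"{type(exc).__name__}: {exc}"
    return errors


def main(argv: List[str]) -> int:
    """Validate the workflow files named in argv; return a shell exit status."""
    import yaml  # PyYAML, imported lazily so the helper above has no hard dependency

    errors = validate_workflows(argv, yaml.safe_load)
    for path, message in errors.items():
        print(f"{path}: {message}")
    return 1 if errors else 0
```

Calling `main` with the workflow file paths returns `0` when every file parses and `1` otherwise.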

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-8)

@gaogaotiantian

Contributor Author

cc @zhengruifeng as the original author of this feature. I took a quick look and everything seems okay. We don't really use python_hosted_runner_test.yml on branch-4.1, but there's no harm in backporting it; this is CI-only anyway.

@uros-b uros-b left a comment

Member

Thank you @gaogaotiantian and @HyukjinKwon!

uses: actions/cache@v4
with:
  path: ~/.cache/coursier
  key: coursier-${{ runner.os }}-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}

Member

Minor note: the precompile writer's Coursier key (coursier-${{ runner.os }}-) and the consumer build-job key (${{ runner.os }}-coursier-) no longer share a prefix, so the consumer would miss the precompile-warmed cache. This is real drift from branch-4.x, which keeps coursier-${{ runner.os }}- on both. The workflow is macOS-only, so the writer never runs, but please align to coursier-${{ runner.os }}- for consistency.
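To make the prefix concern concrete, here is a simplified model of cache lookup: an exact key match first, then each restore key treated as a prefix, with the newest matching entry winning. It is a sketch, not the `actions/cache` implementation. The hash suffixes (`abc123`, `def456`), the `CacheEntry` type, and the assumption that the consumer falls back to a prefix-style restore key are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created_at: int  # larger means newer


def restore(entries: Sequence[CacheEntry], key: str,
            restore_keys: Sequence[str] = ()) -> Optional[CacheEntry]:
    """Return the entry a restore step would pick, or None on a miss."""
    for entry in entries:
        if entry.key == key:  # exact hit
            return entry
    for prefix in restore_keys:  # prefix fallback, in the order listed
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at)
    return None


os_name = "macOS"  # a value runner.os can take
# The only cache in the store is the one written by the precompile job.
writer_entry = CacheEntry(key=f"coursier-{os_name}-abc123", created_at=1)
store = [writer_entry]

# Consumer using the drifted "<os>-coursier-" prefix: nothing matches.
drifted = restore(store, key=f"{os_name}-coursier-def456",
                  restore_keys=[f"{os_name}-coursier-"])

# Consumer aligned to "coursier-<os>-": falls back to the writer's entry.
aligned = restore(store, key=f"coursier-{os_name}-def456",
                  restore_keys=[f"coursier-{os_name}-"])

print("drifted:", drifted)
print("aligned:", aligned)
```

Under this model the drifted consumer always starts cold, while the aligned one reuses whatever the single writer last saved, which is the behavior the sole-writer, restore-only design relies on.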
