Add eager group join strategy #22058
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported that the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). |
|
@Dandandan Could you please help me run the benchmarks here? |
|
run benchmark tpch tpch10 tpcds |
|
🤖 Benchmark running (GKE). Comparing groupjoin-eager-strategy (30075d5) to 2f2fe8f (merge-base) using: tpcds |
|
🤖 Benchmark running (GKE). Comparing groupjoin-eager-strategy (30075d5) to 2f2fe8f (merge-base) using: tpch |
|
🤖 Benchmark running (GKE). Comparing groupjoin-eager-strategy (30075d5) to 2f2fe8f (merge-base) using: tpch10 |
|
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch |
|
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch |
|
🤖 Benchmark completed (GKE): tpch10, base (merge-base) vs. branch |
This adds an eager right-side aggregation strategy for GroupJoin. The optimizer selects this strategy when row-count estimates suggest it is cheaper to aggregate the right side up front, then probe those precomputed aggregate results from the left side.
The eager strategy triggers in two cases:
Left join: right_rows > 1.5 * left_rows
Inner join: right_rows > 2 * left_rows
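For illustration, the two thresholds can be written as a single predicate. This is a minimal sketch, not the code in this PR: `JoinKind` and `prefers_eager` are hypothetical names, and the row counts are assumed to be plain estimates taken from statistics.

```rust
/// Hypothetical stand-in for the two join types the thresholds cover.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinKind {
    Left,
    Inner,
}

/// Returns true when the estimated right-side row count exceeds the
/// per-join-type multiple of the left-side row count (strict comparison).
fn prefers_eager(kind: JoinKind, left_rows: usize, right_rows: usize) -> bool {
    let factor = match kind {
        JoinKind::Left => 1.5,
        JoinKind::Inner => 2.0,
    };
    (right_rows as f64) > factor * (left_rows as f64)
}
```

With 100 estimated left rows, a left join would switch to the eager strategy at 151 right rows, and an inner join at 201.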
Execution aggregates the right side by join key first, evaluates aggregate accumulators once, then uses indexed lookups while scanning left-side batches. This avoids repeated accumulator updates during the probe phase and keeps left-side output pipelined.
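The two phases can be sketched roughly as follows. This is a simplified model under stated assumptions, not the operator added here: it uses `SUM` as the only aggregate, `u32` keys, in-memory slices instead of Arrow batches, and the hypothetical helpers `aggregate_right` and `probe_left`.

```rust
use std::collections::HashMap;

/// Phase 1: aggregate the right side once, keyed by join key.
/// SUM stands in for an arbitrary aggregate accumulator.
fn aggregate_right(right: &[(u32, i64)]) -> HashMap<u32, i64> {
    let mut sums = HashMap::new();
    for &(key, value) in right {
        *sums.entry(key).or_insert(0) += value;
    }
    sums
}

/// Phase 2: scan a left-side batch and look up the precomputed aggregate.
/// `keep_unmatched = true` models a left join (unmatched keys yield `None`);
/// `false` models an inner join (unmatched keys are dropped).
fn probe_left(
    left_keys: &[u32],
    sums: &HashMap<u32, i64>,
    keep_unmatched: bool,
) -> Vec<(u32, Option<i64>)> {
    left_keys
        .iter()
        .filter_map(|&key| match sums.get(&key) {
            Some(&sum) => Some((key, Some(sum))),
            None if keep_unmatched => Some((key, None)),
            None => None,
        })
        .collect()
}
```

In this sketch no accumulator state is touched during the probe: each left row costs one hash lookup, and output order follows the left input.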