Skip to content

[SPARK-57714][CORE] Add loser tree based k-way merger for UnsafeExternalSorter spill merge#56804

Open
s-unscrupulous wants to merge 1 commit into
apache:masterfrom
s-unscrupulous:SPARK-57714
Open

[SPARK-57714][CORE] Add loser tree based k-way merger for UnsafeExternalSorter spill merge#56804
s-unscrupulous wants to merge 1 commit into
apache:masterfrom
s-unscrupulous:SPARK-57714

Conversation

@s-unscrupulous

@s-unscrupulous s-unscrupulous commented Jun 26, 2026

Copy link
Copy Markdown

What changes were proposed in this pull request?

Add a loser tree based k-way merger as an opt-in alternative to the
existing priority-queue (binary heap) based UnsafeSorterSpillMerger
used by UnsafeExternalSorter to merge spill files.

The new merger is gated behind config
spark.unsafe.sorter.spill.merger.useLoserTree, defaulting to false
so behavior is unchanged unless explicitly enabled.

Why are the changes needed?

When UnsafeExternalSorter produces many spill files, the merge phase
is dominated by comparator cost. A loser tree performs one comparison
and one loser-slot read per level on each replay, versus two
comparisons plus two child reads for heap sift-down. For k-way merges
this roughly halves both the number of comparisons and the array
touches per popped record, improving end-to-end sort latency for
spill-heavy workloads.

Microbenchmark on Apple M4 / JDK 17 over 500K records per run shows
1.4x to 2.9x speedup across prefix-collision levels and run counts
from 4 to 256. Full results in core/benchmarks/SpillMergerBenchmark-results.txt.

runs low-collision medium-collision high-collision
4 1.5x 1.9x 1.4x
8 1.8x 2.3x 1.6x
16 2.2x 2.5x 1.7x
64 2.9x 2.0x 1.8x
256 2.6x 1.8x 1.8x

Concrete numbers at the most spill-heavy point (runs=256, 500K records/run, low collision):

  • priority-queue: best 19406ms, avg 20651ms
  • loser-tree: best 7561ms, avg 7568ms

Does this PR introduce any user-facing change?

No. The new behavior is gated behind an internal config defaulted to
false. Existing behavior is unchanged.

How was this patch tested?

  • New unit test suite UnsafeLoserTreeSpillMergerSuite (12 tests, all pass)
  • Extended UnsafeExternalSorterSuite to cover the loser tree path
  • New SpillMergerBenchmark for performance comparison; results
    committed under core/benchmarks/
  • Ran ./dev/scalastyle and ./dev/lint-java locally — both pass

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

@s-unscrupulous s-unscrupulous force-pushed the SPARK-57714 branch 2 times, most recently from a9dbcb1 to c50a765 Compare June 26, 2026 06:45
"in UnsafeExternalSorter; otherwise use the original priority-queue based merger. " +
"Loser tree performs one comparison per replay versus two for heap sift-down, which " +
"can be faster when there are many spills.")
.version("4.1.0")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.version("4.1.0")
.version("4.3.0")

…nalSorter spill merge

### What changes were proposed in this pull request?

Add a loser tree based k-way merger as an opt-in alternative to the
existing priority-queue (binary heap) based `UnsafeSorterSpillMerger`
used by `UnsafeExternalSorter` to merge spill files.

The new merger is gated behind config
`spark.unsafe.sorter.spill.merger.useLoserTree`, defaulting to `false`
so behavior is unchanged unless explicitly enabled.

### Why are the changes needed?

When `UnsafeExternalSorter` produces many spill files, the merge phase
is dominated by comparator cost. A loser tree performs one comparison
and one loser-slot read per level on each replay, versus two
comparisons plus two child reads for heap sift-down. For k-way merges
this roughly halves both the number of comparisons and the array
touches per popped record, improving end-to-end sort latency for
spill-heavy workloads.

Microbenchmark on Apple M4 / JDK 17 over 500K records per run shows
**1.4x to 2.9x speedup** across prefix-collision levels and run counts
from 4 to 256. See `core/benchmarks/SpillMergerBenchmark-results.txt`.

| runs | low-collision | medium-collision | high-collision |
| ---: | :-----------: | :--------------: | :------------: |
|   4  |     1.5x      |       1.9x       |      1.4x      |
|   8  |     1.8x      |       2.3x       |      1.6x      |
|  16  |     2.2x      |       2.5x       |      1.7x      |
|  64  |     2.9x      |       2.0x       |      1.8x      |
| 256  |     2.6x      |       1.8x       |      1.8x      |

### Does this PR introduce _any_ user-facing change?

No. The new behavior is gated behind an internal config defaulted to
`false`. Existing behavior is unchanged.

### How was this patch tested?

- New unit test suite `UnsafeLoserTreeSpillMergerSuite` (12 tests, all pass)
- Extended `UnsafeExternalSorterSuite` to cover the loser tree path
- New `SpillMergerBenchmark` for performance comparison; results
  included under `core/benchmarks/`
- Ran `./dev/scalastyle` and `./dev/lint-java` locally — both pass

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants