perf(query): gate the bounded-multikey count+take fast path by table size #233
Merged
…size
The bounded_multikey_count_take_candidate gate triggers an eval-level
single-threaded scan that walks every row with an O(found) per-row
linear group lookup. Profitable on small inputs (skips the full DAG
group HT construction) but for large multi-key inputs the serial scan
loses to the parallel mk_par_v2 fused_group path.
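
To make the cost model concrete, here is a minimal sketch of a serial count+take scan with a linear group lookup. It is not the code in this patch: `group_t`, `bounded_count_take`, and the two `int64_t` key columns are hypothetical, and it assumes "found" means the number of groups discovered so far, capped at `take`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result slot: one composite key plus its row count. */
typedef struct {
    int64_t k0;
    int64_t k1;
    int64_t count;
} group_t;

/* Serial scan: counts rows for the first `take` distinct (k0, k1) pairs
 * encountered. `out` must have room for `take` entries.
 * Returns the number of groups written to `out`. */
static size_t bounded_count_take(const int64_t *k0, const int64_t *k1,
                                 size_t nrows, group_t *out, size_t take)
{
    size_t found = 0;
    for (size_t i = 0; i < nrows; i++) {
        /* Linear lookup over the groups found so far: O(found) per row. */
        size_t g = 0;
        while (g < found && !(out[g].k0 == k0[i] && out[g].k1 == k1[i]))
            g++;

        if (g < found) {
            out[g].count++;            /* existing group */
        } else if (found < take) {
            out[found].k0 = k0[i];     /* new group, still under the bound */
            out[found].k1 = k1[i];
            out[found].count = 1;
            found++;
        }
        /* else: key is outside the first `take` groups, so it is skipped */
    }
    return found;
}
```

Even with `take` as small as 10, the loop still visits every row on one thread to keep the counts exact, which is why the total cost grows with `nrows` while the parallel path can split the work.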
ClickBench 10M q17 — `(select {c: (count UserID) by: {UserID,
SearchPhrase} take: 10})` over 10M rows with ~2.1M distinct composite
keys — used to land here and spend ~340 ms on the linear scan even
though the result only needs any 10 (UserID, SearchPhrase, count)
tuples.
Gate the candidate on `nrows < 100000` so big inputs fall through to
the parallel filtered_group multi-key path.
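
The gate itself amounts to one extra size check on the candidate test. A rough sketch follows; only the `100000` threshold comes from this change, while `BOUNDED_MK_MAX_ROWS`, `group_path_t`, and `choose_group_path` are illustrative names that do not appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Threshold taken from the commit message; the macro name is hypothetical. */
#define BOUNDED_MK_MAX_ROWS 100000

typedef enum {
    PATH_SERIAL_SCAN,     /* eval-level bounded count+take scan */
    PATH_PARALLEL_GROUP   /* parallel multi-key group path */
} group_path_t;

/* `shape_matches` stands in for whatever the existing candidate check
 * already verifies about the query (multi-key `by`, count, take). */
static group_path_t choose_group_path(bool shape_matches, size_t nrows)
{
    if (shape_matches && nrows < BOUNDED_MK_MAX_ROWS)
        return PATH_SERIAL_SCAN;
    return PATH_PARALLEL_GROUP;
}
```

With this shape, a 10M-row input such as q17 no longer qualifies for the serial scan, while small tables keep the fast path that avoids building the full group hash table.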
ClickBench 10M:
q17 ~354 → ~161 ms (-54%, -193 ms)
Summary
`bounded_multikey_count_take_candidate` triggers an eval-level single-threaded scan in `ray_select_fn` that walks every row with an `O(found)` per-row linear group lookup. Profitable on small inputs (skips the full DAG group HT construction), but for large multi-key inputs the serial scan loses to the parallel `mk_par_v2` `fused_group` path.
ClickBench 10M q17, `(select {c: (count UserID) by: {UserID, SearchPhrase} take: 10})`, runs over 10M rows × ~2.1M distinct composite keys. The query landed in the eval-level path and spent ~340 ms on the linear scan even though the result only needs any 10 (UserID, SearchPhrase, count) tuples.
Gate the candidate on `nrows < 100000` so big inputs fall through to the parallel filtered_group multi-key path.
ClickBench 10M:
q17: ~354 → ~161 ms (-54%, -193 ms)
Full 43-query sum: -207 ms / -6.7%.
Tests: 3241/3243 pass (unchanged).