test: make by-group assertions order-independent (fix coverage-build flakiness)#234
Merged
Merged
Conversation
pearson_corr.rfl and per_group_holistic.rfl pinned a specific by-group emit position; group order is hash-bucket order (unspecified, like SQL/DuckDB) and flips between ASan and coverage builds. Assert order-independently (min/max, sorted set). More candidates swept in follow-up commits.
A by-group select returns groups in hash-bucket order, which is unspecified
(like SQL / DuckDB GROUP BY without ORDER BY) and legitimately differs between
build configs (debug+ASan vs coverage clang -O0) and runs. Many tests pinned a
specific group's value at a fixed result index, so they passed under one
ordering and failed under another — a `make coverage` run surfaced two such
failures (pearson_corr, per_group_holistic) that pass under debug+ASan.
Swept every position-pinned by-group assertion (537 candidates → 112 after
excluding single-group `where:` filters and `asc:`/`desc:`/`take:` ordered
queries → the genuinely fragile multi-group ones). Each was either confirmed
SAFE (single group, deterministic ordering, or identical value across groups)
or rewritten order-independently while preserving the canonically-correct
value computed from the data:
- pin the intended group with `where: (== key val)` so index 0 is determinate;
- or assert the SORTED column / multiset, e.g. `(asc (at sel 'm)) -- [...]`,
`(asc (raze ...))` for top/bot LIST cells.
Verified under BOTH group orderings: full suite green under gcc debug+ASan
(3244/3246, 0 failed) and clang coverage (per-file). The engine is unchanged —
group order was never contractual; this is purely test hygiene.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
A
make coveragerun failed two tests —agg/pearson_corr.rflandagg/per_group_holistic.rfl— that pass cleanly undermake test(debug + ASan). Both pinned a specific group's value at a fixed result index of a by-groupselect.A by-group select returns groups in the hash table's bucket order, which is unspecified — exactly like SQL / DuckDB
GROUP BYwithoutORDER BY(DuckDB's parallel hash aggregate gives no order guarantee either). That order legitimately differs between build configs (debug+ASan vs coverageclang -O0, different memory layout) and across runs, so a position-pinned assertion passes under one ordering and fails under another. The engine is correct — group order was never contractual; this is purely test hygiene.What
Swept every position-pinned by-group assertion in the suite:
(at (at (select {… by:…}) …) N)candidates → 112 after excluding the safe majority (single-groupwhere: (== k v)filters andasc:/desc:/take:-ordered queries) → the genuinely fragile multi-group ones.Each was classified and either left SAFE (single group, deterministic ordering, or identical value across all groups) or rewritten order-independently while preserving the canonically-correct value (recomputed from the table data, never fudged to pass):
where: (== key val)so index 0 is determinate, or(asc (at sel 'm)) -- […],(asc (raze (at sel 't)))for top/bot LIST cells.13 test files changed; no engine code touched.
Verification — under THREE distinct group orderings
make testgcc debug+ASan/UBSan: 3244/3246, 0 failedmake coverageclang -O0 full suite: 3245/3247, 0 failed (was aborting on the 2 flaky tests before)Coverage after: Regions 84.8% · Functions 98.5% · Lines 89.4% · Branches 68.6%.
🤖 Generated with Claude Code