[SPARK-57716][SQL][TESTS] Add SORT / ORDER BY and window correctness tests for nanosecond-precision timestamp types#56811
Conversation
…sion timestamp types End-to-end SORT / ORDER BY and window-function coverage over the nanosecond timestamp types TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) (p in 7..9). These already work (they ride on the orderability / hashing / UnsafeRow primitives); this locks the behaviour in, mirroring the MIN/MAX follow-up which was likewise tests-only. Joins are deferred to a later change. - TimestampNanosWindowSuiteBase (+ ANSI on/off): row_number / rank / dense_rank / lag / lead over a nanosecond ordering key, NTZ and LTZ, codegen on and off. No existing test asserts a window function over a timestamp ordering key. - TimestampNanosSortSuiteBase (+ ANSI on/off): the DataFrame / SQL scenarios not already covered generically by OrderingSuite / SortSuite (which exercise the nanos types via DataTypeTestUtils.atomicTypes) -- a public-API ORDER BY tie-break smoke test, mixed-precision UNION ordering, a vectorized-ORC-read- then-sort, and an intercept documenting that caching a nanosecond column is not supported yet. - Golden coverage: a short ORDER BY + row_number + lead section appended to timestamp-ntz-nanos.sql / timestamp-ltz-nanos.sql. Tests only; no production change. Co-authored-by: Isaac
|
cc @MaxGekk PTAL. |
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 2 non-blocking, 0 nits.
Clean, well-documented tests-only PR. The suites are built so they genuinely observe nanosecond ordering (tie-break fixtures share epochMicros and differ only in nanosWithinMicro, so the distinct row-number / rank / strict-ordered collect is the real proof), assertions are correct, placement matches the existing TimestampNanos*SuiteBase family, and the cache intercept correctly documents the current gap. I independently confirmed the six infrastructure claims the comments rely on (testing-default flag, atomicTypes membership, ORC vectorized nanos getters, the cache-write throw path, findWiderDateTimeType widening, and the golden-file session-zone / bare-LTZ-literal round-trip) against the current source. Two minor non-blocking notes only.
Design / architecture (1)
- PR description: the body says this "mirrors the MIN/MAX follow-up which was likewise tests-only," but there is no
MinMaxsuite in the tree — MIN/MAX coverage appears to live insideTimestampNanosFunctionsSuiteBase. Minor description imprecision; not load-bearing.
Suggestions (1)
- TimestampNanosWindowSuiteBase.scala:101: LTZ
row_numbertest covers only ASC while its NTZ sibling covers ASC+DESC — see inline.
| } | ||
| } | ||
|
|
||
| test("row_number over a nanosecond TIMESTAMP_LTZ orders by the sub-microsecond part") { |
There was a problem hiding this comment.
Minor (non-blocking): this LTZ row_number test asserts only the ASC ordering, whereas its NTZ sibling (row_number over a nanosecond TIMESTAMP_NTZ ...) asserts both ASC and DESC. The comparator is type-shared so the risk is low, but adding the DESC arm here would make the NTZ/LTZ pair symmetric and guard the LTZ descending path explicitly.
What changes were proposed in this pull request?
Tests-only coverage for SORT / ORDER BY and window functions over the nanosecond-precision
timestamp types
TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p)(pin[7, 9]):TimestampNanosWindowSuiteBase(+ ANSI on/off):row_number/rank/dense_rank/lag/leadover a nanosecond ordering key, NTZ and LTZ, whole-stage codegen on and off. No existing test asserts a window function over a timestamp ordering key.TimestampNanosSortSuiteBase(+ ANSI on/off): the DataFrame/SQL scenarios not already covered generically byOrderingSuite/SortSuite(which exercise these types viaDataTypeTestUtils.atomicTypes) — a public-APIORDER BYsub-microsecond tie-break smoke test, mixed-precisionUNIONordering, a vectorized-ORC-read-then-sort, and aninterceptdocumenting that caching a nanosecond column is not supported yet.ORDER BY+row_number+leadsection appended totimestamp-ntz-nanos.sql/timestamp-ltz-nanos.sql.Why are the changes needed?
SORT / ORDER BY and window functions already work over the nanosecond types (they ride on the orderability / hashing /
UnsafeRowprimitives), but had no dedicated test coverage. This locks the behaviour in, mirroring the MIN/MAX follow-up which was likewise tests-only. Joins are deferred to a later change.Does this PR introduce any user-facing change?
No, tests only.
How was this patch tested?
New suites.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)