Skip to content

[SPARK-57716][SQL][TESTS] Add SORT / ORDER BY and window correctness tests for nanosecond-precision timestamp types#56811

Open
stevomitric wants to merge 1 commit into
apache:masterfrom
stevomitric:stevomitric/timestamp-nanos-sort-tests
Open

[SPARK-57716][SQL][TESTS] Add SORT / ORDER BY and window correctness tests for nanosecond-precision timestamp types#56811
stevomitric wants to merge 1 commit into
apache:masterfrom
stevomitric:stevomitric/timestamp-nanos-sort-tests

Conversation

@stevomitric

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Tests-only coverage for SORT / ORDER BY and window functions over the nanosecond-precision
timestamp types TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) (p in [7, 9]):

  • TimestampNanosWindowSuiteBase (+ ANSI on/off): row_number / rank / dense_rank / lag / lead over a nanosecond ordering key, NTZ and LTZ, whole-stage codegen on and off. No existing test asserts a window function over a timestamp ordering key.
  • TimestampNanosSortSuiteBase (+ ANSI on/off): the DataFrame/SQL scenarios not already covered generically by OrderingSuite / SortSuite (which exercise these types via DataTypeTestUtils.atomicTypes) — a public-API ORDER BY sub-microsecond tie-break smoke test, mixed-precision UNION ordering, a vectorized-ORC-read-then-sort, and an intercept documenting that caching a nanosecond column is not supported yet.
  • Golden coverage: a short ORDER BY + row_number + lead section appended to timestamp-ntz-nanos.sql / timestamp-ltz-nanos.sql.

Why are the changes needed?

SORT / ORDER BY and window functions already work over the nanosecond types (they ride on the orderability / hashing / UnsafeRow primitives), but had no dedicated test coverage. This locks the behaviour in, mirroring the MIN/MAX follow-up which was likewise tests-only. Joins are deferred to a later change.

Does this PR introduce any user-facing change?

No, tests only.

How was this patch tested?

New suites.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

…sion timestamp types

End-to-end SORT / ORDER BY and window-function coverage over the nanosecond
timestamp types TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) (p in 7..9). These already
work (they ride on the orderability / hashing / UnsafeRow primitives); this
locks the behaviour in, mirroring the MIN/MAX follow-up which was likewise
tests-only. Joins are deferred to a later change.

- TimestampNanosWindowSuiteBase (+ ANSI on/off): row_number / rank / dense_rank
  / lag / lead over a nanosecond ordering key, NTZ and LTZ, codegen on and off.
  No existing test asserts a window function over a timestamp ordering key.
- TimestampNanosSortSuiteBase (+ ANSI on/off): the DataFrame / SQL scenarios not
  already covered generically by OrderingSuite / SortSuite (which exercise the
  nanos types via DataTypeTestUtils.atomicTypes) -- a public-API ORDER BY
  tie-break smoke test, mixed-precision UNION ordering, a vectorized-ORC-read-
  then-sort, and an intercept documenting that caching a nanosecond column is
  not supported yet.
- Golden coverage: a short ORDER BY + row_number + lead section appended to
  timestamp-ntz-nanos.sql / timestamp-ltz-nanos.sql.

Tests only; no production change.

Co-authored-by: Isaac
@stevomitric stevomitric changed the title [SQL][TESTS][WIP] Add SORT / ORDER BY and window correctness tests for nanosecond-precision timestamp types [SPARK-57716][SQL][TESTS] Add SORT / ORDER BY and window correctness tests for nanosecond-precision timestamp types Jun 26, 2026
@stevomitric

Copy link
Copy Markdown
Contributor Author

cc @MaxGekk PTAL.

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 2 non-blocking, 0 nits.
Clean, well-documented tests-only PR. The suites are built so they genuinely observe nanosecond ordering (tie-break fixtures share epochMicros and differ only in nanosWithinMicro, so the distinct row-number / rank / strict-ordered collect is the real proof), assertions are correct, placement matches the existing TimestampNanos*SuiteBase family, and the cache intercept correctly documents the current gap. I independently confirmed the six infrastructure claims the comments rely on (testing-default flag, atomicTypes membership, ORC vectorized nanos getters, the cache-write throw path, findWiderDateTimeType widening, and the golden-file session-zone / bare-LTZ-literal round-trip) against the current source. Two minor non-blocking notes only.

Design / architecture (1)

  • PR description: the body says this "mirrors the MIN/MAX follow-up which was likewise tests-only," but there is no MinMax suite in the tree — MIN/MAX coverage appears to live inside TimestampNanosFunctionsSuiteBase. Minor description imprecision; not load-bearing.

Suggestions (1)

  • TimestampNanosWindowSuiteBase.scala:101: LTZ row_number test covers only ASC while its NTZ sibling covers ASC+DESC — see inline.

}
}

test("row_number over a nanosecond TIMESTAMP_LTZ orders by the sub-microsecond part") {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor (non-blocking): this LTZ row_number test asserts only the ASC ordering, whereas its NTZ sibling (row_number over a nanosecond TIMESTAMP_NTZ ...) asserts both ASC and DESC. The comparator is type-shared so the risk is low, but adding the DESC arm here would make the NTZ/LTZ pair symmetric and guard the LTZ descending path explicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants