[SPARK-57718][SQL][TESTS] Add JOIN correctness tests for nanosecond-precision timestamp types by stevomitric · Pull Request #56812 · apache/spark

stevomitric · 2026-06-26T14:32:18Z

What changes were proposed in this pull request?

Tests-only coverage for equi-JOINs over the nanosecond-precision timestamp types TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) (p in [7, 9]). One new suite, TimestampNanosJoinSuiteBase, with AnsiOn / AnsiOff subclasses.

The headline assertion is sub-microsecond join-key correctness: every key in the test relations shares the same epochMicros and differs only in nanosWithinMicro, so a correct join must be driven by the full nanos value —

a fully-equal key (incl. the sub-microsecond remainder) must join,
an epochMicros-equal-but-nanosWithinMicro-distinct key must not join,
NULL keys must not match.

Why are the changes needed?

Joins over the nanosecond types already work (they ride on the hashing / ordering / UnsafeRow primitives), but there was no join coverage — Spark's join suites are organized by join type/strategy, not by column type. This locks the behaviour in, mirroring the MIN/MAX follow-up which was likewise tests-only.

Does this PR introduce any user-facing change?

No, tests only.

How was this patch tested?

New suites in this PR.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

End-to-end equi-join coverage over the nanosecond timestamp types TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) (p in 7..9). Joins already work -- they ride on the nanos hash+equals (hash joins) and ordering+equals (sort-merge join) primitives -- but had no test coverage; this locks the behaviour in, mirroring the MIN/MAX follow-up which was likewise tests-only. TimestampNanosJoinSuiteBase (+ ANSI on/off) asserts sub-microsecond join-key correctness: every key shares epochMicros and differs only in nanosWithinMicro, so a fully-equal key must join, an epochMicros-equal-but-nanos-distinct key must not, and NULL keys must not match. Inner and left-outer, NTZ and LTZ, across whole-stage codegen on/off and each of the three equi-join strategies (broadcast-hash, sort-merge, shuffled-hash); each test asserts the intended physical join node fired before checking the answer. Tests only; no production change. Co-authored-by: Isaac

stevomitric · 2026-06-26T15:08:04Z

cc @MaxGekk can you PTAL at this test-only PR.

MaxGekk

0 blocking, 0 non-blocking, 0 nits.
A well-constructed, correct test-only PR. The asserted values are the intended sub-microsecond join semantics, the fixtures genuinely exercise sub-micro key equality/non-equality across all three join strategies and both codegen modes, suite placement and base-class choice are right, and parity with the existing nanos suites (TimestampNanosFunctionsSuiteBase / TimestampNanosWideningSuiteBase) is faithful.

Verification

I checked the asserted values are intended, not incidental, and that the tests can actually observe a precision bug (no SPARK-56395-style trap):

The right relation is neither empty nor key-equal to the left — its 700ns/300ns keys share the same epochMicros as the left's 200ns/900ns. A truncate-to-micros bug would spuriously match every non-null left row to every non-null right row, blowing up INNER and turning the lid=2/lid=3 left-outer NULLs into matches — so the suite fails loudly on such a bug. lid=2 -> null proves a real sub-micro non-match (not an empty right side); lid=4 -> null proves NULL != NULL.
The cross-precision-safety claim holds: SparkDateTimeUtils.truncateNanosWithinMicroToPrecision floors to (n/100)*100 at p=7 and (n/10)*10 at p=8, and the remainders 200/300/500/700/900 are all multiples of 100, so they are exact and non-colliding at every p in [7,9] — the same fixtures and expected rows are valid verbatim at all three precisions.
Strategy forcing is sound: forceApplyShuffledHashJoin short-circuits the SHJ build-side selection (bypassing the muchSmaller heuristic that equal-sized tiny inputs would fail), SHJ permits LEFT OUTER, and with AQE disabled assertJoinUsed("exactly one") is deterministic.

The formal verdict stays with the committer.

MaxGekk · 2026-06-26T17:30:50Z

All tests passed.

+1, LGTM. Merging to master/4.x.
Thank you, @stevomitric.

…ecision timestamp types ### What changes were proposed in this pull request? Tests-only coverage for equi-JOINs over the nanosecond-precision timestamp types `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)` (`p` in `[7, 9]`). One new suite, `TimestampNanosJoinSuiteBase`, with `AnsiOn` / `AnsiOff` subclasses. The headline assertion is sub-microsecond **join-key** correctness: every key in the test relations shares the same `epochMicros` and differs only in `nanosWithinMicro`, so a correct join must be driven by the full nanos value — - a fully-equal key (incl. the sub-microsecond remainder) must join, - an `epochMicros`-equal-but-`nanosWithinMicro`-distinct key must **not** join, - NULL keys must not match. ### Why are the changes needed? Joins over the nanosecond types already work (they ride on the hashing / ordering / `UnsafeRow` primitives), but there was no join coverage — Spark's join suites are organized by join type/strategy, not by column type. This locks the behaviour in, mirroring the MIN/MAX follow-up which was likewise tests-only. ### Does this PR introduce _any_ user-facing change? No, tests only. ### How was this patch tested? New suites in this PR. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.8) Closes #56812 from stevomitric/stevomitric/timestamp-nanos-join-tests. Authored-by: Stevo Mitric <stevomitric2000@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit ca4e07a) Signed-off-by: Max Gekk <max.gekk@gmail.com>

stevomitric force-pushed the stevomitric/timestamp-nanos-join-tests branch from fd5aa46 to b735ab5 Compare June 26, 2026 14:43

stevomitric changed the title ~~[SQL][TESTS] Add JOIN correctness tests for nanosecond-precision timestamp types~~ [SPARK-57718][SQL][TESTS] Add JOIN correctness tests for nanosecond-precision timestamp types Jun 26, 2026

MaxGekk reviewed Jun 26, 2026

View reviewed changes

MaxGekk approved these changes Jun 26, 2026

View reviewed changes

MaxGekk closed this in ca4e07a Jun 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-57718][SQL][TESTS] Add JOIN correctness tests for nanosecond-precision timestamp types#56812

[SPARK-57718][SQL][TESTS] Add JOIN correctness tests for nanosecond-precision timestamp types#56812
stevomitric wants to merge 1 commit into
apache:masterfrom
stevomitric:stevomitric/timestamp-nanos-join-tests

stevomitric commented Jun 26, 2026

Uh oh!

stevomitric commented Jun 26, 2026

Uh oh!

MaxGekk left a comment

Uh oh!

MaxGekk commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

stevomitric commented Jun 26, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

stevomitric commented Jun 26, 2026

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Verification

Uh oh!

MaxGekk commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants