[SPARK-57303][SQL] Store-assignment and up-cast rules for nanosecond-precision timestamp types #56810

Open
MaxGekk wants to merge 3 commits into apache:master from MaxGekk:nanos-store-assignment

MaxGekk commented Jun 26, 2026

What changes were proposed in this pull request?

This PR defines a precision-safe store-assignment / up-cast contract for the whole LTZ/NTZ timestamp family - the microsecond types (TIMESTAMP / TIMESTAMP_NTZ) and their nanosecond-precision counterparts (TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p), p in [7, 9]) - using a single notion of effective fractional-second precision (micros = 6, nanos = p).

For any ordered pair of timestamp-family types (including across the LTZ/NTZ boundary, which Spark already treats as a mutual up-cast for the micro types):

  • target precision >= source precision: lossless widening -> up-cast (STRICT) and ANSI-store-assignable;
  • target precision < source precision: lossy narrowing -> not an up-cast, blocked under ANSI so it can never silently truncate.

DATE <-> nanos is aligned to the micro DATE <-> TIMESTAMP behavior: DATE -> nanos is a lossless widening (up-cast + ANSI-store-assignable), while nanos -> DATE drops the time-of-day (not an up-cast, but still ANSI-store-assignable). LEGACY policy and explicit CAST are unchanged (they still truncate on narrowing). TIME <-> timestamp is unchanged and stays consistent with TIME <-> micros (never an up-cast, ANSI-store-assignable both ways).
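The rules above can be modeled in a few lines. The following is a minimal sketch, not the code in this PR: `DType`, `DateT`, `TimeT`, and `Ts` are hypothetical stand-ins for Spark's data types, and the two functions only mirror the behavior described for `UpCastRule.canUpCast` and `Cast.canANSIStoreAssign`. Pairs the description does not discuss (for example DATE <-> TIME) are left as `false` in the sketch purely as a placeholder.

```scala
// Illustrative model only: hypothetical stand-ins, not Spark's type classes.
object StoreAssignmentSketch {
  sealed trait DType
  case object DateT extends DType
  case object TimeT extends DType
  // ltz = true models TIMESTAMP / TIMESTAMP_LTZ(p); ltz = false models the NTZ variants.
  // precision is 6 for the microsecond types and p in [7, 9] for the nanosecond types.
  final case class Ts(ltz: Boolean, precision: Int) extends DType {
    require(precision >= 6 && precision <= 9, s"precision must be in [6, 9], got $precision")
  }

  def canUpCast(from: DType, to: DType): Boolean = (from, to) match {
    // Lossless widening, including across the LTZ/NTZ boundary.
    case (Ts(_, fp), Ts(_, tp)) => fp <= tp
    // DATE -> timestamp is a widening; the reverse drops the time-of-day.
    case (DateT, _: Ts) => true
    case (f, t) if f == t => true
    // Narrowing, timestamp -> DATE, and anything involving TIME are not up-casts.
    case _ => false
  }

  def canAnsiStoreAssign(from: DType, to: DType): Boolean = (from, to) match {
    // Narrowing inside the family is blocked so it cannot silently truncate.
    case (Ts(_, fp), Ts(_, tp)) => fp <= tp
    // DATE <-> timestamp and TIME <-> timestamp stay assignable in both directions.
    case (DateT, _: Ts) | (_: Ts, DateT) => true
    case (TimeT, _: Ts) | (_: Ts, TimeT) => true
    case (f, t) if f == t => true
    // Placeholder for pairs the description does not cover.
    case _ => false
  }
}
```

In this model, `Ts(ltz = true, 6) -> Ts(ltz = false, 9)` is both an up-cast and ANSI-store-assignable, while `Ts(ltz = false, 9) -> Ts(ltz = false, 7)` is neither.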

Concretely:

  • New shared private[sql] object TimestampFamily (sql/api) with fractionalPrecision(dt): Option[Int] plus isLtz / isNtz, reused across the rule sites (no type-hierarchy change, so no MiMa impact).
  • UpCastRule.canUpCast: a single lossless-widening arm for the family (subsuming the existing TimestampType <-> TimestampNTZType cases), plus a generalized DATE -> family widening arm.
  • Cast.canANSIStoreAssign: replaced the piecemeal per-subtask arms with one family narrowing block built on the shared helper, before the generic DatetimeType arm.
  • TypeCoercionHelper.findWiderDateTimeType: refactored onto the shared helper (behavior-preserving) and updated the now-stale comment, since common-type resolution and the cast rules now agree on admissibility.

Why are the changes needed?

Before this change, the nanosecond timestamp types fell through to the generic (_: DatetimeType, _: DatetimeType) arm in Cast.canANSIStoreAssign, which risked silent sub-microsecond truncation that was guarded against only in narrow cases. They were also absent from UpCastRule.canUpCast, so STRICT store assignment and up-cast resolution rejected even lossless widening. This PR gives the family a complete, precision-safe contract consistent with the microsecond precedent.

Does this PR introduce any user-facing change?

No. The nanosecond-precision timestamp types are unreleased (@Unstable), so this only affects behavior within the unreleased branch.

How was this patch tested?

  • Updated the SPARK-57293 / SPARK-57490 / cross-family / micro-boundary contract tests in CastSuiteBase to the precision-safe widening model.
  • Added a full-matrix predicate test over all 8 timestamp-family types asserting canUpCast and canANSIStoreAssign are true iff target precision >= source precision, plus DATE and TIME consistency anchors.
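As an illustration of the full-matrix idea, here is a minimal sketch that enumerates all 64 ordered pairs of the 8 family types and collects every pair where a predicate disagrees with "target precision >= source precision". The `Ts` model, `canUpCastModel`, and `violations` are hypothetical names for this sketch; the actual test in CastSuiteBase exercises Spark's own predicates and types.

```scala
object FullMatrixSketch {
  // Hypothetical model of a timestamp-family type: (isLtz, effective precision).
  final case class Ts(ltz: Boolean, precision: Int)

  // 2 families (LTZ, NTZ) x 4 precisions (6 = micros, 7..9 = nanos) = 8 types.
  val family: Seq[Ts] = for {
    ltz <- Seq(true, false)
    p <- 6 to 9
  } yield Ts(ltz, p)

  // Stand-in for the predicate under test; a real test would call Spark's predicates.
  def canUpCastModel(from: Ts, to: Ts): Boolean = from.precision <= to.precision

  // Every ordered pair on which the predicate disagrees with the widening rule.
  def violations(predicate: (Ts, Ts) => Boolean): Seq[(Ts, Ts)] =
    for {
      from <- family
      to <- family
      if predicate(from, to) != (to.precision >= from.precision)
    } yield (from, to)
}
```

Returning the offending pairs rather than a single boolean makes a failure message show exactly which source/target combination broke the contract.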
  • Ran CastSuite, CastWithAnsiOn/Off, TypeCoercionSuite, AnsiTypeCoercionSuite, DataTypeWriteCompatibilitySuite, V2WriteAnalysisSuite, and SQLQueryTestSuite (cast / try_cast / nanos / typeCoercion) - all pass with no golden-file changes.
  • Added coverage for two downstream consumers of canUpCast that the predicate-level tests do not reach: a CastSuiteBase test that Cast.nullable's try-cast branch follows up-cast admissibility for the timestamp family (non-null preserved on widening, conservatively nullable on narrowing), and a new GeneratedColumnExpressionSuite asserting GeneratedColumnExpression.validate accepts a lossless widening generation expression and rejects a lossy narrowing.

Note on scope

canUpCast / canANSIStoreAssign feed several consumers beyond up-cast resolution and store assignment (generated-column validation, subquery decorrelation, V2 expression pushdown, Cast try-cast nullability, and the Spark Connect ArrowVectorReader guard). The widening relaxation here is lossless and applies uniformly to all of them. One follow-up item: nanosecond timestamp types are not yet supported over Spark Connect (no ConnectTypeOps / vector reader), so ArrowVectorReader's canUpCast guard no longer fails fast on a micro-vector/nanos-target mismatch; whenever nanos-over-Connect is implemented, that PR should add the reader and re-examine this guard.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

Comment thread sql/api/src/main/scala/org/apache/spark/sql/types/UpCastRule.scala Outdated
…precision timestamp types

Define a precision-safe store-assignment / up-cast contract for the whole
LTZ/NTZ timestamp family (micros and nanos, including cross-family pairs):
lossless widening (target fractional precision >= source) is an up-cast and
ANSI-store-assignable, while lossy narrowing is blocked. DATE <-> nanos is
aligned to the micro DATE <-> TIMESTAMP behavior. Adds a shared
TimestampFamily helper reused by UpCastRule, Cast.canANSIStoreAssign, and
TypeCoercionHelper.findWiderDateTimeType.
MaxGekk force-pushed the nanos-store-assignment branch from 857b90a to b65fd06 on June 26, 2026 13:41
…ual-precision comment

Address review feedback on the intra-family widening case in UpCastRule.canUpCast:
adopt the .exists formulation (drops the double .get) while keeping it as a guard so
non-timestamp pairs still fall through to the cases below, and correct the comment to
note that cross-family equal-precision pairs are admitted by the <= here, not by the
from == to short-circuit.

Co-authored-by: Isaac
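
To make the shape of that guard concrete, here is a minimal sketch under stated assumptions: `DType`, `IntT`, `LongT`, and `Ts` are hypothetical stand-ins, and this `fractionalPrecision` only imitates the `Option[Int]` signature mentioned in the PR description. The exact expression used in UpCastRule.canUpCast may differ.

```scala
object ExistsGuardSketch {
  sealed trait DType
  case object IntT extends DType
  case object LongT extends DType
  final case class Ts(ltz: Boolean, precision: Int) extends DType

  // Hypothetical analogue of the shared helper: Some(p) for family members, None otherwise.
  def fractionalPrecision(dt: DType): Option[Int] = dt match {
    case Ts(_, p) => Some(p)
    case _ => None
  }

  def canUpCast(from: DType, to: DType): Boolean = (from, to) match {
    case _ if from == to => true
    // Nested exists is true only when both sides are family members and from <= to,
    // so no .get is needed. Cross-family equal-precision pairs such as
    // Ts(true, 6) -> Ts(false, 6) are admitted by the <= here, not by from == to.
    case _ if fractionalPrecision(from).exists(f => fractionalPrecision(to).exists(f <= _)) => true
    // Because the arm above is a guard, non-timestamp pairs still reach later cases.
    case (IntT, LongT) => true
    case _ => false
  }
}
```

Writing the arm as a guard rather than as a pattern that returns the comparison result is what lets `IntT -> LongT` fall through to its own case.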

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

…ated-column validation

Cover two consumers of the relaxed canUpCast/canANSIStoreAssign rules that the
predicate-level tests do not reach:
- CastSuiteBase: try-cast nullability follows up-cast admissibility for the
  timestamp family (Cast.nullable's try-cast branch keys on canUpCast).
- GeneratedColumnExpressionSuite (new): GeneratedColumnExpression.validate accepts
  a lossless widening (micros -> nanos) and rejects a lossy narrowing.

Co-authored-by: Isaac