[SPARK-57303][SQL] Store-assignment and up-cast rules for nanosecond-precision timestamp types #56810
Open
MaxGekk wants to merge 3 commits into
uros-b reviewed Jun 26, 2026
…precision timestamp types
Define a precision-safe store-assignment / up-cast contract for the whole LTZ/NTZ timestamp family (micros and nanos, including cross-family pairs): lossless widening (target fractional precision >= source) is an up-cast and ANSI-store-assignable, while lossy narrowing is blocked. DATE <-> nanos is aligned to the micro DATE <-> TIMESTAMP behavior. Adds a shared TimestampFamily helper reused by UpCastRule, Cast.canANSIStoreAssign, and TypeCoercionHelper.findWiderDateTimeType.
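The store-assignment half of this contract can be sketched as a single predicate. This is a minimal, hypothetical model: StoreAssignSketch, DType, DateT, TimeT and Ts are illustration-only names, not Spark's DataType classes, and precision 6 is assumed to stand in for the micro types while 7..9 stand in for TIMESTAMP_LTZ(p)/TIMESTAMP_NTZ(p). fractionalPrecision plays the role of the shared TimestampFamily helper, and the catch-all arm stands in for the generic datetime arm.

```scala
object StoreAssignSketch {
  // Hypothetical model, NOT Spark's DataType hierarchy. Every case is a datetime
  // type; Ts precision 6 stands for the micro types, 7..9 for the nanos types.
  sealed trait DType
  case object DateT extends DType
  case object TimeT extends DType
  final case class Ts(ltz: Boolean, precision: Int) extends DType {
    require(precision >= 6 && precision <= 9, s"unsupported precision $precision")
  }

  // Analogue of the shared helper: effective fractional-second precision,
  // or None for types outside the timestamp family.
  def fractionalPrecision(dt: DType): Option[Int] = dt match {
    case Ts(_, p) => Some(p)
    case _        => None
  }

  def canANSIStoreAssign(from: DType, to: DType): Boolean =
    (fractionalPrecision(from), fractionalPrecision(to)) match {
      // Family arm, checked first: admit only if no fractional digits are lost.
      // The LTZ/NTZ flag is ignored, so cross-family pairs follow the same rule.
      case (Some(fp), Some(tp)) => tp >= fp
      // Stand-in for the generic datetime arm: DATE/TIME <-> timestamp remain
      // assignable both ways (nanos -> DATE, for instance, drops the time of day).
      case _ => true
    }
}
```

The ordering is the point of the design: because the family arm is checked first, a lossy narrowing such as Ts(false, 9) -> Ts(false, 7) is rejected before the permissive catch-all could admit it.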
857b90a to b65fd06
uros-b approved these changes Jun 26, 2026
…ual-precision comment
Address review feedback on the intra-family widening case in UpCastRule.canUpCast: adopt the .exists formulation (drops the double .get) while keeping it as a guard so non-timestamp pairs still fall through to the cases below, and correct the comment to note that cross-family equal-precision pairs are admitted by the <= here, not by the from == to short-circuit.
Co-authored-by: Isaac
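To illustrate why the widening case stays a guard, here is a minimal sketch over a hypothetical model (UpCastGuardSketch, DType, DateT and Ts are illustration-only names, not Spark's classes; precision 6 stands in for the micro types, 7..9 for nanos). The .exists chain replaces a pair of .get calls, and because it sits in a guard, a pair it does not admit (here DATE -> timestamp) still reaches the arms below it. canUpCastUnguarded is a deliberately wrong counter-example.

```scala
object UpCastGuardSketch {
  // Hypothetical minimal model, NOT Spark's classes: DATE plus the timestamp
  // family, where precision 6 stands for the micro types and 7..9 for nanos.
  sealed trait DType
  case object DateT extends DType
  final case class Ts(ltz: Boolean, precision: Int) extends DType

  def fractionalPrecision(dt: DType): Option[Int] = dt match {
    case Ts(_, p) => Some(p)
    case _        => None
  }

  // .exists form: true only when both sides are timestamps and the target
  // keeps every fractional digit; no Option.get is needed.
  private def widens(from: DType, to: DType): Boolean =
    fractionalPrecision(from).exists(fp => fractionalPrecision(to).exists(tp => fp <= tp))

  def canUpCast(from: DType, to: DType): Boolean = (from, to) match {
    case _ if from == to => true
    // Widening arm as a guard. Cross-family equal-precision pairs such as
    // Ts(true, 6) -> Ts(false, 6) are admitted here by <=, because they are
    // not equal and therefore miss the short-circuit above.
    case _ if widens(from, to) => true
    // Reachable because the guard above lets non-widening pairs fall through.
    case (DateT, _: Ts) => true
    case _ => false
  }

  // Counter-example: an unguarded final arm answers for every remaining pair,
  // so a DATE -> timestamp arm placed after it could never be reached.
  def canUpCastUnguarded(from: DType, to: DType): Boolean = (from, to) match {
    case _ if from == to => true
    case _ => widens(from, to)
  }
}
```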
…ated-column validation
Cover two consumers of the relaxed canUpCast/canANSIStoreAssign rules that the predicate-level tests do not reach:
- CastSuiteBase: try-cast nullability follows up-cast admissibility for the timestamp family (Cast.nullable's try-cast branch keys on canUpCast).
- GeneratedColumnExpressionSuite (new): GeneratedColumnExpression.validate accepts a lossless widening (micros -> nanos) and rejects a lossy narrowing.
Co-authored-by: Isaac
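The two consumers covered by this commit can be sketched as follows, assuming both key on the up-cast predicate (the commit states this for the try-cast branch of Cast.nullable; for generated-column validation it is an assumption here). ConsumerSketch, tryCastNullable and validateGenerated are hypothetical names, not Spark's Cast.nullable or GeneratedColumnExpression.validate, and the model keeps only the timestamp family.

```scala
object ConsumerSketch {
  // Hypothetical model, NOT Spark's classes: timestamp family only, where
  // precision 6 stands for the micro types and 7..9 for the nanos types.
  final case class Ts(ltz: Boolean, precision: Int) {
    require(precision >= 6 && precision <= 9, s"unsupported precision $precision")
  }

  // Family slice of the up-cast rule: lossless iff target precision >= source.
  def canUpCast(from: Ts, to: Ts): Boolean = to.precision >= from.precision

  // Try-cast nullability keyed on up-cast admissibility: a non-null child stays
  // non-null on widening; anything that is not an up-cast is treated as nullable.
  def tryCastNullable(childNullable: Boolean, from: Ts, to: Ts): Boolean =
    childNullable || !canUpCast(from, to)

  // Hypothetical stand-in for generated-column validation: accept the
  // generation expression only if its type up-casts to the column type.
  def validateGenerated(exprType: Ts, columnType: Ts): Either[String, Unit] =
    if (canUpCast(exprType, columnType)) Right(())
    else Left(s"cannot up-cast generation expression type $exprType to column type $columnType")
}
```

With a non-null child, tryCastNullable(false, Ts(false, 6), Ts(false, 9)) stays non-null, while the reverse narrowing is reported as nullable; validateGenerated accepts Ts(true, 6) as the expression type for a Ts(true, 9) column and rejects the opposite direction.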
What changes were proposed in this pull request?
This PR defines a precision-safe store-assignment / up-cast contract for the whole LTZ/NTZ timestamp family: the microsecond types (TIMESTAMP/TIMESTAMP_NTZ) and their nanosecond-precision counterparts (TIMESTAMP_LTZ(p)/TIMESTAMP_NTZ(p), p in [7, 9]), using a single notion of effective fractional-second precision (micros = 6, nanos = p).
For any ordered pair of timestamp-family types (including across the LTZ/NTZ boundary, which Spark already treats as a mutual up-cast for the micro types):
- Target precision >= source precision: lossless widening -> up-cast (STRICT) and ANSI-store-assignable.
- Target precision < source precision: lossy narrowing -> not an up-cast, and blocked under ANSI so it can never silently truncate.

- DATE <-> nanos is aligned to the micro DATE <-> TIMESTAMP behavior: DATE -> nanos is a lossless widening (up-cast + ANSI-store-assignable), while nanos -> DATE drops the time-of-day (not an up-cast, but still ANSI-store-assignable).
- LEGACY policy and explicit CAST are unchanged (they still truncate on narrowing).
- TIME <-> timestamp is unchanged and stays consistent with TIME <-> micros (never an up-cast, ANSI-store-assignable both ways).

Concretely:
- private[sql] object TimestampFamily (sql/api) with fractionalPrecision(dt): Option[Int] plus isLtz/isNtz, reused across the rule sites (no type-hierarchy change, so no MiMa impact).
- UpCastRule.canUpCast: a single lossless-widening arm for the family (subsuming the existing TimestampType <-> TimestampNTZType cases), plus a generalized DATE -> family widening arm.
- Cast.canANSIStoreAssign: replaced the piecemeal per-subtask arms with one family narrowing block built on the shared helper, before the generic DatetimeType arm.
- TypeCoercionHelper.findWiderDateTimeType: refactored onto the shared helper (behavior-preserving) and updated the now-stale comment, since common-type resolution and the cast rules now agree on admissibility.

Why are the changes needed?
Before this change, the nanosecond timestamp types fell through the generic (_: DatetimeType, _: DatetimeType) arm in Cast.canANSIStoreAssign (risking silent sub-microsecond truncation, which was handled only by narrow, piecemeal arms), and they were absent from UpCastRule.canUpCast, so STRICT store assignment and up-cast resolution rejected even lossless widening. This PR gives the family a complete, precision-safe contract consistent with the microsecond precedent.

Does this PR introduce any user-facing change?
No. The nanosecond-precision timestamp types are unreleased (@Unstable), so this only affects behavior within the unreleased branch.

How was this patch tested?
- Updated the SPARK-57293 / SPARK-57490 / cross-family / micro-boundary contract tests in CastSuiteBase to the precision-safe widening model.
- Tests that canUpCast and canANSIStoreAssign are true iff target precision >= source precision, plus DATE and TIME consistency anchors.
- CastSuite, CastWithAnsiOn/Off, TypeCoercionSuite, AnsiTypeCoercionSuite, DataTypeWriteCompatibilitySuite, V2WriteAnalysisSuite, and SQLQueryTestSuite (cast / try_cast / nanos / typeCoercion) all pass with no golden-file changes.
- Covers two consumers of canUpCast that the predicate-level tests do not reach: a CastSuiteBase test that Cast.nullable's try-cast branch follows up-cast admissibility for the timestamp family (non-null preserved on widening, conservatively nullable on narrowing), and a new GeneratedColumnExpressionSuite asserting that GeneratedColumnExpression.validate accepts a lossless widening generation expression and rejects a lossy narrowing.

Note on scope
canUpCast/canANSIStoreAssign feed several consumers beyond up-cast resolution and store assignment (generated-column validation, subquery decorrelation, V2 expression pushdown, Cast try-cast nullability, and the Spark Connect ArrowVectorReader guard). The widening relaxation here is lossless and applies uniformly to all of them. One follow-up item: nanosecond timestamp types are not yet supported over Spark Connect (no ConnectTypeOps / vector reader), so ArrowVectorReader's canUpCast guard no longer fails fast on a micro-vector/nanos-target mismatch; whenever nanos-over-Connect is implemented, that PR should add the reader and re-examine this guard.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)