
test: add parity tests for spark.sql.legacy.timeParserPolicy #4181

Draft: andygrove wants to merge 2 commits into apache:main from andygrove:test-timeparserpolicy

@andygrove (Member)

Summary

Adds a new test suite (CometTimeParserPolicySuite) that verifies Comet matches Spark under non-default values of spark.sql.legacy.timeParserPolicy (CORRECTED and LEGACY) for: CAST(string AS date/timestamp), to_date, to_timestamp, unix_timestamp, from_unixtime, and date_format.
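Each test in the suite is a differential check. It evaluates the same expression with Spark and with Comet, then flags any input whose results differ. Below is a minimal sketch of that idea in Rust. It assumes each engine can be modeled as a function from an input string to an optional result, with `None` standing in for SQL NULL. `Mismatch` and `diff_results` are hypothetical names, not part of Comet or Spark.

```rust
/// One input on which the reference engine and the candidate engine disagreed.
#[derive(Debug, PartialEq)]
struct Mismatch<T> {
    input: String,
    reference: Option<T>, // e.g. Spark's result (None models NULL)
    candidate: Option<T>, // e.g. Comet's result (None models NULL)
}

/// Evaluate every input with both engines and collect all disagreements,
/// rather than stopping at the first one.
fn diff_results<T, R, C>(inputs: &[&str], reference: R, candidate: C) -> Vec<Mismatch<T>>
where
    T: PartialEq,
    R: Fn(&str) -> Option<T>,
    C: Fn(&str) -> Option<T>,
{
    inputs
        .iter()
        .filter_map(|&input| {
            let expected = reference(input);
            let actual = candidate(input);
            if expected == actual {
                None
            } else {
                Some(Mismatch {
                    input: input.to_string(),
                    reference: expected,
                    candidate: actual,
                })
            }
        })
        .collect()
}
```

Collecting every mismatch fits the PR's stated goal of documenting gaps: one run lists all divergent inputs, not only the first.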

This is a draft, exploratory PR: the goal is to document current behavior gaps, not to fix them. It is part of a broader audit of Spark configs whose non-default values may produce divergent results in Comet.

Findings

Running on Spark 4.0:

| Test | Result |
|---|---|
| CAST(string AS date) | pass |
| CAST(string AS timestamp) | fail (ignored) |
| to_date(s) without pattern | pass |
| to_date(s, pattern) | pass (falls back to Spark; no ParseToDate serde) |
| to_timestamp(s) without pattern | fail (ignored) |
| to_timestamp(s, pattern) | pass (falls back) |
| unix_timestamp(s, pattern) | pass (falls back) |
| from_unixtime(long, pattern) | pass (falls back via the Incompatible(None) default) |
| date_format(date, pattern) | pass (format allowlist) |

The two ignored tests show concrete divergence:

| Input | Spark (LEGACY) | Comet |
|---|---|---|
| 2020-1-1 1:2:3 | 2020-01-01 01:02:03.0 | null |

Comet's native ISO parser in native/spark-expr/src/conversion_funcs/string.rs rejects single-digit month, day, hour, minute, and second fields, which Spark's SimpleDateFormat accepts under LEGACY. Comet does not read spark.sql.legacy.timeParserPolicy anywhere, so its native parsing behaves the same under every policy value.
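A simplified Rust model makes the divergence concrete. It contrasts a strict parser, which requires two-digit month, day, hour, minute, and second fields, with a lenient one that accepts one or two digits. This is an illustration under stated assumptions. It is not the code in string.rs, and it does not reproduce SimpleDateFormat. It handles only the `yyyy-M-d H:m:s` shape, ignores time zones and fractional seconds, and does only basic range checks. `Fields`, `parse_strict`, and `parse_lenient` are hypothetical names.

```rust
/// Parsed timestamp fields (no time zone or fraction handling; illustration only).
#[derive(Debug, PartialEq)]
struct Fields {
    year: i32,
    month: u32,
    day: u32,
    hour: u32,
    minute: u32,
    second: u32,
}

/// Parse one numeric field that must have between `min_len` and `max_len` ASCII digits.
fn parse_field(s: &str, min_len: usize, max_len: usize) -> Option<u32> {
    if s.len() < min_len || s.len() > max_len || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Split "date time" into components, then apply the digit-width rule.
/// The year is always four digits; other fields need `min_len`..=2 digits.
fn parse_with_width(input: &str, min_len: usize) -> Option<Fields> {
    let (date, time) = input.split_once(' ')?;
    let d: Vec<&str> = date.split('-').collect();
    let t: Vec<&str> = time.split(':').collect();
    if d.len() != 3 || t.len() != 3 {
        return None;
    }
    let f = Fields {
        year: parse_field(d[0], 4, 4)? as i32,
        month: parse_field(d[1], min_len, 2)?,
        day: parse_field(d[2], min_len, 2)?,
        hour: parse_field(t[0], min_len, 2)?,
        minute: parse_field(t[1], min_len, 2)?,
        second: parse_field(t[2], min_len, 2)?,
    };
    // Basic range checks only (no per-month day validation in this sketch).
    let valid = (1..=12).contains(&f.month)
        && (1..=31).contains(&f.day)
        && f.hour < 24
        && f.minute < 60
        && f.second < 60;
    if valid {
        Some(f)
    } else {
        None
    }
}

/// Strict model: month, day, hour, minute, second must each be exactly two digits.
fn parse_strict(input: &str) -> Option<Fields> {
    parse_with_width(input, 2)
}

/// Lenient model: one or two digits per field.
fn parse_lenient(input: &str) -> Option<Fields> {
    parse_with_width(input, 1)
}
```

In this model, `parse_strict("2020-1-1 1:2:3")` returns `None`, like Comet's null. `parse_lenient` returns year 2020, month 1, day 1, hour 1, minute 2, second 3, like Spark's result under LEGACY.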

Passing tests mostly represent cases where Comet falls back to Spark (pattern-based functions with no serde handler, or Incompatible(None) default). They're still useful as regression guards.

Context

Follow-up to a broader audit of Spark configs whose non-default values can silently produce wrong results in Comet. Other candidates: parquet.datetimeRebaseModeInRead, parquet.int96RebaseModeInRead, parquet.binaryAsString, mapKeyDedupPolicy.

Test plan

  • Compile against -Pspark-4.0 -Pscala-2.13
  • Run CometTimeParserPolicySuite locally: 7 pass, 2 ignored
  • Decide whether the 2 ignored tests drive a fix (honor the policy) or an explicit fallback (timeParserPolicy != CORRECTED → fall back to Spark)
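If the explicit-fallback option is chosen, the rule in the last item reduces to a small decision function. The sketch below is in Rust only for illustration. In Comet, the check would presumably sit on the JVM side, where fallback decisions such as `Incompatible(None)` are made. `TimeParserPolicy`, `Support`, and `string_to_timestamp_support` are hypothetical names. Trimming the config value and matching it case-insensitively are assumptions of this sketch.

```rust
/// The three values accepted by spark.sql.legacy.timeParserPolicy.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeParserPolicy {
    Exception,
    Corrected,
    Legacy,
}

impl TimeParserPolicy {
    /// Trim and match the config string case-insensitively (assumptions of this sketch).
    fn from_conf(value: &str) -> Option<Self> {
        match value.trim().to_ascii_uppercase().as_str() {
            "EXCEPTION" => Some(Self::Exception),
            "CORRECTED" => Some(Self::Corrected),
            "LEGACY" => Some(Self::Legacy),
            _ => None,
        }
    }
}

/// Hypothetical outcome of a support check for a string-to-timestamp expression.
#[derive(Debug, PartialEq)]
enum Support {
    Native,
    FallBack(String),
}

/// Proposed rule: run natively only under CORRECTED. Every other value,
/// including unrecognized ones, falls back to Spark.
fn string_to_timestamp_support(policy_conf: &str) -> Support {
    match TimeParserPolicy::from_conf(policy_conf) {
        Some(TimeParserPolicy::Corrected) => Support::Native,
        Some(other) => Support::FallBack(format!(
            "timeParserPolicy {:?} is not honored natively",
            other
        )),
        None => Support::FallBack(format!("unrecognized timeParserPolicy '{}'", policy_conf)),
    }
}
```

Under this rule, EXCEPTION falls back as well, since it is not CORRECTED. Treating unknown values as a fallback keeps the check conservative.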

andygrove added 2 commits May 1, 2026 19:06
…olicy

Adds parity tests for Spark's timeParserPolicy config (CORRECTED/LEGACY) across
CAST(string AS date/timestamp), to_date, to_timestamp, unix_timestamp,
from_unixtime, and date_format.

Two tests for string-to-timestamp are marked ignore because Comet's native ISO
parser rejects inputs like "2020-1-1 1:2:3" that Spark accepts under LEGACY.
Flip ignore -> test once Comet honors the policy.