test: add SQL test coverage for spark.sql.legacy.timeParserPolicy #4183

Open
andygrove wants to merge 2 commits into apache:main from andygrove:feat/legacy-time-parser-policy-tests
andygrove (Member) commented May 2, 2026

Which issue does this PR close?

Part of #4180

Rationale for this change

spark.sql.legacy.timeParserPolicy (LEGACY / CORRECTED / EXCEPTION) controls which datetime parser Spark uses and changes results materially on lenient inputs and ambiguous patterns. No existing Comet SQL test exercises this config, so we have no regression net for the seven expressions that read it. This PR closes that gap.
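
To make the divergence concrete, here is a small JVM-level sketch. It assumes, as Spark's migration guide describes, that the legacy path is built on `java.text.SimpleDateFormat` and the corrected path on `java.time.format.DateTimeFormatter`. Spark wraps both in its own formatter classes, so the names `ParserPolicySketch`, `legacyStyle`, and `correctedStyle` are hypothetical, and the output illustrates the kind of difference rather than exact Spark results.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration only; not Spark or Comet code.
class ParserPolicySketch {

    // Legacy-style parse: SimpleDateFormat.parse(String) stops once the
    // pattern is satisfied and silently ignores any remaining text.
    // Returns the parsed date re-formatted with the same pattern, or null.
    public static String legacyStyle(String pattern, String input) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false); // assumption: out-of-range fields are rejected
        try {
            return fmt.format(fmt.parse(input));
        } catch (ParseException e) {
            return null;
        }
    }

    // Corrected-style parse: DateTimeFormatter must consume the whole input,
    // so unparsed trailing text is an error. Returns an ISO date, or null.
    public static String correctedStyle(String pattern, String input) {
        try {
            DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern, Locale.US);
            return LocalDate.parse(input, fmt).toString();
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd";
        String input = "2020-01-27T20:06"; // trailing characters after the date
        System.out.println("legacy-style:    " + legacyStyle(pattern, input));    // 2020-01-27
        System.out.println("corrected-style: " + correctedStyle(pattern, input)); // null
    }
}
```

The same input yields a value on one path and nothing on the other, which is the class of behavior change a regression test for this config needs to pin down.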

What changes are included in this PR?

For each Spark expression that reads the policy (date_format, from_unixtime, unix_timestamp, to_unix_timestamp, to_timestamp/to_timestamp_ntz, to_date, and Spark 4's try_to_timestamp):

  • A ConfigMatrix file that runs convergent inputs under LEGACY, CORRECTED, and EXCEPTION.
  • Per-policy files (*_legacy.sql, *_corrected.sql, *_exception.sql) covering divergent inputs: single-digit fields under fixed-width patterns, out-of-range month/day, trailing characters, legacy-only pattern tokens like aaaa, and the INCONSISTENT_BEHAVIOR_CROSS_VERSION exception paths.
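
The split between convergent and divergent inputs can be sketched at the JVM level. This is a rough approximation, assuming the EXCEPTION policy raises when the new parser fails on an input the legacy parser would have accepted, as Spark's documentation describes. `DivergentInputSketch`, its `Policy` enum, and the `lenient` flag are hypothetical names for illustration. The flag exists because rollover of out-of-range fields depends on how the legacy formatter is configured, and this sketch does not assert which setting Spark uses.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration only; not Spark or Comet code.
class DivergentInputSketch {
    public enum Policy { LEGACY, CORRECTED, EXCEPTION }

    // SimpleDateFormat-based parse; `lenient` controls whether out-of-range
    // month/day values roll over into the next month or year.
    public static String legacyStyle(String pattern, String input, boolean lenient) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(lenient);
        try {
            return fmt.format(fmt.parse(input));
        } catch (ParseException e) {
            return null;
        }
    }

    // DateTimeFormatter-based parse; "MM" and "dd" require exactly two digits.
    public static String correctedStyle(String pattern, String input) {
        try {
            DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern, Locale.US);
            return LocalDate.parse(input, fmt).toString();
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Approximates the three policies for a single input.
    public static String parse(Policy policy, String pattern, String input) {
        if (policy == Policy.LEGACY) {
            return legacyStyle(pattern, input, false);
        }
        String strict = correctedStyle(pattern, input);
        if (policy == Policy.EXCEPTION && strict == null
                && legacyStyle(pattern, input, false) != null) {
            // The two parsers disagree, so surface it instead of returning null.
            throw new IllegalStateException("INCONSISTENT_BEHAVIOR_CROSS_VERSION: " + input);
        }
        return strict;
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd";
        // Convergent input: every policy agrees.
        System.out.println(parse(Policy.EXCEPTION, pattern, "2020-01-07")); // 2020-01-07
        // Divergent input: single-digit fields under a fixed-width pattern.
        System.out.println(parse(Policy.LEGACY, pattern, "2020-1-7"));      // 2020-01-07
        System.out.println(parse(Policy.CORRECTED, pattern, "2020-1-7"));   // null
        try {
            parse(Policy.EXCEPTION, pattern, "2020-1-7");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // Out-of-range month/day rolls over only when the legacy formatter is lenient.
        System.out.println(legacyStyle(pattern, "2020-13-45", true));       // 2021-02-14
    }
}
```

Convergent inputs fit a single file run under all three policies because every branch returns the same value. Divergent inputs need one file per policy because the expected outcome (value, null, or error) differs by branch.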

A new contributor-guide page spark_configs_support.md mirrors the expression audit log: it tracks Spark configs that affect Comet behavior and records the full audit notes for spark.sql.legacy.timeParserPolicy (source semantics, affected expressions, current Comet status, test layout, findings).

This PR was scaffolded with the project's audit-comet-expression workflow extended to a config-level audit, plus the superpowers:brainstorming and superpowers:using-git-worktrees skills.

How are these changes tested?

CometSqlFileTestSuite runs the 42 generated test cases through both Spark and Comet and compares results. Verified locally:

  • ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite time_parser_policy" -Dtest=none -- 42/42 pass on Spark 3.5.8 (default).
  • ./mvnw test -Pspark-3.4 -Dsuites="org.apache.comet.CometSqlFileTestSuite time_parser_policy" -Dtest=none -- 42/42 pass.
  • ./mvnw test -Pspark-4.0 -Dsuites="org.apache.comet.CometSqlFileTestSuite try_to_timestamp_time_parser_policy" -Dtest=none -- 6/6 pass; to_timestamp_time_parser_policy_exception also verified on 4.0.

No Comet bugs were uncovered by the audit.

andygrove added 2 commits May 2, 2026 08:16
Audit every Spark expression that reads spark.sql.legacy.timeParserPolicy
(date_format, from_unixtime, unix_timestamp, to_unix_timestamp, to_timestamp,
to_date, and Spark 4's try_to_timestamp) and add CometSqlFileTestSuite
coverage. For each expression provide:

- a ConfigMatrix file exercising convergent inputs under LEGACY, CORRECTED,
  and EXCEPTION
- per-policy files locking in divergent behavior (lenient parsing under
  LEGACY, null returns under CORRECTED, INCONSISTENT_BEHAVIOR_CROSS_VERSION
  under EXCEPTION)

Also add docs/source/contributor-guide/spark_configs_support.md modeled on
the expression audit log to track Spark configs that affect Comet behavior,
with full audit notes for the timeParserPolicy entry.

All 42 generated tests pass on Spark 3.4.3, 3.5.8, and 4.0.1.