[SPARK-21529][SQL] Improve the error message for unsupported Hive union type by AgenticSpark · Pull Request #56775 · apache/spark

AgenticSpark · 2026-06-25T12:57:31Z

What changes were proposed in this pull request?

Detect unsupported Hive uniontype<...> values when converting Hive FieldSchema types to Spark SQL types and raise a dedicated UNSUPPORTED_HIVE_TYPE error instead of the generic CANNOT_RECOGNIZE_HIVE_TYPE parser error.

Why are the changes needed?

Spark SQL does not support Hive union types. Today the failure message comes from the parser path and does not clearly identify that the Hive union type is unsupported.

Does this PR introduce any user-facing change?

Yes. Reading a Hive table column that uses uniontype<...> now reports UNSUPPORTED_HIVE_TYPE with the offending Hive type and column name.

How was this patch tested?

SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "core/testOnly *SparkThrowableSuite -- -t \"Error conditions are correctly formatted\""
build/sbt "hive/testOnly *HiveClientImplSuite"

Was this patch authored or co-authored using generative AI tooling?

Yes. GitHub Copilot assisted with preparing and validating this change.

uros-b · 2026-06-25T13:44:09Z


-  def timestampNanosEpochNanosOverflowError(
-      value: TimestampNanosVal, isNtz: Boolean, sink: String): SparkArithmeticException = {
+  def parquetTimestampNanosOverflowError(


The PR renames timestampNanosEpochNanosOverflowError(value, isNtz, sink) -> parquetTimestampNanosOverflowError(value, isNtz) and hardcodes "Parquet INT64", but does NOT update its 3 call sites (ArrowWriter.scala:406, :426; ParquetWriteSupport.scala:199). The build will likely NOT compile. Also, this rename is entirely unrelated to SPARK-21529; pure scope creep / accidental edit; should very likely be reverted.

…on type Hive uniontype<...> is not supported by Spark SQL. Detect it on the Hive type parse-failure path and raise UNSUPPORTED_HIVE_TYPE so the unsupported type and column are reported directly. Tests: - SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "core/testOnly *SparkThrowableSuite -- -t \"Error conditions are correctly formatted\"" - build/sbt "hive/testOnly *HiveClientImplSuite"

Apply the unsupported Hive union type check in HiveClientImpl so uniontype<...> raises UNSUPPORTED_HIVE_TYPE instead of falling through to the generic parser error. Tests: - build/sbt "hive/testOnly *HiveClientImplSuite"

Keep the generic Hive type parse fallback on its own line after the uniontype check.

AgenticSpark · 2026-06-25T14:28:02Z

Thanks, fixed. I rebuilt the branch on current upstream master and removed the accidental unrelated QueryExecutionErrors.scala changes; the PR diff is back to the Hive union type change only.

HyukjinKwon

1 blocking, 0 non-blocking, 0 nits.
Good, well-tested error-message improvement — the detection is correctly placed (top-level + nested uniontype both tested), toSQLType(String) compiles, and the unrelated scope-creep rename uros-b flagged is reverted on this snapshot. One blocking formatting issue.

Correctness (1)

error-conditions.json:8566: the new entry's closing brace shares a line with "UNSUPPORTED_INSERT" — breaks SparkThrowableSuite's formatting check — see inline

HyukjinKwon · 2026-06-25T23:02:33Z

-  "UNSUPPORTED_INSERT" : {
+  "UNSUPPORTED_HIVE_TYPE" : {
+    "message" : [
+      "Cannot read the Hive type <fieldType> of the column <fieldName> because Spark SQL does not support this data type."


A few lines below this, the new entry closes as:

}, "UNSUPPORTED_INSERT" : {

i.e. the UNSUPPORTED_HIVE_TYPE closing brace and the next error key are on the same line (missing newline). SparkThrowableSuite's "Error conditions are correctly formatted" test re-serializes the error map with a pretty-printer (one key per line) and asserts the file matches exactly, so this will fail CI:

}, "UNSUPPORTED_INSERT" : {

Alphabetical placement (HIVE_TYPE before INSERT) is correct — only the line break is missing. Re-running SPARK_GENERATE_GOLDEN_FILES=1 … SparkThrowableSuite -- -t "Error conditions are correctly formatted" regenerates it correctly. (Otherwise the detection logic and tests look good, and the unrelated timestampNanos… rename is reverted here.)

- error-conditions.json: place the UNSUPPORTED_HIVE_TYPE closing brace and the following UNSUPPORTED_INSERT key on separate lines so the SparkThrowableSuite 'Error conditions are correctly formatted' check passes. - HiveClientImplSuite.scala: add the missing trailing newline that the Scala linter requires (File must end with newline character).

AgenticSpark · 2026-06-26T02:33:05Z

Pushed c52eb80 to fix the two CI failures:

SparkThrowableSuite "Error conditions are correctly formatted": the new UNSUPPORTED_HIVE_TYPE entry's closing brace shared a line with the following UNSUPPORTED_INSERT key. Split them onto separate lines so the re-serialized map matches the file.
Scala linter: added the missing trailing newline to HiveClientImplSuite.scala ("File must end with newline character").

The unrelated timestampNanos… rename flagged earlier isn't part of this branch (already reverted). The Docker integration test failure looks unrelated to this change (JDBC containers) — happy to re-trigger it if needed.

MaxGekk

0 blocking, 0 non-blocking, 0 nits. LGTM — clean, well-scoped error-message improvement: a dedicated UNSUPPORTED_HIVE_TYPE for Hive uniontype<...> columns, replacing the generic parser error.

Verification

Detection is precise — it's gated on the existing ParseException (so it fires only for types that already fail Spark's parser), and the uniontype< substring (with its < anchor) avoids false positives (a field named …uniontype is followed by : in the type string, not <) while still catching a nested struct<a:uniontype<…>>. Locale is imported; the new error builder mirrors cannotRecognizeHiveTypeError; and the UNSUPPORTED_HIVE_TYPE condition is well-formed and alphabetically ordered. The new HiveClientImplSuite uses real FieldSchema fixtures with checkError on the condition + both params, covering the direct and nested cases (SparkFunSuite is the right base — the static fromHiveColumn needs no Hive session).

On reusing an existing condition: the closest is UNSUPPORTED_DATATYPE, but it isn't a clean fit — it's for an unsupported Spark DataType (its call sites pass a Spark type name), whereas here the Hive string never parses into a Spark type, and it would drop the column name. The new condition instead mirrors the Hive-specific CANNOT_RECOGNIZE_HIVE_TYPE (same fieldType/fieldName shape) and is distinguished from it by sqlState — 0A000 (recognized-but-unsupported) vs 429BB (unrecognized) — which is the point of the change. The new condition is the right call.

dongjoon-hyun

Just a question, is @AgenticSpark a human account?

dongjoon-hyun · 2026-06-26T18:40:12Z

+      messageParameters = Map(
+        "fieldType" -> toSQLType(fieldType),
+        "fieldName" -> toSQLId(fieldName)))
+  }


nit. We need a new line after this.

dongjoon-hyun · 2026-06-26T18:42:57Z

+        // fail with a generic message. Detect it and report a clearer error (SPARK-21529).
+        if (hc.getType.toLowerCase(Locale.ROOT).contains("uniontype<")) {
+          throw QueryExecutionErrors.unsupportedHiveTypeError(hc.getType, hc.getName)
+        }


Could you clarify what was the previous error message, @AgenticSpark ?

Today the failure message comes from the parser path and does not clearly identify that the Hive union type is unsupported.

uros-b reviewed Jun 25, 2026

View reviewed changes

AgenticSpark force-pushed the agenticspark/SPARK-21529-uniontype-error branch from 48f49ec to ee5fb78 Compare June 25, 2026 14:19

AgenticSpark added 2 commits June 25, 2026 07:22

[SPARK-21529][SQL] Fix Hive union type fallback formatting

aedba32

Keep the generic Hive type parse fallback on its own line after the uniontype check.

HyukjinKwon reviewed Jun 25, 2026

View reviewed changes

AgenticSpark requested review from HyukjinKwon and uros-b June 26, 2026 04:40

MaxGekk approved these changes Jun 26, 2026

View reviewed changes

dongjoon-hyun reviewed Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-21529][SQL] Improve the error message for unsupported Hive union type#56775

[SPARK-21529][SQL] Improve the error message for unsupported Hive union type#56775
AgenticSpark wants to merge 4 commits into
apache:masterfrom
AgenticSpark:agenticspark/SPARK-21529-uniontype-error

AgenticSpark commented Jun 25, 2026

Uh oh!

uros-b Jun 25, 2026

Uh oh!

AgenticSpark commented Jun 25, 2026

Uh oh!

HyukjinKwon left a comment

Uh oh!

HyukjinKwon Jun 25, 2026

Uh oh!

AgenticSpark commented Jun 26, 2026

Uh oh!

MaxGekk left a comment

Uh oh!

dongjoon-hyun left a comment

Uh oh!

dongjoon-hyun Jun 26, 2026

Uh oh!

dongjoon-hyun Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

AgenticSpark commented Jun 25, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

uros-b Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

AgenticSpark commented Jun 25, 2026

Uh oh!

HyukjinKwon left a comment

Choose a reason for hiding this comment

Correctness (1)

Uh oh!

HyukjinKwon Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

AgenticSpark commented Jun 26, 2026

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Verification

Uh oh!

dongjoon-hyun left a comment

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants