fix(spark-expr): preserve scalar tag in WideDecimalBinaryExpr when both inputs are scalars by tomz · Pull Request #4187 · apache/datafusion-comet

tomz · 2026-05-02T20:25:23Z

Draft Pull Request — apache/datafusion-comet

Title

fix(spark-expr): preserve scalar tag in WideDecimalBinaryExpr when both inputs are scalars

Closes

N/A

edit: removed original text linking to #1615 (TPC-DS q23 BHJ scalar-subquery crash)

Summary

WideDecimalBinaryExpr::evaluate currently always returns ColumnarValue::Array, even when both inputs are ColumnarValue::Scalar. Returning a length-1 Array instead of a Scalar drops the scalar tag, so downstream expressions (binary ops, comparisons) take the array path through arrow-rs kernels.

When the resulting length-1 array later meets a full-batch column in a >/</= comparison, arrow-ord's compare_op rejects it with:

Cannot compare arrays of different lengths, got 8192 vs 1

This crashes any plan in which a wide-decimal arithmetic result of two scalars feeds into a row-level comparison. The clearest reproducer is TPC-DS q23, whose plan contains the filter

cast((0.95 * Subquery#268) as decimal(38,2)) > ssales

0.95 * scalar_subquery is a Scalar × Scalar wide-decimal multiply; the broken-out length-1 array is then compared against the 8192-row ssales column from the build side of the BroadcastHashJoin and the executor crashes.

Fix

Track scalar-ness of both inputs at the top of evaluate, and at the end, if both inputs were scalars, unwrap the length-1 result back into a ColumnarValue::Scalar via ScalarValue::try_from_array(&result, 0).

+        // Track scalar-ness so we can return a Scalar when both inputs are scalars.
+        // Without this, a (Scalar op Scalar) result is returned as a length-1 Array,
+        // and downstream comparisons against full batches incorrectly see two Array
+        // operands with mismatched lengths instead of (Array, Scalar).
+        let both_scalar = matches!(
+            (&left_val, &right_val),
+            (ColumnarValue::Scalar(_), ColumnarValue::Scalar(_))
+        );
         let (left_arr, right_arr): (ArrayRef, ArrayRef) = match (&left_val, &right_val) {
             ...
         };
         ...
         let result = result.with_data_type(DataType::Decimal128(p_out, s_out));
-        Ok(ColumnarValue::Array(Arc::new(result)))
+        if both_scalar {
+            let scalar = datafusion::common::ScalarValue::try_from_array(&result, 0)?;
+            Ok(ColumnarValue::Scalar(scalar))
+        } else {
+            Ok(ColumnarValue::Array(Arc::new(result)))
+        }

Three lines of real logic; the rest is a comment block explaining the bug.

Tests

Added a unit test test_scalar_scalar_returns_scalar to wide_decimal_binary_expr.rs covering:

Scalar × Scalar decimal multiply returns ColumnarValue::Scalar, not Array.
Existing Array op X and X op Array paths still return Array (regression guard).

End-to-end: TPC-DS q23 now passes at SF1 and SF10 (previously crashed in both).

Benchmark Results (TPC-DS, 10 cores, Spark 4.2.0-SNAPSHOT)

	Before fix	After fix
SF1 pass rate	98 / 99 (q23 crash)	99 / 99
SF1 total time	279.9s	259.9s (1.08× faster)
SF1 q14 time	32.8s	10.1s (3.2× faster)
SF10 pass rate	98 / 99 (q23 crash)	99 / 99 (3 consecutive runs)
SF10 q23 time	crash	16.3 / 17.0 / 17.7s (consistent)

The q14 speedup is a bonus — q14 has the same Scalar × Scalar decimal pattern in its filter, and previously paid a heavy materialization cost from the bogus length-1 array path.

Why this didn't trip earlier tests

WideDecimalBinaryExpr is only chosen when output precision ≥ 38. Most existing tests evaluate it against RecordBatch inputs (i.e. Array operands via Column), so the Scalar × Scalar branch was never exercised. The bug only surfaces when the planner produces both-scalar inputs, which happens with literal × scalar_subquery — exactly q23's pattern.

Risk

Minimal:

No change to numeric semantics; the result bytes are unchanged.
Behavioral change is strictly in the wrapping of the result (Scalar vs Array).
All callers downstream already handle ColumnarValue::Scalar correctly — that's the normal contract.

Files changed

native/spark-expr/src/math_funcs/wide_decimal_binary_expr.rs (+~16 lines incl. comments + 1 unit test)

Suggested commit message

fix(spark-expr): preserve scalar tag in WideDecimalBinaryExpr when both inputs are scalars

WideDecimalBinaryExpr::evaluate always returned ColumnarValue::Array,
even when both inputs were Scalar. The resulting length-1 array lost
its scalar tag, so a downstream comparison against a full batch hit
arrow-ord's "Cannot compare arrays of different lengths, got 8192 vs 1".

This is the root cause of the TPC-DS q23 BroadcastHashJoin crash
(issue #1615): the filter `0.95 * scalar_subquery > ssales` produces
a Scalar × Scalar decimal multiply whose length-1 result was then
compared against the 8192-row ssales column.

Detect the both-scalar case and unwrap the length-1 result back into
a ColumnarValue::Scalar so downstream kernels take the scalar fast-path.

Adds a unit test for the both-scalar path.

Closes #1615

…th inputs are scalars WideDecimalBinaryExpr::evaluate always returned ColumnarValue::Array, even when both inputs were Scalar. The resulting length-1 array lost its scalar tag, so a downstream comparison against a full batch hit arrow-ord's "Cannot compare arrays of different lengths, got 8192 vs 1". This is the root cause of the TPC-DS q23 BroadcastHashJoin crash (issue apache#1615): the filter '0.95 * scalar_subquery > ssales' produces a Scalar x Scalar decimal multiply whose length-1 result was then compared against the 8192-row ssales column. Detect the both-scalar case and unwrap the length-1 result back into a ColumnarValue::Scalar so downstream kernels take the scalar fast-path. Adds two unit tests: - test_scalar_scalar_returns_scalar: regression for the crash - test_array_input_returns_array: guards the array path Closes apache#1615

coderfender · 2026-05-03T15:52:51Z

I am not sure if the issue still exists after arrow version bump . @tomz was the benchmark query failing before this fix was implemented?

tomz · 2026-05-04T18:15:23Z

@coderfender yes, still reproducing on current main (425f9c98, post arrow bump).

To verify, I checked out unmodified main and added only the test_scalar_scalar_returns_scalar test from this PR (no fix). It fails:

test test_scalar_scalar_returns_scalar_repro_1615 ... FAILED
panicked at wide_decimal_binary_expr.rs:
BUG REPRO #1615: Scalar x Scalar returned Array, not Scalar

WideDecimalBinaryExpr::evaluate on main still returns ColumnarValue::Array (length-1) when both inputs are Scalar — exactly the condition that triggers the q23 BHJ filter crash, since downstream comparisons against full batches then see two Array operands with mismatched lengths and arrow-ord rejects them with "Cannot compare arrays of different lengths, got N vs 1". The arrow version bump didn't change this code path.

With the fix in this PR applied, both new regression tests pass and q23 SF=1 BHJ no longer crashes.

coderfender · 2026-05-04T18:31:44Z

Thank you. I will try and replicate this on my machine later today / this week and update you

andygrove · 2026-05-04T20:51:09Z

Thanks @tomz. I ran the tests locally without the fix and reproduced the issue.

andygrove · 2026-05-04T20:51:54Z

+                panic!(
+                    "Scalar x Scalar must return ColumnarValue::Scalar, not Array.                      This is the q23 BHJ crash regression (issue #1615)."
+                );


As @coderfender mentioned, this is unrelated to #1615, which was for an Arrow C-Data offset-buffer bug. Also there is some weird spacing in this message.

Suggested change

panic!(

"Scalar x Scalar must return ColumnarValue::Scalar, not Array. This is the q23 BHJ crash regression (issue #1615)."

);

panic!(

"Scalar x Scalar must return ColumnarValue::Scalar, not Array"

);

andygrove reviewed May 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(spark-expr): preserve scalar tag in WideDecimalBinaryExpr when both inputs are scalars#4187

fix(spark-expr): preserve scalar tag in WideDecimalBinaryExpr when both inputs are scalars#4187
tomz wants to merge 1 commit intoapache:mainfrom
tomz:fix/wide-decimal-scalar-tag-1615

tomz commented May 2, 2026 •

edited by andygrove

Loading

Uh oh!

coderfender commented May 3, 2026

Uh oh!

tomz commented May 4, 2026

Uh oh!

coderfender commented May 4, 2026

Uh oh!

andygrove commented May 4, 2026

Uh oh!

andygrove May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

tomz commented May 2, 2026 • edited by andygrove Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Draft Pull Request — apache/datafusion-comet

Title

Closes

Summary

Fix

Tests

Benchmark Results (TPC-DS, 10 cores, Spark 4.2.0-SNAPSHOT)

Why this didn't trip earlier tests

Risk

Files changed

Suggested commit message

Uh oh!

coderfender commented May 3, 2026

Uh oh!

tomz commented May 4, 2026

Uh oh!

coderfender commented May 4, 2026

Uh oh!

andygrove commented May 4, 2026

Uh oh!

andygrove May 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tomz commented May 2, 2026 •

edited by andygrove

Loading