fix(spark): ANSI-aware custom nullability for make_interval#22057
Open
EshwarCVS wants to merge 3 commits intoapache:mainfrom
Open
fix(spark): ANSI-aware custom nullability for make_interval#22057EshwarCVS wants to merge 3 commits intoapache:mainfrom
make_interval#22057EshwarCVS wants to merge 3 commits intoapache:mainfrom
Conversation
Spark's make_interval nullability rule mirrors `failOnError`: - nullary call → not nullable (always returns zero interval) - ANSI mode on (failOnError=true) → nullable only when any input is nullable - ANSI mode off (default) → always nullable (overflow silently returns NULL) Previous PRs (apache#19248, apache#19606) could not implement this correctly because ReturnFieldArgs carries no ConfigOptions. The fix follows the SparkCast pattern: store `ansi_mode` as a struct field, populate it via `with_updated_config`, and use `self.ansi_mode` in `return_field_from_args`. In ANSI mode, arithmetic overflow in make_interval_kernel now returns an error instead of silently producing NULL, matching Spark's behavior. Closes apache#19155
fix(spark): implement ANSI-aware custom nullability for make_interval
Contributor
|
@EshwarCVS can you take a look at this #19155 (comment) |
…onfig access Resolves the limitation described in apache#19155 where `ReturnFieldArgs` (used by `return_field_from_args` at planning time) had no access to `ConfigOptions`, forcing UDFs to store config-derived state as struct fields via `with_updated_config`. Changes: - Add `config_options: &'a ConfigOptions` field to `ReturnFieldArgs` - Physical planning path (`ScalarFunctionExpr::try_new`) passes the real session config options, giving UDFs correct config at physical plan time - Logical planning path (`expr_schema.rs`) passes `ConfigOptions::default()` as before (UDFs that need config-correct logical plans still use `with_updated_config`) - FFI conversion stores a default `ConfigOptions` in `ForeignReturnFieldArgsOwned` since config cannot cross the FFI boundary - Update `SparkMakeInterval::return_field_from_args` to read ANSI mode directly from `args.config_options.execution.enable_ansi_mode` instead of `self.ansi_mode`, demonstrating the intended usage pattern - Update all ~40 call sites with `config_options: &ConfigOptions::default()` (test sites) or the actual session config (production sites) Closes apache#19155
Author
updated |
Contributor
|
I am not in favor of this PR. For the default/non-ANSI mode, the current behavior already matches Spark: The ANSI-mode part needs a broader design discussion because it changes core UDF return-field API and still does not prove the full SQL/planning path works correctly. Given that, I think we should close this PR for now instead of making this change. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
make_intervalneed to have custom nullability #19155Rationale for this change
Spark's
make_intervalnullability follows this rule (from Spark source):Previous attempts (#19248, #19606) tried to derive nullability from input
fields inside return_field_from_args, but ReturnFieldArgs carries no
ConfigOptions, so ANSI mode was inaccessible.
What changes are included in this PR?
Add ansi_mode: bool field to SparkMakeInterval, populated via a new
new_with_config(config) constructor.
Implement with_updated_config so the planner keeps the UDF in sync with
session config (same pattern as SparkCast).
Implement return_field_from_args with Spark's exact rule:
nullary call → nullable = false (always returns zero interval)
ANSI mode on → nullable = any input field is nullable
ANSI mode off (default) → nullable = true
In ANSI mode, make_interval_kernel now returns exec_err! on arithmetic
overflow instead of silently appending NULL.
Are these changes tested?
Yes. Six new unit tests cover all branches:
Are there any user-facing changes?
Non-ANSI mode (default): no behavior change.
ANSI mode: make_interval with non-nullable inputs now correctly
reports a non-nullable output field, and arithmetic overflow raises an
error instead of silently returning NULL.