WIP: Extension type casts using extension registry #21071

Draft
paleolimbot wants to merge 39 commits into apache:main from paleolimbot:extension-type-registry-cast

paleolimbot (Member) commented Mar 20, 2026

Which issue does this PR close?

This PR is a proof of concept, stacked on top of #20312, that demonstrates how casting to and from extension types might be supported with the registry design from that PR. All details are up for grabs. Most of the work here is piping the registry through so that a cast to or from an extension type can be resolved when a physical expression is created from a logical one (the work to pipe SQL and logical plan casts to extension types was already done before this PR).
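A minimal sketch of the kind of lookup this piping enables, assuming (following Arrow's convention) that a field is tagged with its extension type through the `ARROW:extension:name` metadata key. `TaggedField`, `ExtensionTypeInfo`, `SimpleExtensionRegistry`, and `lookup` are hypothetical std-only stand-ins, not DataFusion's API; in the PR, `create_extension_type_for_field` works on arrow `Field`s and returns a `Result` wrapping an `Option`, and the registry itself is optional on `ExecutionProps`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Arrow's metadata key naming a field's extension type.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical stand-in for an Arrow field: a name plus string metadata.
#[derive(Debug, Clone, Default)]
pub struct TaggedField {
    pub name: String,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical handle for a registered extension type.
#[derive(Debug)]
pub struct ExtensionTypeInfo {
    pub name: String,
}

/// Hypothetical in-memory registry keyed by extension name.
#[derive(Debug, Default)]
pub struct SimpleExtensionRegistry {
    types: HashMap<String, Arc<ExtensionTypeInfo>>,
}

impl SimpleExtensionRegistry {
    pub fn register(&mut self, name: &str) {
        let info = Arc::new(ExtensionTypeInfo { name: name.to_string() });
        self.types.insert(name.to_string(), info);
    }

    /// None for untagged fields or unregistered names, so the caller can
    /// fall back to an ordinary cast.
    pub fn extension_type_for_field(&self, field: &TaggedField) -> Option<Arc<ExtensionTypeInfo>> {
        let name = field.metadata.get(EXTENSION_NAME_KEY)?;
        self.types.get(name).cloned()
    }
}

/// No registry at all means no extension-aware casts.
pub fn lookup(
    registry: Option<&SimpleExtensionRegistry>,
    field: &TaggedField,
) -> Option<Arc<ExtensionTypeInfo>> {
    registry.and_then(|r| r.extension_type_for_field(field))
}
```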

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

github-actions Bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate common Related to common crate datasource Changes to the datasource crate ffi Changes to the ffi crate labels Mar 20, 2026
Comment on lines +94 to +115
    let state = SessionStateBuilder::default()
        .with_canonical_extension_types()?
        .with_type_planner(Arc::new(CustomTypePlanner {}))
        .build();
    let ctx = SessionContext::new_with_state(state);

    ctx.register_batch("test", batch)?;

    let df = ctx.sql("SELECT my_uuids::VARCHAR FROM test").await?;
    let batches = df.collect().await?;

    assert_batches_eq!(
        vec![
            "+--------------------------------------+",
            "| test.my_uuids                        |",
            "+--------------------------------------+",
            "| 00000000-0000-0000-0000-000000000000 |",
            "| 00010203-0405-0607-0809-000102030506 |",
            "+--------------------------------------+",
        ],
        &batches
    );
Member Author

Casting from a UUID to something else works!
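For context, a sketch of what the UUID-to-string direction has to do, assuming values use the canonical `arrow.uuid` storage of 16 bytes per value (`FixedSizeBinary(16)`): format each value in the hyphenated 8-4-4-4-12 layout and keep nulls as nulls. `uuid_bytes_to_string` and `cast_uuid_column_to_utf8` are hypothetical helpers working on plain Rust arrays rather than Arrow arrays.

```rust
/// Formats 16 UUID bytes in the canonical hyphenated 8-4-4-4-12 form.
pub fn uuid_bytes_to_string(bytes: &[u8; 16]) -> String {
    let mut out = String::with_capacity(36);
    for (i, b) in bytes.iter().enumerate() {
        // Hyphens separate the groups before bytes 4, 6, 8 and 10.
        if matches!(i, 4 | 6 | 8 | 10) {
            out.push('-');
        }
        out.push_str(&format!("{:02x}", b));
    }
    out
}

/// Casts a column of optional UUIDs to optional strings, preserving nulls.
pub fn cast_uuid_column_to_utf8(values: &[Option<[u8; 16]>]) -> Vec<Option<String>> {
    values
        .iter()
        .map(|v| v.as_ref().map(uuid_bytes_to_string))
        .collect()
}
```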

Comment on lines +121 to +141
async fn create_cast_char_to_uuid() -> Result<()> {
    let state = SessionStateBuilder::default()
        .with_canonical_extension_types()?
        .with_type_planner(Arc::new(CustomTypePlanner {}))
        .build();
    let ctx = SessionContext::new_with_state(state);

    let df = ctx
        .sql("SELECT '00010203-0405-0607-0809-000102030506'::UUID AS uuid")
        .await?;
    let batches = df.collect().await?;
    assert_batches_eq!(
        vec![
            "+----------------------------------+",
            "| uuid                             |",
            "+----------------------------------+",
            "| 00010203040506070809000102030506 |",
            "+----------------------------------+",
        ],
        &batches
    );
Member Author

We can also go the other direction!
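The reverse direction, sketched under the same 16-byte storage assumption: parse the 36-character hyphenated string back into 16 bytes, rejecting anything not in the 8-4-4-4-12 layout. The expected output above displays the stored value as 32 lowercase hex digits, which the hypothetical `to_hex` helper reproduces; `parse_uuid` and `hex_val` are likewise illustrative, std-only names.

```rust
/// Value of one ASCII hex digit, or None if `c` is not a hex digit.
fn hex_val(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Parses a hyphenated UUID string into its 16 bytes.
/// Returns None unless the input is exactly the 8-4-4-4-12 layout.
pub fn parse_uuid(s: &str) -> Option<[u8; 16]> {
    let bytes = s.as_bytes();
    if bytes.len() != 36 {
        return None;
    }
    let mut nibbles = Vec::with_capacity(32);
    for (i, &c) in bytes.iter().enumerate() {
        if matches!(i, 8 | 13 | 18 | 23) {
            // Group separators must be hyphens.
            if c != b'-' {
                return None;
            }
        } else {
            nibbles.push(hex_val(c)?);
        }
    }
    // 32 nibbles pair up into 16 bytes, high nibble first.
    let mut out = [0u8; 16];
    for (i, pair) in nibbles.chunks(2).enumerate() {
        out[i] = (pair[0] << 4) | pair[1];
    }
    Some(out)
}

/// Renders bytes as lowercase hex with no separators.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```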

Comment on lines +74 to +94
    fn cast_from(&self) -> Result<Arc<dyn CastExtension>> {
        Ok(Arc::new(DefaultExtensionCast {}))
    }

    fn cast_to(&self) -> Result<Arc<dyn CastExtension>> {
        Ok(Arc::new(DefaultExtensionCast {}))
    }
}

pub trait CastExtension: Debug + Send + Sync {
    fn can_cast(&self, from: &Field, to: &Field, options: &CastOptions) -> Result<bool>;

    // None for fallback
    fn cast(
        &self,
        value: ArrayRef,
        from: &Field,
        to: &Field,
        options: &CastOptions,
    ) -> Result<ArrayRef>;
}
Member Author

This is the interface an extension type can implement to define how it casts to and from other types.
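A cut-down, std-only analogue of this interface, to illustrate how an implementation might answer `can_cast` for UUIDs. `SimpleType`, `SimpleField`, `SimpleCastExtension`, and `UuidCast` are hypothetical; the real trait takes arrow `Field`s and `CastOptions`, returns `Result`, and also performs the cast on an `ArrayRef`. The rule chosen here (UUID to and from plain `Utf8` only) is an assumption for illustration.

```rust
use std::collections::HashMap;
use std::fmt::Debug;

/// Hypothetical stand-in for a handful of Arrow data types.
#[derive(Debug, Clone, PartialEq)]
pub enum SimpleType {
    Utf8,
    FixedSizeBinary(i32),
    Int64,
}

/// Hypothetical stand-in for an Arrow field: storage type plus metadata.
#[derive(Debug, Clone)]
pub struct SimpleField {
    pub data_type: SimpleType,
    pub metadata: HashMap<String, String>,
}

impl SimpleField {
    /// Extension name from Arrow's `ARROW:extension:name` metadata key, if any.
    pub fn extension_name(&self) -> Option<&str> {
        self.metadata.get("ARROW:extension:name").map(String::as_str)
    }
}

/// Cut-down analogue of `CastExtension`: only the capability check.
pub trait SimpleCastExtension: Debug + Send + Sync {
    fn can_cast(&self, from: &SimpleField, to: &SimpleField) -> bool;
}

/// Hypothetical UUID rules: UUID <-> plain Utf8 in either direction.
#[derive(Debug)]
pub struct UuidCast;

impl SimpleCastExtension for UuidCast {
    fn can_cast(&self, from: &SimpleField, to: &SimpleField) -> bool {
        let is_uuid = |f: &SimpleField| f.extension_name() == Some("arrow.uuid");
        let is_plain_utf8 =
            |f: &SimpleField| f.data_type == SimpleType::Utf8 && f.extension_name().is_none();
        (is_uuid(from) && is_plain_utf8(to)) || (is_plain_utf8(from) && is_uuid(to))
    }
}
```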

Comment thread datafusion/physical-expr/src/planner.rs Outdated
Comment on lines +301 to +305
            if let Some(registry) = &execution_props.extension_types
                && let Some(extension_type) =
                    registry.create_extension_type_for_field(&field)?
            {
                let cast_extension = extension_type.cast_from()?;
Member Author

This is where a cast to an extension type (from something else) is planned into a physical expression.
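A sketch of the planning decision for this branch, assuming it falls back to an ordinary cast when there is no registry, when the target field is not a registered extension type, or when the extension's rules reject the source type (the `if cast_extension.can_cast(...)? { return ... }` pattern elsewhere in the planner suggests this fall-through). `PlannedCast`, `CastFromRules`, and `plan_cast` are hypothetical names, and types are plain strings for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of planning a cast whose target may be an extension type.
#[derive(Debug, PartialEq)]
pub enum PlannedCast {
    /// The target extension type's rules accepted the source type.
    ExtensionTarget { extension: String },
    /// Fall back to an ordinary cast expression.
    Ordinary,
}

/// Hypothetical registry view: extension name -> source types it can be cast from.
pub type CastFromRules = HashMap<String, Vec<String>>;

pub fn plan_cast(
    registry: Option<&CastFromRules>, // None mirrors an unset registry
    source_type: &str,
    target_extension: Option<&str>, // extension name on the target field, if any
) -> PlannedCast {
    if let (Some(rules), Some(ext)) = (registry, target_extension) {
        let accepts = rules
            .get(ext)
            .is_some_and(|sources| sources.iter().any(|s| s == source_type));
        if accepts {
            return PlannedCast::ExtensionTarget { extension: ext.to_string() };
        }
    }
    // Any missing piece, or a rejection, means an ordinary cast.
    PlannedCast::Ordinary
}
```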

Comment thread datafusion/physical-expr/src/planner.rs Outdated
Comment on lines +332 to +334
                let cast_extension = extension_type.cast_to()?;
                if cast_extension.can_cast(&src_field, &field, &DEFAULT_CAST_OPTIONS)? {
                    return expressions::cast_with_extension(
Member Author

This is where a cast from an extension type to something else is resolved to a physical expression.

(I didn't handle the case where one extension type is cast to another extension type. That needs to be handled before this could merge, because the interface has to let either the source OR the target side define the cast without erroring when the other side doesn't support it.)
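One way the either-side requirement could be met, sketched with hypothetical std-only types: ask the target's rules first, then the source's, and treat a "no" from one side as a reason to try the other rather than as an error. The target-first order and the final fallback to an ordinary cast are assumptions, not something this PR implements. `CastRules`, `AllowList`, `Resolved`, and `resolve` are illustrative names, and `ext.a`/`ext.b` are made-up extension names.

```rust
use std::fmt::Debug;

/// Hypothetical per-type cast rules, answering for (from, to) pairs by extension name.
pub trait CastRules: Debug {
    fn can_cast(&self, from: &str, to: &str) -> bool;
}

/// Hypothetical rules backed by an explicit list of supported (from, to) pairs.
#[derive(Debug)]
pub struct AllowList(pub Vec<(&'static str, &'static str)>);

impl CastRules for AllowList {
    fn can_cast(&self, from: &str, to: &str) -> bool {
        self.0.iter().any(|&(f, t)| f == from && t == to)
    }
}

/// Which side, if any, takes responsibility for the cast.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    ByTarget,
    BySource,
    /// Neither side claimed it; plan an ordinary cast instead of erroring.
    Ordinary,
}

/// Ask the target first, then the source. A side that is absent or that
/// declines is skipped rather than treated as an error.
pub fn resolve(
    source: Option<&dyn CastRules>,
    target: Option<&dyn CastRules>,
    from: &str,
    to: &str,
) -> Resolved {
    if target.is_some_and(|t| t.can_cast(from, to)) {
        return Resolved::ByTarget;
    }
    if source.is_some_and(|s| s.can_cast(from, to)) {
        return Resolved::BySource;
    }
    Resolved::Ordinary
}
```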

Comment on lines +275 to +286
        if let Some(cast_extension) = &self.cast_extension {
            let from_field = self.expr.return_field(&batch.schema())?;
            let to_field = self.return_field(&batch.schema())?;
            match value {
                ColumnarValue::Array(array) => {
                    Ok(ColumnarValue::Array(cast_extension.cast(
                        array,
                        &from_field,
                        &to_field,
                        &self.cast_options,
                    )?))
                }
Member Author

This is where the cast is executed. I chose to attach the CastExtension to the existing CastExpr, but it could also be its own PhysicalExpr (possibly safer, in case I haven't considered everything that can happen to a CastExpr during physical optimizer passes).
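A minimal sketch of the dispatch described here, assuming the cast expression carries an optional hook that, when present, replaces the ordinary cast. `ColumnCast`, `HexCast`, and `SimpleCastExpr` are hypothetical and operate on slices of `Option<u32>` instead of `ColumnarValue`; the hex formatting merely stands in for whatever an extension type would do.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical cast hook, cut down to a column of optional u32 values.
pub trait ColumnCast: Debug + Send + Sync {
    fn cast(&self, values: &[Option<u32>]) -> Result<Vec<Option<String>>, String>;
}

/// Hypothetical extension-defined cast: renders values as 0x-prefixed hex.
#[derive(Debug)]
pub struct HexCast;

impl ColumnCast for HexCast {
    fn cast(&self, values: &[Option<u32>]) -> Result<Vec<Option<String>>, String> {
        Ok(values.iter().map(|v| v.map(|x| format!("{:#x}", x))).collect())
    }
}

/// Hypothetical cast expression with an optional extension hook.
#[derive(Debug)]
pub struct SimpleCastExpr {
    pub cast_extension: Option<Arc<dyn ColumnCast>>,
}

impl SimpleCastExpr {
    pub fn evaluate(&self, values: &[Option<u32>]) -> Result<Vec<Option<String>>, String> {
        match &self.cast_extension {
            // A hook attached at planning time takes over the whole cast.
            Some(hook) => hook.cast(values),
            // Otherwise use the ordinary cast (decimal formatting here).
            None => Ok(values.iter().map(|v| v.map(|x| x.to_string())).collect()),
        }
    }
}
```

Keeping the hook as an optional field means a plain cast costs one extra branch; the concern raised in the comment is that optimizer passes which inspect or rewrite a `CastExpr` may not account for that field, which a separate `PhysicalExpr` would make explicit.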

paleolimbot force-pushed the extension-type-registry-cast branch from 6326d62 to 4f82129 on April 29, 2026 15:47
paleolimbot and others added 3 commits April 29, 2026 14:32
Co-authored-by: Copilot <copilot@github.com>
github-actions Bot commented May 7, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion v53.1.0 (current)
       Built [  81.187s] (current)
     Parsing datafusion v53.1.0 (current)
      Parsed [   0.035s] (current)
    Building datafusion v53.1.0 (baseline)
       Built [  79.861s] (baseline)
     Parsing datafusion v53.1.0 (baseline)
      Parsed [   0.034s] (baseline)
    Checking datafusion v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.645s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 163.751s] datafusion
    Building datafusion-common v53.1.0 (current)
       Built [  32.432s] (current)
     Parsing datafusion-common v53.1.0 (current)
      Parsed [   0.057s] (current)
    Building datafusion-common v53.1.0 (baseline)
       Built [  32.452s] (baseline)
     Parsing datafusion-common v53.1.0 (baseline)
      Parsed [   0.057s] (baseline)
    Checking datafusion-common v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.714s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure function_parameter_count_changed: pub fn parameter count changed ---

Description:
A publicly-visible function now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/function_parameter_count_changed.ron

Failed in:
  datafusion_common::nested_struct::validate_data_type_compatibility now takes 4 parameters instead of 3, in /home/runner/work/datafusion/datafusion/datafusion/common/src/nested_struct.rs:517

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  67.094s] datafusion-common
    Building datafusion-expr v53.1.0 (current)
       Built [  25.786s] (current)
     Parsing datafusion-expr v53.1.0 (current)
      Parsed [   0.071s] (current)
    Building datafusion-expr v53.1.0 (baseline)
       Built [  25.587s] (baseline)
     Parsing datafusion-expr v53.1.0 (baseline)
      Parsed [   0.071s] (baseline)
    Checking datafusion-expr v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   1.102s] 222 checks: 220 pass, 2 fail, 0 warn, 30 skip

--- failure auto_trait_impl_removed: auto trait no longer implemented ---

Description:
A public type has stopped implementing one or more auto traits. This can break downstream code that depends on the traits being implemented.
        ref: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/auto_trait_impl_removed.ron

Failed in:
  type MemoryExtensionTypeRegistry is no longer UnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/registry.rs:440
  type MemoryExtensionTypeRegistry is no longer RefUnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/registry.rs:440

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field ExecutionProps.extension_types in /home/runner/work/datafusion/datafusion/datafusion/expr/src/execution_props.rs:78

     Summary semver requires new major version: 2 major and 0 minor checks failed
    Finished [  53.946s] datafusion-expr
    Building datafusion-optimizer v53.1.0 (current)
       Built [  25.831s] (current)
     Parsing datafusion-optimizer v53.1.0 (current)
      Parsed [   0.026s] (current)
    Building datafusion-optimizer v53.1.0 (baseline)
       Built [  25.719s] (baseline)
     Parsing datafusion-optimizer v53.1.0 (baseline)
      Parsed [   0.027s] (baseline)
    Checking datafusion-optimizer v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.176s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  52.866s] datafusion-optimizer
    Building datafusion-physical-expr v53.1.0 (current)
       Built [  24.294s] (current)
     Parsing datafusion-physical-expr v53.1.0 (current)
      Parsed [   0.040s] (current)
    Building datafusion-physical-expr v53.1.0 (baseline)
       Built [  24.388s] (baseline)
     Parsing datafusion-physical-expr v53.1.0 (baseline)
      Parsed [   0.043s] (baseline)
    Checking datafusion-physical-expr v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.331s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure method_parameter_count_changed: pub method parameter count changed ---

Description:
A publicly-visible method now takes a different number of parameters, not counting the receiver (self) parameter.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/method_parameter_count_changed.ron

Failed in:
  datafusion_physical_expr::expressions::CastExpr::new_with_target_field takes 3 parameters in /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/92527780eba26a2ebae9b4c616bdae0c30875248/datafusion/physical-expr/src/expressions/cast.rs:121, but now takes 4 parameters in /home/runner/work/datafusion/datafusion/datafusion/physical-expr/src/expressions/cast.rs:131

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  50.239s] datafusion-physical-expr
    Building datafusion-physical-expr-adapter v53.1.0 (current)
       Built [  30.083s] (current)
     Parsing datafusion-physical-expr-adapter v53.1.0 (current)
      Parsed [   0.009s] (current)
    Building datafusion-physical-expr-adapter v53.1.0 (baseline)
       Built [  29.987s] (baseline)
     Parsing datafusion-physical-expr-adapter v53.1.0 (baseline)
      Parsed [   0.010s] (baseline)
    Checking datafusion-physical-expr-adapter v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.076s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  61.389s] datafusion-physical-expr-adapter
    Building datafusion-physical-plan v53.1.0 (current)
       Built [  31.938s] (current)
     Parsing datafusion-physical-plan v53.1.0 (current)
      Parsed [   0.118s] (current)
    Building datafusion-physical-plan v53.1.0 (baseline)
       Built [  35.208s] (baseline)
     Parsing datafusion-physical-plan v53.1.0 (baseline)
      Parsed [   0.131s] (baseline)
    Checking datafusion-physical-plan v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.635s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  69.443s] datafusion-physical-plan
    Building datafusion-pruning v53.1.0 (current)
       Built [  37.362s] (current)
     Parsing datafusion-pruning v53.1.0 (current)
      Parsed [   0.011s] (current)
    Building datafusion-pruning v53.1.0 (baseline)
       Built [  36.037s] (baseline)
     Parsing datafusion-pruning v53.1.0 (baseline)
      Parsed [   0.012s] (baseline)
    Checking datafusion-pruning v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.075s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  75.257s] datafusion-pruning
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 134.504s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.021s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 140.994s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.026s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.091s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 279.225s] datafusion-sqllogictest

github-actions Bot added the auto detected api change Auto detected API change label May 7, 2026
github-actions Bot added the physical-plan Changes to the physical-plan crate label May 7, 2026
github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label May 8, 2026
github-actions Bot removed datasource Changes to the datasource crate ffi Changes to the ffi crate labels May 8, 2026
Labels

auto detected api change Auto detected API change common Related to common crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

2 participants