Skip to content

Null-aware anti join (NOT IN) returns rows when join dynamic filter pushdown drops the probe NULL #23103

Description

@mdashti

What happens?

A null-aware anti join (NOT IN) returns rows when its inner side has a NULL, instead of zero. With enable_join_dynamic_filter_pushdown on, HashJoinExec pushes a build-side filter (key IN build_keys) to the probe scan. That filter drops the probe's NULL row before the join's null-aware check runs, so NOT IN three-valued logic never collapses the result to empty.

It only shows when the probe scan applies the filter at row level, e.g. parquet with pushdown_filters = true. An in-memory scan that ignores the pushed filter still returns the correct empty result.

Reproduced on 54.0.0 and current main.

To Reproduce

set datafusion.optimizer.enable_join_dynamic_filter_pushdown = true;
set datafusion.execution.parquet.pushdown_filters = true;

create table outer_t(id int) as values (1), (2), (3);
create table inner_t(eid int) as values (2), (null);

copy outer_t to '/tmp/outer.parquet' stored as parquet;
copy inner_t to '/tmp/inner.parquet' stored as parquet;

create external table outer_p(id int) stored as parquet location '/tmp/outer.parquet';
create external table inner_p(eid int) stored as parquet location '/tmp/inner.parquet';

select id from outer_p where id not in (select eid from inner_p) order by id;

Expected zero rows: a NULL in the inner set makes every NOT IN comparison unknown. Instead it returns:

+----+
| id |
+----+
| 1  |
| 3  |
+----+

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions