What happens?
A null-aware anti join (NOT IN) returns rows when its inner side has a NULL, instead of zero. With enable_join_dynamic_filter_pushdown on, HashJoinExec pushes a build-side filter (key IN build_keys) to the probe scan. That filter drops the probe's NULL row before the join's null-aware check runs, so NOT IN three-valued logic never collapses the result to empty.
It only shows when the probe scan applies the filter at row level, e.g. parquet with pushdown_filters = true. An in-memory scan that ignores the pushed filter still returns the correct empty result.
Reproduced on 54.0.0 and current main.
To Reproduce
set datafusion.optimizer.enable_join_dynamic_filter_pushdown = true;
set datafusion.execution.parquet.pushdown_filters = true;
create table outer_t(id int) as values (1), (2), (3);
create table inner_t(eid int) as values (2), (null);
copy outer_t to '/tmp/outer.parquet' stored as parquet;
copy inner_t to '/tmp/inner.parquet' stored as parquet;
create external table outer_p(id int) stored as parquet location '/tmp/outer.parquet';
create external table inner_p(eid int) stored as parquet location '/tmp/inner.parquet';
select id from outer_p where id not in (select eid from inner_p) order by id;
Expected zero rows: a NULL in the inner set makes every NOT IN comparison unknown. Instead it returns:
+----+
| id |
+----+
| 1 |
| 3 |
+----+
What happens?
A null-aware anti join (
NOT IN) returns rows when its inner side has a NULL, instead of zero. Withenable_join_dynamic_filter_pushdownon,HashJoinExecpushes a build-side filter (key IN build_keys) to the probe scan. That filter drops the probe's NULL row before the join's null-aware check runs, soNOT INthree-valued logic never collapses the result to empty.It only shows when the probe scan applies the filter at row level, e.g. parquet with
pushdown_filters = true. An in-memory scan that ignores the pushed filter still returns the correct empty result.Reproduced on
54.0.0and currentmain.To Reproduce
Expected zero rows: a NULL in the inner set makes every
NOT INcomparison unknown. Instead it returns: