Describe the bug
In a predicate like x NOT IN (subquery), x rows that are NULL should only be emitted if subquery is empty.
x NOT IN (subquery) plans to a null-aware LeftAnti join (outer = build, subquery = probe). enable_join_dynamic_filter_pushdown (default on) pushes a bounds + membership filter, built from the outer key, onto the probe scan. This filter might eliminate all values from the probe-side input; this is interpreted by the LeftAnti join as indicating that the probe is actually empty, which means that build-side NULLs will incorrectly be emitted.
The result is scan-dependent: a VALUES scan ignores the pushed filter and is correct, while a parquet scan applies it and is wrong. The probe can be emptied by either row-group/page pruning or by row-level filtering.
To Reproduce
create table ao(id int) as values (5), (null); -- outer: NULL + a non-NULL value
create table i_disj(eid int) as values (2), (3); -- probe: non-empty, no match, no NULL
-- (A) VALUES scan -> 5 (correct)
select id from ao where id not in (select eid from i_disj) order by id;
copy ao to '/tmp/ao.parquet' stored as parquet;
copy i_disj to '/tmp/i_disj.parquet' stored as parquet;
create external table ao_p(id int) stored as parquet location '/tmp/ao.parquet';
create external table i_disj_p(eid int) stored as parquet location '/tmp/i_disj.parquet';
-- (B) parquet scan, same query -> 5, NULL (wrong)
select id from ao_p where id not in (select eid from i_disj_p) order by id;
Expected behavior
No response
Additional context
#23103 is similar but orthogonal: that issue talks about incorrectly handling of probe-side NULLs, here the problem relates to incorrect handling of build-side NULLs.
Describe the bug
In a predicate like
x NOT IN (subquery),xrows that areNULLshould only be emitted ifsubqueryis empty.x NOT IN (subquery)plans to a null-awareLeftAntijoin (outer = build, subquery = probe).enable_join_dynamic_filter_pushdown(default on) pushes a bounds + membership filter, built from the outer key, onto the probe scan. This filter might eliminate all values from the probe-side input; this is interpreted by theLeftAntijoin as indicating that the probe is actually empty, which means that build-side NULLs will incorrectly be emitted.The result is scan-dependent: a
VALUESscan ignores the pushed filter and is correct, while a parquet scan applies it and is wrong. The probe can be emptied by either row-group/page pruning or by row-level filtering.To Reproduce
Expected behavior
No response
Additional context
#23103 is similar but orthogonal: that issue talks about incorrectly handling of probe-side NULLs, here the problem relates to incorrect handling of build-side NULLs.