Skip to content

NOT IN + dynamic filters + build-side NULL returns incorrect results #23126

Description

@neilconway

Describe the bug

In a predicate like x NOT IN (subquery), x rows that are NULL should only be emitted if subquery is empty.

x NOT IN (subquery) plans to a null-aware LeftAnti join (outer = build, subquery = probe). enable_join_dynamic_filter_pushdown (default on) pushes a bounds + membership filter, built from the outer key, onto the probe scan. This filter might eliminate all values from the probe-side input; this is interpreted by the LeftAnti join as indicating that the probe is actually empty, which means that build-side NULLs will incorrectly be emitted.

The result is scan-dependent: a VALUES scan ignores the pushed filter and is correct, while a parquet scan applies it and is wrong. The probe can be emptied by either row-group/page pruning or by row-level filtering.

To Reproduce

create table ao(id int)      as values (5), (null);  -- outer: NULL + a non-NULL value
create table i_disj(eid int) as values (2), (3);     -- probe: non-empty, no match, no NULL

-- (A) VALUES scan -> 5  (correct)
select id from ao where id not in (select eid from i_disj) order by id;

copy ao     to '/tmp/ao.parquet'     stored as parquet;
copy i_disj to '/tmp/i_disj.parquet' stored as parquet;
create external table ao_p(id int)      stored as parquet location '/tmp/ao.parquet';
create external table i_disj_p(eid int) stored as parquet location '/tmp/i_disj.parquet';

-- (B) parquet scan, same query -> 5, NULL  (wrong)
select id from ao_p where id not in (select eid from i_disj_p) order by id;

Expected behavior

No response

Additional context

#23103 is similar but orthogonal: that issue talks about incorrectly handling of probe-side NULLs, here the problem relates to incorrect handling of build-side NULLs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions