fix(postgres): correctly infer nullability for LEFT JOIN rewritten as RIGHT JOIN #4285

Open
luca992 wants to merge 3 commits into transact-rs:main from luca992:fix/postgres-left-join-rewrite-nullability

luca992 commented May 28, 2026

PostgreSQL's planner may execute A LEFT JOIN B as a hash join with Join Type: Right to put the smaller relation on the hash-build side.

This is documented behavior:

  • [Postgres docs, 14.3 Controlling the Planner with Explicit JOIN Clauses]

    "Most practical cases involving LEFT JOIN or RIGHT JOIN can be
    rearranged to some extent."
    https://www.postgresql.org/docs/current/explicit-joins.html

  • [Postgres Pro, "Queries in PostgreSQL: 6. Hashing"]

    "On the physical level, the planner determines which set is the
    inner one and which is the outer one not by their positions in
    the query, but by the relative join cost. ... So, the join type
    switches from left to right in the plan."
    https://postgrespro.com/blog/pgsql/5969673

After the swap, the SQL right operand (the nullable side under LEFT JOIN semantics) ends up as the plan's Outer child instead of the Inner child. The old visit_plan marked only Inner children nullable. On a Join Type: Right plan, that produced two distinct failures:

  • the SQL left operand (always preserved) got marked nullable, causing spurious Option<T> in macro output for NOT NULL columns
  • the SQL right operand was not marked nullable, masking real NULLs and panicking at decode time when no LEFT JOIN row matched

The old visit_plan also only inspected children of outer joins, never the join node's own Output. Computed expressions like b.x || 'y' or COALESCE(b.x, NULL) materialize at the join node itself; the nullable child only carries raw column refs, so nullability inference silently dropped these columns. Postgres further deparses these expressions with an extra outer paren pair at root (((expr))) compared to the join node ((expr)), so exact-string matching missed them.

The fix threads in_nullable through visit_plan and picks the NULL-fill side from each join's type:

  • Left → Inner child is nullable
  • Right → Outer child is nullable
  • Full → both children are nullable
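
The mapping above can be sketched as a small lookup. This is an illustration only, not the code in this PR: the JoinType enum, the NullFill struct, and null_fill_side are hypothetical names, and the enum covers only the join types discussed here rather than everything EXPLAIN can report.

```rust
/// Simplified set of join types, as they appear in the "Join Type" field
/// of an EXPLAIN plan node. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Which plan children may be NULL-extended by a join.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NullFill {
    pub outer_child: bool,
    pub inner_child: bool,
}

/// Pick the NULL-fill side from the join type reported by the plan,
/// not from the operand order in the SQL text.
pub fn null_fill_side(join_type: JoinType) -> NullFill {
    match join_type {
        // An inner join never NULL-extends either side.
        JoinType::Inner => NullFill { outer_child: false, inner_child: false },
        // Plan-level Left: the Inner child is the NULL-fill side.
        JoinType::Left => NullFill { outer_child: false, inner_child: true },
        // Plan-level Right (e.g. a swapped LEFT JOIN): the Outer child is.
        JoinType::Right => NullFill { outer_child: true, inner_child: false },
        // Full: both children can be NULL-extended.
        JoinType::Full => NullFill { outer_child: true, inner_child: true },
    }
}
```

Keying the decision on the plan's join type means a swapped LEFT JOIN and a literal RIGHT JOIN are handled by the same branch.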

The fix also walks the join node's own Output and marks any entry that mentions a qualified column (alias.col, per PG manual §4.1.1) drawn from the leaves of the nullable-side subtree. That catches computed expressions like b.x || 'y' or COALESCE(b.x, …) that only materialize at the join level, and leaves sibling SubPlan / InitPlan outputs ((SubPlan N)) alone. Subplans are computed independently of the join's NULL extension, so their nullability is genuinely unknown to this pass. Output comparison normalizes redundant outer parens so that root ((expr)) matches join (expr). The normalization uses a tokenizer that follows Postgres lexical rules for '…', "…", E'…', B'…' / X'…', U&'…' / U&"…", and $tag$…$tag$, so parentheses inside literals and quoted identifiers are not counted.
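
A minimal sketch of the paren normalization idea follows. It is deliberately simpler than what the PR describes: it understands only plain '…' and "…" quoting (with doubled-quote escapes) and ignores E'…', U&'…', and $tag$…$tag$. The function names are hypothetical, and this version strips every redundant outer pair rather than a single one.

```rust
/// Returns true when the first '(' of `expr` is closed by its final ')',
/// i.e. one pair of parentheses wraps the whole expression.
/// Parentheses inside '...' or "..." are ignored.
fn outer_parens_wrap_whole(expr: &str) -> bool {
    if !(expr.starts_with('(') && expr.ends_with(')')) {
        return false;
    }
    let mut depth = 0usize;
    let mut quote: Option<char> = None;
    for (i, c) in expr.char_indices() {
        match quote {
            // Inside a quoted run: only the matching quote char ends it.
            // A doubled quote ('') closes and immediately reopens the run,
            // which leaves the paren depth unaffected.
            Some(q) => {
                if c == q {
                    quote = None;
                }
            }
            None => match c {
                '\'' | '"' => quote = Some(c),
                '(' => depth += 1,
                ')' => {
                    if depth == 0 {
                        return false; // unbalanced input
                    }
                    depth -= 1;
                    if depth == 0 {
                        // The opening paren closed here; it wraps the whole
                        // expression only if this is the last byte.
                        return i + 1 == expr.len();
                    }
                }
                _ => {}
            },
        }
    }
    false
}

/// Strip redundant outer parentheses so that `((expr))` and `(expr)`
/// compare equal after normalization.
pub fn normalize_parens(expr: &str) -> &str {
    let mut s = expr.trim();
    while outer_parens_wrap_whole(s) {
        s = s[1..s.len() - 1].trim();
    }
    s
}
```

With this sketch, normalize_parens("((b.x || 'y'::text))") and normalize_parens("(b.x || 'y'::text)") both yield b.x || 'y'::text, while (a.id) + (b.x) is left unchanged because its first paren closes before the end of the string.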

Recursion now descends into all child plans, not only when the current node is Left/Right, so nested joins reached through non-join intermediates like Hash are walked.
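
The traversal shape can be sketched as below. This is not the PR's visit_plan: the Plan struct is a hypothetical, cut-down stand-in for an EXPLAIN (FORMAT JSON) node, and the sketch simply records every Output entry seen inside a nullable subtree instead of resolving columns.

```rust
use std::collections::BTreeSet;

/// "Parent Relationship" of a child node, reduced to the two values used here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Relationship {
    Outer,
    Inner,
}

/// Hypothetical, simplified plan node.
#[derive(Debug, Default)]
pub struct Plan {
    pub join_type: Option<&'static str>,
    pub parent_relationship: Option<Relationship>,
    pub output: Vec<String>,
    pub plans: Vec<Plan>,
}

/// Walk every child plan, carrying `in_nullable` down the tree so that
/// joins nested under non-join nodes (such as Hash) are still reached.
pub fn visit_plan(plan: &Plan, in_nullable: bool, nullable: &mut BTreeSet<String>) {
    if in_nullable {
        nullable.extend(plan.output.iter().cloned());
    }
    for child in &plan.plans {
        // Does this node NULL-extend this particular child?
        let filled = match (plan.join_type, child.parent_relationship) {
            (Some("Left"), Some(Relationship::Inner)) => true,
            (Some("Right"), Some(Relationship::Outer)) => true,
            (Some("Full"), Some(_)) => true,
            _ => false,
        };
        // Recurse unconditionally; nullability is sticky once set.
        visit_plan(child, in_nullable || filled, nullable);
    }
}
```

Because the recursion is unconditional, a Left join sitting under a Hash node under an inner join is still visited, and in_nullable stays set for everything beneath a NULL-fill child.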

Does your PR solve an issue?

fixes #3202

Is this a breaking change?

Yes. The old behavior inferred nullability incorrectly.

For queries with a LEFT JOIN that Postgres rewrites as a Hash Right Join, generated types change in three places. The rewrite is driven by pg_statistic cost estimates and fires on production-sized data with plan_cache_mode = force_generic_plan, which sqlx-macros-core itself sets.

  • Columns from the SQL left operand that are NOT NULL in their base table go from Option<T> to T. Downstream code that handles the spurious None case is no longer needed and can be dropped.

  • Columns from the SQL right operand (the LEFT JOIN nullable side) go from T to Option<T>. Code treating these as non-null was already at risk of "unexpected null; try decoding as an Option" panics at runtime whenever a row had no matching join partner. With this fix such code stops compiling, and the field needs to become Option<T>.

  • Computed expressions on the nullable side of any outer join (COALESCE(b.x, …), b.x || y, function calls, CASE, etc.) go from T to Option<T> for the same reason. Same latent panic risk, same migration.

The right-operand and computed-expression cases are the bigger ones. Code passed type checks before only because LEFT JOINs happened to always find a match in test data. After this fix the type system exposes the real nullability.
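
As a rough illustration of the migration for the right-operand case, assuming a hypothetical query SELECT a.id, b.name FROM a LEFT JOIN b ON …: the struct, field names, and helper below are invented for this sketch and do not come from sqlx or from this PR.

```rust
/// Hypothetical row type after the fix.
#[derive(Debug)]
pub struct JoinedRow {
    /// SQL left operand, NOT NULL in its base table: stays `i64`.
    pub id: i64,
    /// SQL right operand: previously inferred as `String`, now `Option<String>`.
    pub name: Option<String>,
}

/// Call sites have to decide what an unmatched join row means.
pub fn display_name(row: &JoinedRow) -> &str {
    row.name.as_deref().unwrap_or("(no match)")
}
```

Code that previously read row.name directly now has to handle None explicitly, which is the case that used to surface only at decode time.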

Development

Successfully merging this pull request may close these issues.

sqlx::query_as!() returns unexpected null; try decoding as an Option when multiple (left) joins are used

2 participants