[AURON #2193] Implement native support for inner residual join conditions on SMJ/SHJ #2197

Open
weimingdiit wants to merge 4 commits into apache:master from weimingdiit:feat/native_support_inner_join_conditions

weimingdiit commented Apr 13, 2026

Which issue does this PR close?

Closes #2193

Rationale for this change

Auron currently only converts SMJ/SHJ joins when the join condition is empty. As a result, inner joins that contain both equi-join keys and a residual predicate fall back to Spark even though the equi-join part is already native-compatible.

The native join plan only models equi-join keys today, so this change keeps the native join focused on the equi-join portion and evaluates the residual predicate with a native filter above the join output.
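The rewrite relies on the fact that, for an inner join, evaluating the residual predicate together with the key match gives the same rows as running the keyed join first and filtering its output. A minimal pure-Scala sketch of that equivalence follows; it uses no Spark or Auron APIs, and `Order`, `Limit`, `equiJoin`, and `residual` are hypothetical names chosen for illustration.

```scala
// Pure-Scala model of the rewrite. All names here are hypothetical.
object InnerResidualSketch {
  final case class Order(custId: Int, amount: Int)
  final case class Limit(custId: Int, cap: Int)

  val orders: Seq[Order] =
    Seq(Order(1, 50), Order(1, 150), Order(2, 30), Order(3, 10))
  val limits: Seq[Limit] =
    Seq(Limit(1, 100), Limit(2, 20), Limit(4, 5))

  // Equi-join part: the only condition the keyed join operator sees.
  def equiJoin(l: Seq[Order], r: Seq[Limit]): Seq[(Order, Limit)] =
    for (o <- l; c <- r if o.custId == c.custId) yield (o, c)

  // Residual predicate: a non-equi condition that reads both sides.
  def residual(o: Order, c: Limit): Boolean = o.amount < c.cap

  // Reference semantics: key match and residual evaluated together.
  val joinWithCondition: Seq[(Order, Limit)] =
    for (o <- orders; c <- limits if o.custId == c.custId && residual(o, c))
      yield (o, c)

  // Rewritten shape: keyed join first, residual as a filter above it.
  val filterAboveJoin: Seq[(Order, Limit)] =
    equiJoin(orders, limits).filter { case (o, c) => residual(o, c) }

  def main(args: Array[String]): Unit = {
    println(joinWithCondition)
    println(filterAboveJoin)
  }
}
```

Both values contain the same single pair, which is the property the native filter above the join depends on.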

What changes are included in this PR?

  • Allow native conversion of SortMergeJoinExec and ShuffledHashJoinExec when an InnerLike join has a residual condition.
  • Keep native join conversion based on equi-join keys only.
  • Apply the residual predicate as a native filter on top of the native join output.
  • Continue to reject residual join conditions for non-inner join types.
  • Add query tests for:
    • native SMJ with an inner residual condition
    • native SHJ with an inner residual condition in force-SHJ mode
  • Add a small test helper for Auron configs that are read from SparkEnv/SparkContext.
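The restriction to InnerLike joins matters because a filter above the join is not equivalent to a join condition for outer joins: a left row whose only key matches fail the residual must still be emitted with a null right side, whereas a post-join filter drops it. The sketch below models this with plain Scala collections; `leftOuterJoin` and the sample rows are hypothetical and do not reflect Auron's implementation.

```scala
// Pure-Scala model of a left outer join. All names here are hypothetical.
object OuterResidualSketch {
  type Row = (Int, Int) // (join key, value)

  val left: Seq[Row] = Seq((1, 10), (2, 20))
  val right: Seq[Row] = Seq((1, 5), (2, 50))

  def keyMatch(l: Row, r: Row): Boolean = l._1 == r._1
  def residual(l: Row, r: Row): Boolean = l._2 < r._2

  // Left outer join: a left row with no match under `cond` is kept
  // and padded with None (the stand-in for a NULL right side).
  def leftOuterJoin(cond: (Row, Row) => Boolean): Seq[(Row, Option[Row])] =
    left.flatMap { l =>
      val matches = right.filter(r => cond(l, r))
      if (matches.isEmpty) Seq((l, Option.empty[Row]))
      else matches.map(r => (l, Option(r)))
    }

  // Residual evaluated as part of the join condition.
  val conditionInJoin: Seq[(Row, Option[Row])] =
    leftOuterJoin((l, r) => keyMatch(l, r) && residual(l, r))

  // Keyed join first, residual applied as a filter above it.
  val filterAboveJoin: Seq[(Row, Option[Row])] =
    leftOuterJoin(keyMatch).filter {
      case (l, Some(r)) => residual(l, r)
      case (_, None)    => false // a predicate over NULL is not true
    }

  def main(args: Array[String]): Unit = {
    println(conditionInJoin) // keeps (1,10) with an empty right side
    println(filterAboveJoin) // loses (1,10) entirely
  }
}
```

The two results differ on the row `(1, 10)`, so the filter-above-join rewrite would change query results if it were applied to outer joins.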

Are there any user-facing changes?

Yes. Inner joins with equi-join keys plus a residual predicate can now remain on the native SMJ/SHJ path instead of falling back entirely to Spark.

How was this patch tested?

CI.

…conditions on SMJ/SHJ

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
weimingdiit force-pushed the feat/native_support_inner_join_conditions branch from 353dd8d to e02b0bc on April 15, 2026 15:51
Signed-off-by: weimingdiit <weimingdiit@gmail.com>
weimingdiit force-pushed the feat/native_support_inner_join_conditions branch from e02b0bc to 1538724 on April 15, 2026 16:23
weimingdiit marked this pull request as ready for review on April 16, 2026 05:15
cxzl25 requested a review from Copilot on April 19, 2026 11:34
Copilot AI left a comment

Pull request overview

Enables native conversion of Spark SMJ/SHJ inner joins that have an additional residual (non-equi) predicate by keeping the native join keyed-only and evaluating the residual predicate via a native Filter above the join.

Changes:

  • Allow native SMJ/SHJ conversion for InnerLike joins with a residual condition, applying it as a native filter above the join.
  • Add query tests covering SMJ/SHJ inner residual conditions (including force-SHJ mode) and a SparkConf test helper.
  • Update TPCDS plan-stability golden files to reflect the new NativeFilter/NativeProject shapes.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala | Wrap inner residual join conditions with a native filter above native SMJ/SHJ output. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronSQLTestHelper.scala | Adds withSparkConf helper to temporarily set SparkConf/SparkEnv conf in tests. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala | Adds tests asserting native SMJ/SHJ presence for inner joins with residual predicates. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q95.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q92.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q85.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q81.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q72.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q68.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q65.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q64.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q6.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q48.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q46.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q32.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q30.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q19.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q15.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q13.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q1.txt | Updates golden plan to include NativeFilter above native join. |

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.


Signed-off-by: weimingdiit <weimingdiit@gmail.com>
yew1eb commented Apr 28, 2026

@weimingdiit Thanks for the contribution! Clean approach and a solid perf win for a common pattern.
One minor suggestion: could we also verify query result correctness in the UTs, not just plan? Otherwise LGTM.

weimingdiit commented Apr 28, 2026

> @weimingdiit Thanks for the contribution! Clean approach and a solid perf win for a common pattern. One minor suggestion: could we also verify query result correctness in the UTs, not just plan? Otherwise LGTM.

@yew1eb Thanks a lot for the review and the suggestion! I believe result correctness is already covered by checkSparkAnswerAndOperator here. That helper first runs the query with Auron disabled to get the vanilla Spark result, then runs the same query with Auron enabled and compares the two results with checkAnswer.

The additional assertions in these tests are only meant to ensure that the intended native SMJ/SHJ paths are exercised. That said, I’m happy to add explicit expected rows as well if you think that would make the tests clearer.
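The pattern described here, running the query once with the native engine off to obtain a baseline, once with it on, comparing the results, and then asserting on the operators seen, can be sketched in plain Scala. This is not the source of `checkSparkAnswerAndOperator`; `FakeSession`, `checkAnswerAndOperators`, and the operator names below are hypothetical stand-ins used only to show the shape of such a helper.

```scala
// Pure-Scala model of a differential test helper. All names are hypothetical.
object DifferentialCheckSketch {
  // Stand-in for a session whose native engine can be toggled.
  final class FakeSession {
    var nativeEnabled: Boolean = false

    // Returns the result rows and the operator names of the plan used.
    def run(rows: Seq[Int]): (Seq[Int], Seq[String]) = {
      val plan =
        if (nativeEnabled) Seq("NativeJoin", "NativeFilter")
        else Seq("Join", "Filter")
      (rows.filter(_ > 1), plan)
    }
  }

  // Runs the query with the native engine off, then on, and fails if the
  // two results differ. Returns the operators of the native run so the
  // caller can additionally assert that the intended path was exercised.
  def checkAnswerAndOperators(session: FakeSession, rows: Seq[Int]): Seq[String] = {
    session.nativeEnabled = false
    val (expected, _) = session.run(rows)

    session.nativeEnabled = true
    val (actual, operators) =
      try {
        session.run(rows)
      } finally {
        session.nativeEnabled = false // always restore the setting
      }

    assert(actual.sorted == expected.sorted,
      s"native result $actual differs from baseline $expected")
    operators
  }

  def main(args: Array[String]): Unit = {
    val ops = checkAnswerAndOperators(new FakeSession, Seq(1, 2, 3))
    assert(ops.contains("NativeJoin"), "expected the native join path")
  }
}
```

In this shape the result comparison and the plan assertion are separate checks, which matches the split between correctness and path coverage described in the comment above.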

yew1eb commented Apr 28, 2026

@weimingdiit Ah, you're right. I completely missed that checkSparkAnswerAndOperator already handles result verification. Thanks for the clarification! LGTM.

slfan1989 commented

@cxzl25 Could you please take another look at this PR? I believe it meets the expectations. Thank you very much!

slfan1989 requested a review from cxzl25 on April 28, 2026 07:20
    condition match {
      case Some(residualCondition) =>
        assert(joinType.isInstanceOf[InnerLike], "join condition is not supported")
        convertFilterExec(FilterExec(residualCondition, joined))
Although this approach avoids falling back to the Spark implementation, its filtering does not seem to be as efficient as Spark's. Spark applies the residual filter during the join itself rather than after it.

yew1eb commented Apr 28, 2026

@weimingdiit @cxzl25 DataFusion Comet tackles this differently in apache/datafusion-comet#553: they push the residual filter directly into the native join operator rather than adding a native filter above it. This avoids the post-join filter overhead, though it looks like it requires a DataFusion version bump. Might be worth weighing this trade-off against the current approach.
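The trade-off raised in the two comments above can be illustrated by counting how many joined pairs each placement hands downstream. The sketch is a toy hash join over Scala collections, not Auron, Spark, or DataFusion code; `hashJoin`, its `inJoin` parameter, and the sample rows are hypothetical, and the model counts rows only, so it says nothing about real batch or memory costs.

```scala
// Pure-Scala model of filter placement. All names here are hypothetical.
object JoinFilterPlacementSketch {
  type Row = (Int, Int) // (join key, value)

  // Toy inner hash join. When `inJoin` is defined, pairs that fail the
  // predicate are dropped while probing and are never emitted.
  // Returns the emitted pairs and how many pairs left the join operator.
  def hashJoin(
      build: Seq[Row],
      probe: Seq[Row],
      inJoin: Option[(Row, Row) => Boolean]): (Seq[(Row, Row)], Int) = {
    val table: Map[Int, Seq[Row]] = build.groupBy(_._1)
    val out = for {
      p <- probe
      b <- table.getOrElse(p._1, Seq.empty)
      if inJoin.forall(pred => pred(p, b))
    } yield (p, b)
    (out, out.size)
  }

  val build: Seq[Row] = Seq((1, 1), (1, 2), (1, 3), (2, 1))
  val probe: Seq[Row] = Seq((1, 2), (2, 5))
  val residual: (Row, Row) => Boolean = (p, b) => p._2 < b._2

  // Filter above the join: every key match is emitted, then filtered.
  val (keyMatches, emittedPostFilter) = hashJoin(build, probe, None)
  val postFilterResult: Seq[(Row, Row)] = keyMatches.filter(residual.tupled)

  // Filter inside the join: only surviving pairs are emitted.
  val (inJoinResult, emittedInJoin) = hashJoin(build, probe, Some(residual))

  def main(args: Array[String]): Unit = {
    println(s"post-join filter: emitted $emittedPostFilter, kept ${postFilterResult.size}")
    println(s"in-join filter:   emitted $emittedInJoin, kept ${inJoinResult.size}")
  }
}
```

Both placements return the same rows; they differ only in how many intermediate pairs the join emits, which grows with the number of key matches that the residual later rejects.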
