[AURON #2193] Implement native support for inner residual join conditions on SMJ/SHJ #2197
…conditions on SMJ/SHJ Signed-off-by: weimingdiit <weimingdiit@gmail.com>
Pull request overview
Enables native conversion of Spark SMJ/SHJ inner joins that have an additional residual (non-equi) predicate by keeping the native join keyed-only and evaluating the residual predicate via a native Filter above the join.
Changes:
- Allow native SMJ/SHJ conversion for `InnerLike` joins with a residual `condition`, applying it as a native filter above the join.
- Add query tests covering SMJ/SHJ inner residual conditions (including force-SHJ mode) and a SparkConf test helper.
- Update TPCDS plan-stability golden files to reflect the new `NativeFilter`/`NativeProject` shapes.
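The rewrite described in the list above can be sketched on a toy plan model. This is only an illustration under stated assumptions: `Plan`, `Scan`, `Join`, `Filter`, and `ResidualRewrite` are hypothetical stand-ins, not Auron or Spark classes, and returning `None` to signal a fallback is an assumption about how a caller might use it.

```scala
// Toy plan model. These types are illustrative and are NOT Auron's or Spark's classes.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType

sealed trait Plan
case class Scan(name: String) extends Plan
case class Join(
    left: Plan,
    right: Plan,
    joinType: JoinType,
    keys: Seq[(String, String)],
    residual: Option[String]) extends Plan
case class Filter(condition: String, child: Plan) extends Plan

object ResidualRewrite {
  // Returns Some(plan) when the join can stay keyed-only, or None when the
  // caller would have to fall back (assumed convention for this sketch).
  def rewrite(join: Join): Option[Plan] = join.residual match {
    // No residual predicate: the keyed-only join is already enough.
    case None => Some(join)
    // Inner join: strip the residual from the join and apply it as a filter above.
    case Some(cond) if join.joinType == Inner =>
      Some(Filter(cond, join.copy(residual = None)))
    // Any other join type with a residual is not handled by this rewrite.
    case Some(_) => None
  }
}
```

For an inner join keyed on `a.k = b.k` with residual `a.v > b.v`, `rewrite` yields a `Filter` node whose child is the same join with `residual = None`.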
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala | Wrap inner residual join conditions with a native filter above native SMJ/SHJ output. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronSQLTestHelper.scala | Adds withSparkConf helper to temporarily set SparkConf/SparkEnv conf in tests. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala | Adds tests asserting native SMJ/SHJ presence for inner joins with residual predicates. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q95.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q92.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q85.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q81.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q72.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q68.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q65.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q64.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q6.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q48.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q46.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q32.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q30.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q19.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q15.txt | Updates golden plan to include NativeFilter above native join. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q13.txt | Updates golden plan to include NativeFilter/NativeProject changes. |
| dev/auron-it/src/main/resources/tpcds-plan-stability/spark-3.5/q1.txt | Updates golden plan to include NativeFilter above native join. |
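
The table mentions a helper that temporarily sets configuration in tests. A minimal sketch of that set-then-restore pattern follows; it works on a plain mutable map rather than a real `SparkConf`, and `TempConf.withTempConf` and the keys used with it are hypothetical names, not the PR's actual helper.

```scala
import scala.collection.mutable

object TempConf {
  // Applies `pairs` to `conf`, runs `body`, then restores the previous state
  // even if `body` throws. The name and signature are illustrative only.
  def withTempConf[T](conf: mutable.Map[String, String])(pairs: (String, String)*)(body: => T): T = {
    // Remember the old value (or its absence) for every key we are about to touch.
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old // key existed: put the old value back
      case (k, None)      => conf.remove(k) // key was absent: remove it again
    }
  }
}
```

Restoring in a `finally` block keeps one test's settings from leaking into the next, which matters when a test forces a particular join strategy.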
@weimingdiit Thanks for the contribution! Clean approach and a solid perf win for a common pattern.

@yew1eb Thanks a lot for the review and the suggestion! I believe result correctness is already covered by checkSparkAnswerAndOperator here. That helper first runs the query with Auron disabled to get the vanilla Spark result, then runs the same query with Auron enabled and compares the two results with checkAnswer. The additional assertions in these tests are only meant to ensure that the intended native SMJ/SHJ paths are exercised. That said, I’m happy to add explicit expected rows as well if you think that would make the tests clearer.
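
The comparison pattern described in the comment above (run once with the native path off, once with it on, then compare) can be sketched generically. This is not the code of `checkSparkAnswerAndOperator`; `DifferentialCheck.checkSameAnswer` and its `runQuery` parameter are hypothetical, and results are compared after sorting on the assumption that row order is not significant.

```scala
object DifferentialCheck {
  // `runQuery(nativeEnabled)` is a hypothetical stand-in for executing the same
  // query with the native engine disabled (false) or enabled (true).
  def checkSameAnswer[A](runQuery: Boolean => Seq[A])(implicit ord: Ordering[A]): Seq[A] = {
    val expected = runQuery(false).sorted // baseline result
    val actual = runQuery(true).sorted    // result from the native path
    require(actual == expected, s"native result $actual differs from baseline $expected")
    actual
  }
}
```

A check like this covers result correctness only; asserting that a particular native operator appears in the plan is a separate concern, which is what the additional assertions in the tests address.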

@weimingdiit Ah, you're right -- I completely missed that

@cxzl25 Could you please take another look at this PR? I believe it meets the expectations. Thank you very much!
condition match {
  case Some(residualCondition) =>
    assert(joinType.isInstanceOf[InnerLike], "join condition is not supported")
    convertFilterExec(FilterExec(residualCondition, joined))
Although this approach avoids falling back to the Spark implementation, the filtering does not seem to be as efficient as Spark's. Spark applies the filter during the join itself.
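
To make the efficiency concern concrete, here is a toy comparison of how many joined pairs each strategy materializes before the residual predicate is applied. It is a counting sketch over in-memory sequences, not a benchmark of Auron or Spark, and `JoinFilterCost` and its methods are hypothetical names.

```scala
object JoinFilterCost {
  case class Row(key: Int, value: Int)

  // Filter above the join: every key-matching pair is produced first, then filtered.
  // Returns (result rows, number of pairs materialized by the join).
  def filterAboveJoin(l: Seq[Row], r: Seq[Row])(p: (Row, Row) => Boolean): (Seq[(Row, Row)], Int) = {
    val joined = for { a <- l; b <- r; if a.key == b.key } yield (a, b)
    (joined.filter { case (a, b) => p(a, b) }, joined.size)
  }

  // Filter inside the join: the residual is checked before a pair is emitted.
  def filterInsideJoin(l: Seq[Row], r: Seq[Row])(p: (Row, Row) => Boolean): (Seq[(Row, Row)], Int) = {
    val out = for { a <- l; b <- r; if a.key == b.key && p(a, b) } yield (a, b)
    (out, out.size)
  }
}
```

With one left row and three right rows on the same key, of which only one passes the residual, the filter-above variant materializes three pairs while the filter-inside variant materializes one. Both return the same result, so the difference is purely in intermediate work.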

@weimingdiit @cxzl25 DataFusion Comet tackles this differently in apache/datafusion-comet#553: they push the residual filter directly into the native join operator rather than adding a native filter above. This avoids the post-join filter overhead, though it looks like it requires a DataFusion version bump. Might be worth weighing this trade-off against the current approach.
Which issue does this PR close?
Closes #2193
Rationale for this change
Auron currently only converts SMJ/SHJ joins when the join condition is empty. As a result, inner joins that contain both equi-join keys and a residual predicate fall back to Spark even though the equi-join part is already native-compatible.
The native join plan only models equi-join keys today, so this change keeps the native join focused on the equi-join portion and evaluates the residual predicate with a native filter above the join output.
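
Why a filter above the join is safe for inner joins, and why the change is restricted to them, can be checked on a small in-memory model. The sketch below is an illustration only: `ResidualSemantics` and its methods are hypothetical, and the outer-join variant assumes the residual references right-side columns, so it is not true for null-padded rows.

```scala
object ResidualSemantics {
  case class Row(key: Int, value: Int)
  type Pred = (Row, Row) => Boolean

  // Inner join with the residual evaluated while matching.
  def innerWithResidual(l: Seq[Row], r: Seq[Row])(p: Pred): Seq[(Row, Row)] =
    for { a <- l; b <- r; if a.key == b.key && p(a, b) } yield (a, b)

  // Keyed-only inner join followed by a separate filter.
  def innerThenFilter(l: Seq[Row], r: Seq[Row])(p: Pred): Seq[(Row, Row)] =
    (for { a <- l; b <- r; if a.key == b.key } yield (a, b)).filter { case (a, b) => p(a, b) }

  // Left outer join with the residual as part of the join condition:
  // a left row with no surviving match is kept and padded with None.
  def leftOuterWithResidual(l: Seq[Row], r: Seq[Row])(p: Pred): Seq[(Row, Option[Row])] =
    l.flatMap { a =>
      val matches = r.filter(b => a.key == b.key && p(a, b))
      if (matches.isEmpty) Seq((a, Option.empty[Row])) else matches.map(b => (a, Option(b)))
    }

  // Keyed-only left outer join followed by a filter: padded rows are dropped
  // (assumption: the predicate reads right-side columns, so it is not true on nulls).
  def leftOuterThenFilter(l: Seq[Row], r: Seq[Row])(p: Pred): Seq[(Row, Option[Row])] =
    leftOuterWithResidual(l, r)((_, _) => true).filter {
      case (a, Some(b)) => p(a, b)
      case (_, None)    => false
    }
}
```

For inner joins the two formulations return the same rows. For the left outer join they diverge: the filter-above variant loses left rows whose matches all fail the residual, which is consistent with limiting the rewrite to inner joins.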
What changes are included in this PR?
Are there any user-facing changes?
Yes. Inner joins with equi-join keys plus a residual predicate can now remain on the native SMJ/SHJ path instead of falling back entirely to Spark.
How was this patch tested?
CI.