[SPARK-56499][CORE] Deduplicate RDD graph BFS traversal pattern in DAGScheduler #55361

Open
jiangxb1987 wants to merge 2 commits into apache:master from jiangxb1987:SPARK-56499

Conversation

@jiangxb1987
Contributor

What changes were proposed in this pull request?

  • Introduced two private helper methods in DAGScheduler:
    • traverseRDDGraph(rdd)(visitor) — traverses the RDD dependency graph using an explicit stack, calling visitor(rdd, enqueue) for each unvisited RDD.
    • traverseRDDGraphUntil(rdd)(visitor) — like above but supports early termination: the visitor returns false to stop traversal; returns whether traversal completed.
  • Refactored 6 methods to use these helpers, eliminating duplicated visited + waitingForVisit boilerplate:
    • getMissingAncestorShuffleDependencies
    • getShuffleDependenciesAndResourceProfiles
    • traverseParentRDDsWithinStage
    • getMissingParentStages
    • eagerlyComputePartitionsForRddAndAncestors
    • stageDependsOn
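The shape of the two helpers can be sketched outside Spark with a minimal stand-in for the RDD dependency graph. The `Node` class, `GraphTraversal` object, and method bodies below are illustrative only, not the actual DAGScheduler code, which operates on `RDD[_]`:

```scala
import scala.collection.mutable

// Minimal stand-in for an RDD dependency graph node (illustrative only).
case class Node(id: Int, deps: Seq[Node])

object GraphTraversal {
  // Visits every node reachable from `root` exactly once, using an explicit
  // stack instead of recursion; mirrors the shape of traverseRDDGraph.
  def traverseGraph(root: Node)(visitor: (Node, Node => Unit) => Unit): Unit = {
    traverseGraphUntil(root) { (node, enqueue) =>
      visitor(node, enqueue)
      true
    }
    ()
  }

  // Like traverseGraph, but the visitor returns false to stop early.
  // Returns true iff the traversal ran to completion; mirrors the shape of
  // traverseRDDGraphUntil.
  def traverseGraphUntil(root: Node)(visitor: (Node, Node => Unit) => Boolean): Boolean = {
    val visited = mutable.HashSet.empty[Node]
    val waitingForVisit = mutable.Stack(root)
    while (waitingForVisit.nonEmpty) {
      val node = waitingForVisit.pop()
      if (!visited(node)) {
        visited += node
        val enqueue: Node => Unit = n => { waitingForVisit.push(n); () }
        if (!visitor(node, enqueue)) {
          return false
        }
      }
    }
    true
  }
}
```

The visitor decides which parents to enqueue, so each call site keeps its own filtering logic (e.g. skipping available map stages) while the visited-set discipline lives in one place.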

Why are the changes needed?

The same BFS traversal pattern (maintain a visited set and a waitingForVisit stack, remove-and-check-visited in a loop) was duplicated across many methods in DAGScheduler, adding unnecessary boilerplate and making the code harder to read and maintain.
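For reference, the duplicated shape at each call site looked roughly like the self-contained sketch below. The `Node` stand-in and `collectReachableIds` method are illustrative (the real code operates on `RDD[_]` and runs per-method logic at the marked line):

```scala
import scala.collection.mutable

// Stand-in for an RDD dependency graph node (illustrative only).
case class Node(id: Int, deps: Seq[Node])

object Boilerplate {
  // The pattern repeated across the six methods: an explicit visited set plus
  // a waitingForVisit stack, with pop-and-check-visited in a loop.
  def collectReachableIds(root: Node): Set[Int] = {
    val visited = mutable.HashSet.empty[Node]
    val waitingForVisit = mutable.Stack.empty[Node]
    waitingForVisit.push(root)
    val ids = mutable.Set.empty[Int]
    while (waitingForVisit.nonEmpty) {
      val toVisit = waitingForVisit.pop()
      if (!visited(toVisit)) {
        visited += toVisit
        ids += toVisit.id                           // per-method logic went here...
        toVisit.deps.foreach(waitingForVisit.push)  // ...and enqueued parent nodes
      }
    }
    ids.toSet
  }
}
```

Only the two commented lines differed between call sites; everything else was copied verbatim, which is the boilerplate the helpers factor out.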

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Covered by existing DAGSchedulerSuite tests. No behavior changes were made.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (claude-sonnet-4-6)

Member

@Ngone51 Ngone51 left a comment

Nice cleanup! LGTM except one minor comment.

val ancestors = new ListBuffer[ShuffleDependency[_, _, _]]
/**
* Traverses the RDD dependency graph using a manually maintained stack to prevent
* StackOverflowError caused by recursive traversal. For each unvisited RDD, calls
Member

Maybe we should move the comment "Traverses the RDD dependency graph using a manually maintained stack to prevent StackOverflowError caused by recursive traversal." to traverseRDDGraphUntil().

Contributor Author

updated

true
}
if (rddHasUncachedPartitions) {
for (dep <- rdd.dependencies) {

It will loop over all the dependencies, regardless of whether the path has been visited before.

Contributor Author

This matches the original behavior, I don't think we should change it in this PR.

Contributor

@cloud-fan cloud-fan left a comment

Clean refactor: introduces two local helpers (traverseRDDGraph / traverseRDDGraphUntil) that factor out the iterative-stack traversal pattern duplicated across six DAGScheduler methods. Traced each call site — semantics are preserved, including the visited-set discipline and the final return value of traverseParentRDDsWithinStage. stageDependsOn additionally gains early termination (a perf improvement over walking the full reachable set). All changes are private to DAGScheduler; no public-API surface change; existing DAGSchedulerSuite coverage is appropriate.

LGTM with two minor suggestions inline.

true
}

/** Find ancestor shuffle dependencies that are not registered in shuffleToMapStage yet */
Contributor

The field is shuffleIdToMapStage (line 160). Since this comment is re-added in the diff, worth fixing the stale name while we're here.

Suggested change
/** Find ancestor shuffle dependencies that are not registered in shuffleToMapStage yet */
/** Find ancestor shuffle dependencies that are not registered in shuffleIdToMapStage yet */

def visit(rdd: RDD[_]): Unit = {
if (!visitedRdds(rdd)) {
visitedRdds += rdd
var found = false
Contributor

Minor: traverseRDDGraphUntil already returns false iff the visitor terminated early, so we can drop the var found and return the negation of the helper directly — matching the shape of traverseParentRDDsWithinStage which returns the helper's result directly:

!traverseRDDGraphUntil(stage.rdd) { (rdd, enqueue) =>
  if (rdd == target.rdd) {
    false
  } else {
    for (dep <- rdd.dependencies) {
      dep match {
        case shufDep: ShuffleDependency[_, _, _] =>
          val mapStage = getOrCreateShuffleMapStage(shufDep, stage.firstJobId)
          if (!mapStage.isAvailable) {
            enqueue(mapStage.rdd)
          }  // Otherwise there's no need to follow the dependency back
        case narrowDep: NarrowDependency[_] =>
          enqueue(narrowDep.rdd)
      }
    }
    true
  }
}
