Chore: refactor pt-expt compile by anyangml · Pull Request #5504 · deepmodeling/deepmd-kit

anyangml · 2026-06-08T03:16:20Z

Summary by CodeRabbit

Refactor
- Moved compile/tracing helpers into a shared utilities module to reduce duplication.
- Training trace path now uses shared helpers and selects a “safe prime” frame size, padding only necessary dimensions.
Bug Fixes
- Improved FX detach cleanup and graph rebuilding for more stable tracing and compilation (fewer symbolic-shape and re-trace failures).

for more information, see https://pre-commit.ci

coderabbitai · 2026-06-08T03:17:29Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new shared compile_utils module (prime helpers, trace-time padding, FX detach-strip, FX rebuild) and updates SeZMModel and the training trace/compile path to import and use these helpers. Training now selects a safe-prime trace frame size that avoids a forbidden set of dimensions, and pads frame-only inputs before make_fx and torch.compile.

Changes

Compile Utilities and Shape Specialization

Layer / File(s)	Summary
Compile utilities library `deepmd/pt/utils/compile_utils.py`	New module provides prime helpers (`_is_prime`, `_next_safe_prime` with forbidden-set filtering), tensor padding/trimming (`_trace_pad_dim`), FX graph topology-aware detach-chain removal (`strip_saved_tensor_detach`), and stale-pointer-safe graph rebuilding (`rebuild_graph_module`).
SeZMModel compile-utils integration `deepmd/pt/model/model/sezm_model.py`	Replaces in-file compile helper implementations with imports from `compile_utils`; removes duplicate prime selection, tensor padding, detach-chain stripping, and graph-rebuild code.
Training pipeline prime-based shape specialization `deepmd/pt_expt/train/training.py`	Imports compile utilities and replaces fixed `nframes=7` trace padding with dynamic safe-prime-based frame-dimension selection using a forbidden-set derived from model params/buffers and runtime dims; pads/clamps trace inputs via `_trace_pad_dim`, then uses `strip_saved_tensor_detach` and `rebuild_graph_module` before `torch.compile`.
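
The table describes the prime helpers without showing them. The following is a minimal sketch of what forbidden-set-aware prime selection could look like. The names `is_prime` and `next_safe_prime` echo the PR's `_is_prime` and `_next_safe_prime`, but the signatures, the trial-division test, and the reading of "safe" as "not equal to any forbidden dimension" are assumptions made for illustration, not the code in `compile_utils.py`.

```python
from collections.abc import Iterable


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small trace-time sizes."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def next_safe_prime(start: int, forbidden: Iterable[int] = ()) -> int:
    """Return the smallest prime >= start that is not in `forbidden`.

    Assumption: "safe" only means "does not collide with a known dimension";
    the real helper may apply stricter rules.
    """
    blocked = set(forbidden)
    candidate = max(start, 2)
    while not is_prime(candidate) or candidate in blocked:
        candidate += 1
    return candidate


# Hypothetical forbidden set: sizes already present in params/buffers.
forbidden_dims = {7, 11, 128}
trace_nframes = next_safe_prime(7, forbidden_dims)
```

Here 7 and 11 are skipped because they are forbidden, so the search settles on 13. The likely motivation is that a trace-time size no other dimension shares is less prone to being confused with a fixed model dimension during symbolic-shape tracing.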

Sequence Diagram (trace & compile flow)

sequenceDiagram
  participant Trainer
  participant make_fx
  participant DetachFix as strip_saved_tensor_detach
  participant Rebuilder as rebuild_graph_module
  participant Compiler as torch.compile

  Trainer->>make_fx: call make_fx(forward_lower) with padded inputs
  make_fx->>DetachFix: produce traced_lower GraphModule
  DetachFix->>Rebuilder: remove saved-tensor detaches and rewrite uses
  Rebuilder->>Compiler: rebuild graph and return clean GraphModule
  Compiler->>Trainer: compile(GraphModule, dynamic=True, backend="inductor")

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

deepmodeling/deepmd-kit#5483: Also touches SeZM/tracing machinery and safe-prime trace-shape selection/padding helpers used during make_fx/compile.
deepmodeling/deepmd-kit#5457: Modifies the training trace_and_compile/_CompiledModel paths; overlaps in the compile/tracing area and compiled forward_lower handling.

Suggested reviewers

wanghan-iapcm
njzjz-bot
OutisLi

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title accurately reflects the main change: refactoring compilation utilities in the pt-expt module by extracting shared helpers into a common utils file.

coderabbitai

🧹 Nitpick comments (1)

deepmd/pt/utils/compile_utils.py (1)

87-88: 💤 Low value

Consider adding a type guard for extra robustness.

While aten.detach.default inputs are always Nodes in make_fx-generated graphs, adding an isinstance check would prevent potential AttributeError if this helper is ever called on malformed graphs.

🛡️ Optional defensive check

     def _is_detach(n: torch.fx.Node) -> bool:
-        return n.op == "call_function" and n.target == _DETACH
+        return isinstance(n, torch.fx.Node) and n.op == "call_function" and n.target == _DETACH

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/utils/compile_utils.py` around lines 87 - 88, Update _is_detach to
defensively check that the input is a torch.fx.Node before accessing attributes:
change the parameter type to a more permissive Any (or keep current) and add an
isinstance(n, torch.fx.Node) guard so you only evaluate n.op and n.target when n
is a Node; optionally change the return annotation to
typing.TypeGuard[torch.fx.Node] if you want an actual type guard. Ensure the
function still checks n.op == "call_function" and n.target == _DETACH after the
isinstance check and reference the _is_detach name, torch.fx.Node, and _DETACH
in your change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4c2615d5-fadf-4031-a417-656104bcad39

📥 Commits

Reviewing files that changed from the base of the PR and between 99c1ece and 4b314b0.

📒 Files selected for processing (3)

deepmd/pt/model/model/sezm_model.py
deepmd/pt/utils/compile_utils.py
deepmd/pt_expt/train/training.py

Copilot

Pull request overview

This PR refactors PyTorch tracing/compilation helper logic into a shared utility module so both the SeZM model compile path and the pt_expt training compile path reuse the same implementations.

Changes:

Added deepmd/pt/utils/compile_utils.py with shared helpers for trace-time prime shape selection, input padding/trimming, detach-chain stripping, and FX graph rebuilding.
Updated deepmd/pt/model/model/sezm_model.py to import and use the shared helpers, removing the previous in-file implementations.
Updated deepmd/pt_expt/train/training.py to import and use the shared helpers and to coerce multiple trace-time dimensions (nf/nloc/nall) to collision-resistant primes.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
deepmd/pt/utils/compile_utils.py	New shared tracing/compile utility module (prime sizing, padding, detach stripping, graph rebuilding).
deepmd/pt/model/model/sezm_model.py	Replaced local helper implementations with imports from the new shared module.
deepmd/pt_expt/train/training.py	Replaced local FX graph post-processing helpers with shared imports; updated trace-input coercion to prime sizes for multiple dims.

codecov · 2026-06-08T04:13:26Z

Codecov Report

❌ Patch coverage is 87.09677% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.52%. Comparing base (890e38a) to head (530d1c5).

Files with missing lines	Patch %	Lines
deepmd/pt_expt/train/training.py	86.20%	8 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5504      +/-   ##
==========================================
- Coverage   81.52%   81.52%   -0.01%     
==========================================
  Files         872      872              
  Lines       97964    97967       +3     
  Branches     4241     4240       -1     
==========================================
  Hits        79865    79865              
- Misses      16795    16801       +6     
+ Partials     1304     1301       -3

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/pt_expt/train/training.py`:
- Line 395: The comment in training.py contains EN DASH characters in the phrase
"50–500 and 200–5000+" which triggers Ruff RUF003; update that comment (around
the block where the line mentions real data counts) to use HYPHEN-MINUS instead,
i.e. change "50–500 and 200–5000+" to "50-500 and 200-5000+", ensuring no other
EN DASH characters remain in the same comment or nearby comments.
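
The RUF003 finding above concerns EN DASH characters inside a comment. As a hedged illustration, the sketch below replaces EN DASH with HYPHEN-MINUS in comments only, leaving string literals untouched. The helper name `fix_comment_dashes` and the sample line are hypothetical; the PR itself presumably fixed the comment by hand.

```python
import io
import tokenize

EN_DASH = "\u2013"


def fix_comment_dashes(source: str) -> str:
    """Replace EN DASH with HYPHEN-MINUS inside Python comments only."""
    lines = source.splitlines(keepends=True)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and EN_DASH in tok.string:
            row, col = tok.start  # row is 1-based, col is a 0-based char offset
            line = lines[row - 1]
            end = col + len(tok.string)
            # Both characters are one code point wide, so offsets stay valid.
            lines[row - 1] = (
                line[:col] + tok.string.replace(EN_DASH, "-") + line[end:]
            )
    return "".join(lines)


# Hypothetical sample: the string literal keeps its dash, the comment does not.
sample = 'label = "a\u2013b"  # real data: 50\u2013500 and 200\u20135000+\n'
fixed = fix_comment_dashes(sample)
```

Using `tokenize` rather than a plain text replace keeps the change scoped to comments, which is what RUF003 checks.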

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d06cb1c3-6798-4545-9084-600cf0f7aaab

📥 Commits

Reviewing files that changed from the base of the PR and between c0b66b7 and c7f9f7f.

📒 Files selected for processing (1)

deepmd/pt_expt/train/training.py

Resolve conflicts from PR deepmodeling#5503 (dpa4), which introduced deepmd/pt/utils/compile_compat.py, a superset of this branch's compile_utils.py. Consolidate onto compile_compat:

- sezm_model.py: take upstream import block + return paren
- training.py: import next_safe_prime/trace_pad_dim/rebuild_graph_module/strip_saved_tensor_detach from compile_compat (aliased to local names)
- remove redundant compile_utils.py

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…move_all)

The merge consolidated training.py onto compile_compat.strip_saved_tensor_detach, which is *selective*: it preserves user-explicit .detach() calls. The traced training fn opens with `coord.detach().requires_grad_(True)`, so the selective strip left that boundary detach in place, severing the second-order gradient path and producing the compiled-vs-uncompiled force mismatch (DPA2 test).

Rather than duplicate an aggressive remover, add a keyword-only `remove_all` flag to the single compile_compat.strip_saved_tensor_detach:

- SeZM inference path (sezm_model.py): default remove_all=False -> selective, preserving legitimate user .detach() calls (dpa4 behaviour).
- pt_expt training path (training.py): remove_all=True -> strip every detach, correct because the trace is fed already-detached, grad-enabled inputs.

One shared implementation, behaviour selected per call site.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
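
The keyword-only `remove_all` switch described in this commit message can be illustrated with a toy model. The sketch below does not use torch.fx. `Op`, its `user_written` field, and `strip_detach` are hypothetical stand-ins that only show how one function can serve both a selective and an aggressive call site. The real `strip_saved_tensor_detach` works on an FX graph and decides what counts as a user detach by its own rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    # Toy stand-in for "explicit .detach() written in user code".
    user_written: bool = False


def strip_detach(ops: list[Op], *, remove_all: bool = False) -> list[Op]:
    """Drop detach ops from a toy op list.

    remove_all=False keeps detaches marked as user-written (selective mode);
    remove_all=True drops every detach (aggressive mode).
    """
    kept = []
    for op in ops:
        is_detach = op.name == "detach"
        if is_detach and (remove_all or not op.user_written):
            continue
        kept.append(op)
    return kept


trace = [
    Op("detach", user_written=True),  # e.g. the boundary coord.detach()
    Op("requires_grad_"),
    Op("matmul"),
    Op("detach"),  # bookkeeping detach, not written by the user
    Op("sum"),
]

# Inference-style call site: default, selective.
selective = [op.name for op in strip_detach(trace)]
# Training-style call site: strip everything.
aggressive = [op.name for op in strip_detach(trace, remove_all=True)]
```

Making the flag keyword-only means a call site has to spell out `remove_all=True`, so the more aggressive behaviour cannot be enabled by a stray positional argument.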

chore: refactor common utils

8d9501b

Copilot AI review requested due to automatic review settings June 8, 2026 03:16

github-actions Bot added the Python label Jun 8, 2026

Copilot started reviewing on behalf of anyangml June 8, 2026 03:16

[pre-commit.ci] auto fixes from pre-commit.com hooks

4b314b0

for more information, see https://pre-commit.ci

github-advanced-security AI found potential problems Jun 8, 2026

Comment thread deepmd/pt_expt/train/training.py Dismissed

Comment thread deepmd/pt_expt/train/training.py Fixed

coderabbitai Bot reviewed Jun 8, 2026

Copilot AI reviewed Jun 8, 2026

fix: add more dim to forbidden

c0b66b7

github-advanced-security AI found potential problems Jun 8, 2026

Comment thread deepmd/pt_expt/train/training.py Dismissed

anyangml and others added 2 commits June 8, 2026 13:53

fix: UT

c7f9f7f

[pre-commit.ci] auto fixes from pre-commit.com hooks

d1858fe

for more information, see https://pre-commit.ci

coderabbitai Bot reviewed Jun 8, 2026

Comment thread deepmd/pt_expt/train/training.py Outdated

anyangml added 3 commits June 8, 2026 14:06

chore:lint

4f5c24f

fix:UT and lazy compile

d8f9cd0

fix: detach

710fe73

anyangml requested review from OutisLi, iProzd and wanghan-iapcm June 8, 2026 09:33

iProzd approved these changes Jun 9, 2026

wanghan-iapcm reviewed Jun 10, 2026

Comment thread deepmd/pt/utils/compile_utils.py Outdated

anyangml and others added 3 commits June 11, 2026 10:52

[pre-commit.ci] auto fixes from pre-commit.com hooks

44c7422

for more information, see https://pre-commit.ci

anyangml requested a review from wanghan-iapcm June 11, 2026 06:05

anyangml changed the title ~~Chore: refactor common utils~~ Chore: refactor pt-expt compile Jun 11, 2026

anyangml wants to merge 11 commits into deepmodeling:master from anyangml:chore/refactor-compile

5 participants
