Add torch_compile flag for training networks by wenxin0319 · Pull Request #28 · NVlabs/FastGen

wenxin0319 · 2026-06-01T05:12:25Z

FastGen currently relies on diffusers-based model execution, which leaves performance on the table during training.

This PR adds an opt-in torch_compile flag that wraps training networks with torch.compile, enabling PyTorch's compiler optimizations (operator fusion, memory planning, kernel autotuning). On the QwenImage benchmark below, this gives a 1.55x speedup per training iteration.

Benchmark (QwenImage, 20.43B params, NVIDIA H100, bfloat16, 512x512):

Setting │ Time/iter │ Std
Baseline (no compile) │ 0.694s │ 0.094s
torch.compile (max-autotune) │ 0.447s │ 0.014s

Speedup: 1.55x (about 55% more iterations per second, or about 36% less time per iteration).

Compiled iterations also show much lower variance (0.014s vs 0.094s), meaning more consistent training throughput. The one-time compilation overhead (~5-10 min with max-autotune) is amortized over the full training run.
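As a rough check on the amortization claim, the benchmark figures above can be turned into a break-even estimate. This is a back-of-the-envelope sketch, not part of the PR: it assumes the per-iteration times stay constant over the run and that compilation happens only once.

```python
# Figures quoted in the benchmark above.
baseline_s = 0.694  # seconds per iteration without torch.compile
compiled_s = 0.447  # seconds per iteration with max-autotune

speedup = baseline_s / compiled_s
time_saved_per_iter = baseline_s - compiled_s
time_reduction = time_saved_per_iter / baseline_s


def break_even_iters(compile_overhead_s: float) -> float:
    """Iterations after which the saved time covers the compile overhead."""
    return compile_overhead_s / time_saved_per_iter


low = break_even_iters(5 * 60)    # 5 minute compile
high = break_even_iters(10 * 60)  # 10 minute compile

print(f"speedup: {speedup:.2f}x")
print(f"time per iteration reduced by {time_reduction:.1%}")
print(f"break-even after about {low:.0f} to {high:.0f} iterations")
```

Under these assumptions the compile cost is recovered after roughly 1,200 to 2,400 iterations, so shorter runs or quick debugging sessions may be better off with compilation disabled.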

Changes:

- Add torch_compile: bool = False config option in BaseModelConfig
- Add _apply_torch_compile() in FastGenModel that compiles the main network (self.net)
- Override _apply_torch_compile() in DMD2Model to also compile teacher and fake_score networks
- Add comprehensive tests covering compile on/off for both SFT and DMD2 models, including training step validation
- Add bench_compile.py benchmark script for measuring compile speedup

Usage:
Set torch_compile=True in model config to enable.

wenxin0319 · 2026-06-01T05:16:14Z

@juliusberner Could you please take a look at my PR? Thanks!

juliusberner

Thanks a lot for the PR and the benchmarking, I left a few comments!

juliusberner · 2026-06-07T15:37:28Z

    ddp_find_unused_parameters: bool = True

+    # enable torch.compile for training networks
+    torch_compile: bool = False


Can we make this more general, e.g.:

    # torch.compile mode for inference speedup ("default", "reduce-overhead", "max-autotune")
    # None disables torch.compile.
    torch_compile_mode: Optional[str] = None

wenxin0319 · Jun 9, 2026

Thank you, I will do that.
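A minimal sketch of how the suggested option could look as a config field with basic validation. BaseModelConfigSketch, SUGGESTED_MODES, and compile_enabled are hypothetical names for illustration; the real BaseModelConfig in FastGen may be defined and validated differently.

```python
from dataclasses import dataclass
from typing import Optional

# Modes named in the review comment; torch.compile may accept others.
SUGGESTED_MODES = ("default", "reduce-overhead", "max-autotune")


@dataclass
class BaseModelConfigSketch:  # hypothetical stand-in for BaseModelConfig
    ddp_find_unused_parameters: bool = True
    # torch.compile mode; None disables torch.compile.
    torch_compile_mode: Optional[str] = None

    def __post_init__(self) -> None:
        mode = self.torch_compile_mode
        if mode is not None and mode not in SUGGESTED_MODES:
            raise ValueError(
                f"unknown torch_compile_mode {mode!r}; "
                f"expected one of {SUGGESTED_MODES} or None"
            )

    @property
    def compile_enabled(self) -> bool:
        """True when a compile mode has been selected."""
        return self.torch_compile_mode is not None
```

Using a single Optional[str] field keeps the on/off switch and the mode selection in one place, so no separate boolean flag is needed.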

juliusberner · 2026-06-07T15:42:28Z

        # instantiate all necessary nets and submodules
        self.build_model()

+        # optionally compile networks with torch.compile


I think compilation should happen after the FSDP/DDP wrapping

juliusberner · 2026-06-07T15:49:13Z

            synchronize()
        torch.cuda.empty_cache()

+    def _apply_torch_compile(self):


Can we define a compile_dict, similar to the fsdp_dict/model_dict, that contains all modules that should be compiled? We could also add the VAE there (note that we would need to search for submodules of the VAE, since the VAEs themselves are not instances of torch.nn.Module).

- Replace torch_compile: bool with torch_compile_mode: Optional[str] (e.g. "default", "reduce-overhead", "max-autotune"; None disables).
- Introduce a compile_dict property (like fsdp_dict/model_dict) holding all modules to compile. DMD2 extends it with teacher/fake_score. The base also includes the VAE: VAE wrappers are not nn.Modules, so we search their attributes for the underlying nn.Module submodule(s).
- Compile in place via nn.Module.compile() and apply it from the trainer *after* DDP/FSDP wrapping so torch.compile composes with the wrappers.
- Update tests accordingly (in-place compile detection, compile_dict and VAE submodule discovery).
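The points above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: every class and function name here (find_nn_submodules, VAEWrapperSketch, FastGenModelSketch, DMD2ModelSketch, apply_torch_compile) is hypothetical, the networks are tiny placeholder layers, and the DDP/FSDP wrapping step is omitted.

```python
from typing import Dict, Optional

import torch.nn as nn


def find_nn_submodules(wrapper: object) -> Dict[str, nn.Module]:
    """Return the attributes of a plain (non-nn.Module) wrapper that are nn.Modules."""
    return {
        name: value
        for name, value in vars(wrapper).items()
        if isinstance(value, nn.Module)
    }


class VAEWrapperSketch:  # hypothetical: a plain object holding the real module
    def __init__(self) -> None:
        self.vae = nn.Linear(4, 4)
        self.scale = 0.5


class FastGenModelSketch:  # hypothetical stand-in for FastGenModel
    def __init__(self) -> None:
        self.net = nn.Linear(4, 4)
        self.vae_wrapper = VAEWrapperSketch()

    @property
    def compile_dict(self) -> Dict[str, nn.Module]:
        """All modules that should be compiled, keyed by name."""
        modules: Dict[str, nn.Module] = {"net": self.net}
        for name, module in find_nn_submodules(self.vae_wrapper).items():
            modules[f"vae.{name}"] = module
        return modules


class DMD2ModelSketch(FastGenModelSketch):  # hypothetical stand-in for DMD2Model
    def __init__(self) -> None:
        super().__init__()
        self.teacher = nn.Linear(4, 4)
        self.fake_score = nn.Linear(4, 4)

    @property
    def compile_dict(self) -> Dict[str, nn.Module]:
        modules = super().compile_dict
        modules.update(teacher=self.teacher, fake_score=self.fake_score)
        return modules


def apply_torch_compile(model: FastGenModelSketch, mode: Optional[str]) -> None:
    """Compile every module in compile_dict in place; intended to run after DDP/FSDP wrapping."""
    if mode is None:
        return  # None disables torch.compile
    for module in model.compile_dict.values():
        module.compile(mode=mode)  # nn.Module.compile keeps the same module object
```

Because nn.Module.compile() works in place, references held elsewhere (optimizers, wrappers, checkpoints) keep pointing at the same module objects, which is one reason to prefer it over reassigning the result of torch.compile(module).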

Add torch_compile flag for training networks

e9863fe

juliusberner suggested changes Jun 7, 2026

wenxin0319 force-pushed the main branch 2 times, most recently from f930ee8 to eb763ac (June 9, 2026 03:49)

wenxin0319 force-pushed the main branch from eb763ac to 78e9f0d (June 9, 2026 03:49)

wenxin0319 wants to merge 2 commits into NVlabs:main from wenxin0319:main
