
[codex] Use PR41015 fix setup for GB200 MTP2 high throughput #1273

Open
alec-flowers wants to merge 1 commit into main from codex/gb200-vllm-pr41015-fix

@alec-flowers
Collaborator

Summary

  • Point the GB200 DeepSeek-V4-Pro MTP2 high-throughput srt-slurm recipe at vllm-container-deps-pr41015-fp4-fix.sh.
  • Keep the change limited to the c1024 high-throughput MTP2 recipe; the patch script itself is expected to come from the checked-out aflowers/vllm-gb200-v0.20.0 srt-slurm branch.

Why

This lets the InferenceX rerun use the srt-slurm PR41015 FP4 revert wrapper that recovered the GB200 MTP2 throughput regression in SA testing.

Validation

  • Parsed the edited YAML and confirmed setup_script resolves to vllm-container-deps-pr41015-fp4-fix.sh.
  • Confirmed the target srt-slurm branch contains configs/patches/vllm-container-deps-pr41015-fp4-fix.sh and configs/patches/vllm_revert_pr41015_fp4_cvt.py.
  • Ran bash -n on the srt-slurm wrapper and python3 -m py_compile on the patch helper.
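The checks above can be reproduced roughly as follows. This is a minimal sketch, not the exact commands that were run: the helper names are hypothetical, the `setup_script` lookup is a plain line scan rather than a full YAML parse (to stay stdlib-only), and it assumes `setup_script` is written as a simple `key: value` line in the recipe.

```python
import py_compile
import re
import subprocess
from pathlib import Path
from typing import Optional

# Value the recipe is expected to carry after this PR.
EXPECTED_SETUP_SCRIPT = "vllm-container-deps-pr41015-fp4-fix.sh"


def find_setup_script(recipe_text: str) -> Optional[str]:
    """Return the first setup_script value in the recipe text, or None.

    Assumption: the key appears as a plain `setup_script: value` line.
    Commented-out lines are skipped because the match is anchored at the
    start of the line (after indentation).
    """
    for line in recipe_text.splitlines():
        match = re.match(r"\s*setup_script:\s*(\S+)", line)
        if match:
            return match.group(1).strip("'\"")
    return None


def shell_syntax_ok(script: Path) -> bool:
    """Equivalent of `bash -n script`: parse the script without running it."""
    return subprocess.run(["bash", "-n", str(script)]).returncode == 0


def python_compiles(source: Path) -> bool:
    """Equivalent of `python3 -m py_compile source`."""
    try:
        py_compile.compile(str(source), doraise=True)
        return True
    except py_compile.PyCompileError:
        return False
```

In use, `find_setup_script` would be given the text of the edited recipe and compared against `EXPECTED_SETUP_SCRIPT`, while `shell_syntax_ok` and `python_compiles` would be pointed at `configs/patches/vllm-container-deps-pr41015-fp4-fix.sh` and `configs/patches/vllm_revert_pr41015_fp4_cvt.py` in the srt-slurm checkout.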

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@alec-flowers alec-flowers marked this pull request as ready for review May 4, 2026 05:03
@alec-flowers alec-flowers requested a review from a team May 4, 2026 05:03
@claude
Contributor

claude Bot commented May 4, 2026

Claude finished @alec-flowers's task in 41s.


Review

  • Gather context and read changed files
  • Analyze the diff against origin/main
  • Post review feedback

LGTM — no blocking issues found.

The change is a single-line swap of setup_script from vllm-container-deps.sh to vllm-container-deps-pr41015-fp4-fix.sh in the GB200 MTP2 high-throughput recipe. The rest of the YAML is well-formed: MTP benchmark correctly includes use_chat_template: true, the container image is publicly accessible on Docker Hub, and no master config files were touched.

Contributor

@claude claude Bot left a comment


LGTM — single-line YAML setup_script swap for the GB200 MTP2 high-throughput benchmark recipe.

Extended reasoning...

Overview

The PR makes a one-line change to a single benchmark recipe YAML (benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/8k1k/disagg-gb200-high-tpt-megamoe-mtp2.yaml), changing setup_script from vllm-container-deps.sh to vllm-container-deps-pr41015-fp4-fix.sh so this recipe picks up the PR41015 FP4 revert wrapper that recovered GB200 MTP2 throughput.
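One way to confirm that a change really is a single-line `setup_script` swap is to diff the old and new recipe text and inspect only the changed lines. The sketch below is illustrative, not part of the review tooling: the function names are hypothetical, and it assumes both versions of the recipe are available as strings (for example, read from `origin/main` and from the PR branch).

```python
import difflib
from typing import List, Tuple

OLD_LINE = "setup_script: vllm-container-deps.sh"
NEW_LINE = "setup_script: vllm-container-deps-pr41015-fp4-fix.sh"


def changed_lines(old_text: str, new_text: str) -> Tuple[List[str], List[str]]:
    """Return (removed, added) lines between two versions of a file."""
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are ignored.
    return removed, added


def is_single_setup_script_swap(old_text: str, new_text: str) -> bool:
    """True if the only difference is the setup_script line described above."""
    removed, added = changed_lines(old_text, new_text)
    return (
        len(removed) == 1
        and len(added) == 1
        and removed[0].strip() == OLD_LINE
        and added[0].strip() == NEW_LINE
    )
```

Comparing stripped lines keeps the check independent of how deeply `setup_script` is indented in the recipe.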

Security risks

None. The change is a benchmarking config knob; no auth, crypto, network exposure, or production code paths are touched.

Level of scrutiny

Low. This is a benchmark recipe in benchmarks/multi_node/srt-slurm-recipes/ that is scoped to a single GB200 MTP2 high-throughput run. The referenced patch script lives in an external srt-slurm branch (aflowers/vllm-gb200-v0.20.0), and the PR description confirms its existence and that it parses cleanly. No other recipes are affected.

Other factors

Bug hunting found nothing. The change follows the existing pattern of pointing recipes at a setup script by name, mirrors recent commits in this area (e.g. 07df028, 0f630e1), and is trivially reversible.
