Skip to content

[AMD] Update Dsv4 mi355x envs#1272

Merged
chunfangamd merged 5 commits intomainfrom
dsv4_mi355x_opt_envs2
May 4, 2026
Merged

[AMD] Update Dsv4 mi355x envs#1272
chunfangamd merged 5 commits intomainfrom
dsv4_mi355x_opt_envs2

Conversation

@benenzhu
Copy link
Copy Markdown
Collaborator

@benenzhu benenzhu commented May 4, 2026

Update to run on some fusions

Have tested throughtput and accuracy locally.

Co-authored-by: @chunfangamd

Align serving capacity with each concurrency point and expand low-concurrency coverage for DP attention variants.

Co-authored-by: Chun Fang <chun.fang@amd.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Comment thread perf-changelog.yaml Outdated
- "Tune DSv4 FP4 MI355X SGLang runtime envs: enable fused compress and TileLang MHC post, and drop the Torch fallback env overrides for top-k transform and FP8 paged MQA logits"
- "Drive SGLang --max-running-requests and --cuda-graph-max-bs from the matrix CONC value instead of hard-coding 256, so each sweep point launches with matching serving capacity and graph capture size"
- "Adjust the sweep coverage: start 1k1k dp-attn=true at conc=16, add 1k1k dp-attn=false conc=1-32, and extend 8k1k dp-attn=false coverage to conc=32"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxx
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new perf-changelog.yaml entry's pr-link is set to https://github.com/SemiAnalysisAI/InferenceX/pull/xxx — the xxx placeholder (per AGENTS.md / .github/workflows/claude.yml convention) was never substituted with this PR's number. Every other entry in the file uses a real PR number (e.g., the entry directly above uses /pull/1233 and the one below uses /pull/1266); merging as-is leaves a dead link that defeats the traceability purpose of the field. Fix by replacing pull/xxx with pull/1272.

Extended reasoning...

What the bug is

In the diff to perf-changelog.yaml, a new entry is appended for dsv4-fp4-mi355x-sglang whose final field reads:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxx

The literal token xxx is the placeholder convention documented in AGENTS.md and .github/workflows/claude.yml — authors are expected to substitute it with the real PR number once the PR is opened. That substitution did not happen here, so the changelog entry will be merged with an unresolvable link.

How it manifests

Anyone clicking the pr-link from this entry — or any tooling that follows changelog entries back to their authoring PR (release notes generators, perf regression bisects, audit dashboards) — will hit GitHub's 404 page for /SemiAnalysisAI/InferenceX/pull/xxx instead of arriving at PR #1272. This breaks the round-trip from observed perf change → originating PR, which is the entire reason the pr-link field exists.

Why surrounding code doesn't catch it

perf-changelog.yaml is a free-form YAML document — there is no schema validator that would reject a string-shaped link, and no CI step appears to run a 'resolve all changelog URLs' check. Nothing else in the diff would block a merge with the placeholder still in place.

Step-by-step proof

  1. The PR's metadata identifies it as PR [AMD] Update Dsv4 mi355x envs #1272 ().
  2. The diff at perf-changelog.yaml:2171 adds the line pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxx.
  3. Compare to the immediately preceding entry (line 2163 in the same file) which reads pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1233, and the entry directly below at line 2180 /pull/1266 — i.e., every other entry uses a real, resolvable PR number.
  4. Visiting https://github.com/SemiAnalysisAI/InferenceX/pull/xxx yields a 404, while https://github.com/SemiAnalysisAI/InferenceX/pull/1272 correctly resolves to this PR.
  5. AGENTS.md (lines 281, 304) and .github/workflows/claude.yml (line 237) document XXX / xxx as a placeholder to be replaced before merging.

Impact

Doc-only / traceability — no runtime behavior is affected. But the field is the single source of truth that links a perf-history row back to its authoring PR, so the dead link silently rots that traceability for this entry forever once merged.

How to fix

Change pull/xxx to pull/1272 on line 2171 of perf-changelog.yaml. One-character swap, no other changes required.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

@functionstackx
Copy link
Copy Markdown
Contributor

+viz @Oseltamivir for review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Co-authored-by: thomawan <thomawan@amd.com>
@benenzhu benenzhu force-pushed the dsv4_mi355x_opt_envs2 branch from 95e7678 to e262c4d Compare May 4, 2026 09:42
Copy link
Copy Markdown
Collaborator

@chunfangamd chunfangamd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chunfangamd chunfangamd enabled auto-merge (squash) May 4, 2026 10:01
Copy link
Copy Markdown
Collaborator

@Oseltamivir Oseltamivir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@chunfangamd chunfangamd merged commit 6b66133 into main May 4, 2026
20 checks passed
@chunfangamd chunfangamd deleted the dsv4_mi355x_opt_envs2 branch May 4, 2026 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Development

Successfully merging this pull request may close these issues.

4 participants