[AMD] Update Dsv4 mi355x envs#1272
Conversation
Align serving capacity with each concurrency point and expand low-concurrency coverage for DP attention variants. Co-authored-by: Chun Fang <chun.fang@amd.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
1 similar comment
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25301672247 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25301672662 |
| - "Tune DSv4 FP4 MI355X SGLang runtime envs: enable fused compress and TileLang MHC post, and drop the Torch fallback env overrides for top-k transform and FP8 paged MQA logits" | ||
| - "Drive SGLang --max-running-requests and --cuda-graph-max-bs from the matrix CONC value instead of hard-coding 256, so each sweep point launches with matching serving capacity and graph capture size" | ||
| - "Adjust the sweep coverage: start 1k1k dp-attn=true at conc=16, add 1k1k dp-attn=false conc=1-32, and extend 8k1k dp-attn=false coverage to conc=32" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxx |
There was a problem hiding this comment.
🟡 The new perf-changelog.yaml entry's pr-link is set to https://github.com/SemiAnalysisAI/InferenceX/pull/xxx — the xxx placeholder (per AGENTS.md / .github/workflows/claude.yml convention) was never substituted with this PR's number. Every other entry in the file uses a real PR number (e.g., the entry directly above uses /pull/1233 and the one below uses /pull/1266); merging as-is leaves a dead link that defeats the traceability purpose of the field. Fix by replacing pull/xxx with pull/1272.
Extended reasoning...
What the bug is
In the diff to perf-changelog.yaml, a new entry is appended for dsv4-fp4-mi355x-sglang whose final field reads:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxxThe literal token xxx is the placeholder convention documented in AGENTS.md and .github/workflows/claude.yml — authors are expected to substitute it with the real PR number once the PR is opened. That substitution did not happen here, so the changelog entry will be merged with an unresolvable link.
How it manifests
Anyone clicking the pr-link from this entry — or any tooling that follows changelog entries back to their authoring PR (release notes generators, perf regression bisects, audit dashboards) — will hit GitHub's 404 page for /SemiAnalysisAI/InferenceX/pull/xxx instead of arriving at PR #1272. This breaks the round-trip from observed perf change → originating PR, which is the entire reason the pr-link field exists.
Why surrounding code doesn't catch it
perf-changelog.yaml is a free-form YAML document — there is no schema validator that would reject a string-shaped link, and no CI step appears to run a 'resolve all changelog URLs' check. Nothing else in the diff would block a merge with the placeholder still in place.
Step-by-step proof
- The PR's metadata identifies it as PR [AMD] Update Dsv4 mi355x envs #1272 ().
- The diff at
perf-changelog.yaml:2171adds the linepr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxx. - Compare to the immediately preceding entry (line 2163 in the same file) which reads
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1233, and the entry directly below at line 2180/pull/1266— i.e., every other entry uses a real, resolvable PR number. - Visiting
https://github.com/SemiAnalysisAI/InferenceX/pull/xxxyields a 404, whilehttps://github.com/SemiAnalysisAI/InferenceX/pull/1272correctly resolves to this PR. AGENTS.md(lines 281, 304) and.github/workflows/claude.yml(line 237) documentXXX/xxxas a placeholder to be replaced before merging.
Impact
Doc-only / traceability — no runtime behavior is affected. But the field is the single source of truth that links a perf-history row back to its authoring PR, so the dead link silently rots that traceability for this entry forever once merged.
How to fix
Change pull/xxx to pull/1272 on line 2171 of perf-changelog.yaml. One-character swap, no other changes required.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25301727154 |
|
+viz @Oseltamivir for review |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25302102877 |
Co-authored-by: thomawan <thomawan@amd.com>
95e7678 to
e262c4d
Compare
Update to run on some fusions
Have tested throughtput and accuracy locally.
Co-authored-by: @chunfangamd