[DNM - test PR, waiting for nvidia/tensorrtllm to accept the fusedmhc patch] Update DSV4 TRT fused MHC image#1270
Oseltamivir wants to merge 3 commits into main from
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
```yaml
description:
  - "Update the TensorRT-LLM DeepSeek-V4-Pro image to ghcr.io/semianalysisai/trtllm-deepseek-v4:fix-mhc7168-eb20e9e"
  - "Enable TRTLLM fused MHC by default now that the image includes the hidden-size 7168 fused-HC fix"
pr-link: TBD
```
🟡 The new perf-changelog.yaml entry uses `pr-link: TBD` instead of a real URL. Every other entry uses a fully-qualified https://github.com/SemiAnalysisAI/InferenceX/pull/NNN URL, and the PR number (1270) is already known; please replace TBD with https://github.com/SemiAnalysisAI/InferenceX/pull/1270 before merge. Validation accepts any string so CI passes, but the placeholder degrades traceability for downstream tooling.
Extended reasoning...
What this is: The new perf-changelog.yaml entry added at line 2157 has `pr-link: TBD` as a placeholder rather than a real PR URL.
Why it stands out: A grep confirms this is the only `pr-link: TBD` in the entire perf-changelog.yaml; every other entry uses a fully-qualified https://github.com/SemiAnalysisAI/InferenceX/pull/NNN URL. The PR number is already known (#1270), so this is a straightforward fill-in. AGENTS.md documents the convention of using a placeholder while the PR is being prepared and replacing it with the real URL once the PR exists.
Why it does not break CI: `utils/matrix_logic/validation.py:483` declares `pr_link: str = Field(alias="pr-link")`, a plain string with no URL pattern validation. process_changelog.py imports `ChangelogEntry` but never references the `pr_link` field itself for sweep generation, so nothing breaks at runtime. That is why this is a nit (a traceability/process oversight) rather than a normal bug.
Impact: Downstream tooling and humans reading perf-changelog.yaml lose the link back to the PR that introduced this image bump. Anyone investigating a regression on `dsv4-fp4-b300-trt` to the fix-mhc7168-eb20e9e image (or trying to understand why fused MHC was re-enabled by default) cannot click through to the discussion and diff.
Step-by-step proof:
- Open `perf-changelog.yaml` line 2157: the new entry's last line is `pr-link: TBD`.
- Search the file for other entries: all use `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/N` (e.g. line 2151 immediately above is `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1233`).
- Confirm the PR number: this PR is "[DNM - test PR, waiting for nvidia/tensorrtllm to accept the fusedmhc patch] Update DSV4 TRT fused MHC image" #1270 (per the PR metadata).
- Confirm CI won't fail: `utils/matrix_logic/validation.py:483` defines `pr_link` as a plain `str`, with no `HttpUrl` or regex constraint.
Fix: Replace `pr-link: TBD` on line 2157 with `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1270` before merge.
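The review note above hinges on `pr_link` being declared as a plain `str`, so any placeholder string passes validation. A minimal stdlib sketch of the kind of pattern check that would catch it; the regex, helper name, and check itself are illustrative and not part of the repo:

```python
import re

# Hypothetical pre-merge check (not in the repo): require every pr-link in
# perf-changelog.yaml to be a fully-qualified InferenceX PR URL, so that
# placeholders like "TBD" are rejected before merge.
PR_LINK_RE = re.compile(r"https://github\.com/SemiAnalysisAI/InferenceX/pull/\d+")

def pr_link_ok(link: str) -> bool:
    """Return True only for a real pull-request URL, not a placeholder."""
    return PR_LINK_RE.fullmatch(link) is not None

print(pr_link_ok("https://github.com/SemiAnalysisAI/InferenceX/pull/1233"))  # True
print(pr_link_ok("TBD"))  # False
```

A `pydantic.HttpUrl` field (instead of `str`) in the `ChangelogEntry` model would achieve a similar effect, though it would still accept non-PR URLs.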
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25270557269
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25271051983
Blockers: review/merge of NVIDIA/TensorRT-LLM#13710
Summary
- Update `dsv4-fp4-b300-trt` to the TensorRT-LLM image with the fused MHC hidden-size 7168 fix.

TensorRT-LLM fork

- `ghcr.io/semianalysisai/trtllm-deepseek-v4:fix-mhc7168-eb20e9e`

Validation

- `bash -n benchmarks/single_node/dsv4_fp4_b300_trt.sh`
- `.github/configs/nvidia-master.yaml` and `perf-changelog.yaml`: `python3 utils/matrix_logic/generate_sweep_configs.py test-config --config-keys dsv4-fp4-b300-trt --config-files .github/configs/nvidia-master.yaml`