[DNM - test PR, waiting for nvidia/tensorrtllm to accept the fusedmhc patch] Update DSV4 TRT fused MHC image#1270
Oseltamivir wants to merge 3 commits into main from
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
```yaml
description:
  - "Update the TensorRT-LLM DeepSeek-V4-Pro image to ghcr.io/semianalysisai/trtllm-deepseek-v4:fix-mhc7168-eb20e9e"
  - "Enable TRTLLM fused MHC by default now that the image includes the hidden-size 7168 fused-HC fix"
pr-link: TBD
```
🟡 The new perf-changelog.yaml entry uses `pr-link: TBD` instead of a real URL. Every other entry uses a fully-qualified https://github.com/SemiAnalysisAI/InferenceX/pull/NNN URL, and the PR number (1270) is already known; please replace TBD with https://github.com/SemiAnalysisAI/InferenceX/pull/1270 before merge. Validation accepts any string so CI passes, but the placeholder degrades traceability for downstream tooling.
Extended reasoning...
What this is: The new perf-changelog.yaml entry added at line 2157 has `pr-link: TBD` as a placeholder rather than a real PR URL.
Why it stands out: A grep confirms this is the only `pr-link: TBD` in the entire perf-changelog.yaml; every other entry uses a fully-qualified https://github.com/SemiAnalysisAI/InferenceX/pull/NNN URL. The PR number is already known (#1270), so this is a straightforward fill-in. AGENTS.md documents the convention of using a placeholder while the PR is being prepared and replacing it with the real URL once the PR exists.
Why it does not break CI: `utils/matrix_logic/validation.py:483` declares `pr_link: str = Field(alias="pr-link")`, a plain string with no URL pattern validation. process_changelog.py imports `ChangelogEntry` but never references the `pr_link` field itself for sweep generation, so nothing breaks at runtime. That is why this is a nit (a traceability/process oversight) rather than a normal bug.
Impact: Downstream tooling and humans reading perf-changelog.yaml lose the link back to the PR that introduced this image bump. Anyone investigating a regression on `dsv4-fp4-b300-trt` to the fix-mhc7168-eb20e9e image (or trying to understand why fused MHC was re-enabled by default) cannot click through to the discussion and diff.
Step-by-step proof:
- Open `perf-changelog.yaml` line 2157: the new entry's last line is `pr-link: TBD`.
- Search the file for other entries: all use `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/N` (e.g. line 2151 immediately above is `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1233`).
- Confirm the PR number: this PR is "[DNM - test PR, waiting for nvidia/tensorrtllm to accept the fusedmhc patch] Update DSV4 TRT fused MHC image" #1270 (per the PR metadata).
- Confirm CI won't fail: `utils/matrix_logic/validation.py:483` defines `pr_link` as a plain `str`, with no `HttpUrl` or regex constraint.
Fix: Replace `pr-link: TBD` on line 2157 with `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1270` before merge.
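The review note above hinges on `pr_link` being declared as a plain `str`, so any placeholder string passes validation. A minimal stdlib sketch of the kind of pattern check that would catch it; the regex, helper name, and check itself are illustrative and not part of the repo:

```python
import re

# Hypothetical pre-merge check (not in the repo): require every pr-link in
# perf-changelog.yaml to be a fully-qualified InferenceX PR URL, so that
# placeholders like "TBD" are rejected before merge.
PR_LINK_RE = re.compile(r"https://github\.com/SemiAnalysisAI/InferenceX/pull/\d+")

def pr_link_ok(link: str) -> bool:
    """Return True only for a real pull-request URL, not a placeholder."""
    return PR_LINK_RE.fullmatch(link) is not None

print(pr_link_ok("https://github.com/SemiAnalysisAI/InferenceX/pull/1233"))  # True
print(pr_link_ok("TBD"))  # False
```

A `pydantic.HttpUrl` field (instead of `str`) in the `ChangelogEntry` model would achieve a similar effect, though it would still accept non-PR URLs.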
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25270557269
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25271051983
Blockers: review/merge of NVIDIA/TensorRT-LLM#13710
Summary
- Update `dsv4-fp4-b300-trt` to the TensorRT-LLM image with the fused MHC hidden-size 7168 fix.

TensorRT-LLM fork

- `ghcr.io/semianalysisai/trtllm-deepseek-v4:fix-mhc7168-eb20e9e`

Validation

- `bash -n benchmarks/single_node/dsv4_fp4_b300_trt.sh`
- `.github/configs/nvidia-master.yaml` and `perf-changelog.yaml`: `python3 utils/matrix_logic/generate_sweep_configs.py test-config --config-keys dsv4-fp4-b300-trt --config-files .github/configs/nvidia-master.yaml`