
[AMD] improve dsr1 fp4 disagg perf on mi355x#1236

Open
billishyahao wants to merge 72 commits into main from
amd/mi355x-dsfp4-april14

Conversation

@billishyahao
Collaborator

@billishyahao billishyahao commented Apr 30, 2026

replacement of #983

The new patch adds the following optimizations:

- "Bump SGL mori image to lmsysorg/sglang-rocm"
- "Add more high tput / low latency sweep configs"
- "Enable v2 mxfp4 DSR1 0528 model"
- "Enable fp4 disp / fp8 combine feature on mori"
- "Enable Mori SDMA + two batch overlapping feature"

billishyahao and others added 30 commits March 16, 2026 08:36
…transformers v5

Transformers v5 incorrectly rebuilds pre_tokenizer/decoder components for
models like DeepSeek-R1 that use LlamaTokenizerFast with a non-Llama
tokenizer architecture. The sglang server fixes this at startup, but the
benchmark client loads the tokenizer without these fixes, causing a ~5x
token count inflation (e.g. 7000 tokens -> 35000 tokens) and false
performance regressions in TTFT and throughput benchmarks.

Apply the same tokenizer fixes (pre_tokenizer/decoder restoration and
add_bos_token recovery) that sglang server applies, so client and server
tokenize identically. No-op on transformers v4.

Made-with: Cursor
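The commit message above describes re-applying the server-side tokenizer fixes on the benchmark client. A minimal sketch of that idea follows; this is not the actual sglang/InferenceX patch, and the function name, argument names, and restoration details are illustrative assumptions:

```python
# Hedged sketch (assumed names, not the real patch): re-apply the serialized
# pre_tokenizer/decoder from tokenizer.json over whatever transformers v5
# reconstructed, so client-side token counts match the server's.
import json
from pathlib import Path


def restore_backend_components(tokenizer, model_dir):
    """No-op when the on-disk components already match (transformers v4)."""
    from tokenizers import Tokenizer  # backend library used by *Fast classes

    tok_json = Path(model_dir) / "tokenizer.json"
    if not tok_json.is_file():
        return tokenizer

    # Overwrite the possibly-rebuilt components with the serialized ones.
    original = Tokenizer.from_file(str(tok_json))
    backend = tokenizer.backend_tokenizer
    if original.pre_tokenizer is not None:
        backend.pre_tokenizer = original.pre_tokenizer
    if original.decoder is not None:
        backend.decoder = original.decoder

    # Recover add_bos_token if it was dropped during conversion.
    cfg_path = Path(model_dir) / "tokenizer_config.json"
    if cfg_path.is_file():
        cfg = json.loads(cfg_path.read_text())
        if "add_bos_token" in cfg:
            tokenizer.add_bos_token = cfg["add_bos_token"]
    return tokenizer
```

Running this on the client before counting tokens would keep client and server tokenization in sync, which is the property the commit relies on to avoid the ~5x token-count inflation.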
@github-actions
Contributor

github-actions Bot commented May 2, 2026

@billishyahao
Collaborator Author

Can we get a review for this patch? @functionstackx @Oseltamivir @cquil11

Sweep: 19 of 20 runs passed; 1 was canceled by the user.

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25241387090

All evals passed:

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25268431600/

Comment thread on benchmarks/multi_node/amd_utils/server.sh (Outdated)
Contributor

@functionstackx functionstackx left a comment


Added a comment related to your current code of "if evals: set xyz"


unset MORI_MOE_MAX_INPUT_TOKENS_PREFILL
unset MORI_MOE_MAX_INPUT_TOKENS_DECODE
unset SGLANG_MORI_FP8_COMB
Contributor


@billishyahao same thing here

DECODE_SERVER_CONFIG=$(echo "$DECODE_SERVER_CONFIG" | sed 's/--ep-dispatch-algorithm fake//g')
unset MORI_MOE_MAX_INPUT_TOKENS_PREFILL
unset MORI_MOE_MAX_INPUT_TOKENS_DECODE
unset SGLANG_MORI_FP8_COMB
Contributor


@billishyahao I don't understand why we are unsetting fp8 combine for evals only while still using it for the performance benchmarks.

It seems like the only evals-specific change we should make is the context length, to fit the shots, not unsetting fp8 combine.

Can you work with @Oseltamivir to figure it out? Happy to dedicate time on our end to work with you on it.

