
Update MiniMax M2.5 FP8 H200 vLLM agg recipes #1298

Draft

anish-shanbhag wants to merge 1 commit into SemiAnalysisAI:main from anish-shanbhag:ashan/port-inferencemax-53-minimax-h200-no-slurm-shared


Conversation


@anish-shanbhag anish-shanbhag commented May 7, 2026

Update MiniMax-M2.5 FP8 H200 vLLM to vllm/vllm-openai:v0.20.1-ubuntu2404

Sets vLLM serving knobs in benchmarks/single_node/minimaxm2.5_fp8_h200.sh: the max-model-len from the generated benchmark config, the previous eval max-model-len handling, an FP8 KV cache, FlashInfer attention with autotuning, the Triton MoE backend, and MiniMax QK norm fusion.
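A minimal sketch of how a recipe script might wire up a subset of these knobs. This is illustrative only, not the contents of the actual PR: the model ID, default length, and launcher invocation are assumptions; only `--max-model-len`, `--kv-cache-dtype fp8`, and the `VLLM_ATTENTION_BACKEND` environment variable are standard vLLM options, and the autotune, Triton MoE, and QK norm fusion settings from the PR description are omitted here because their exact flag names are not shown in the source.

```shell
#!/usr/bin/env bash
# Illustrative sketch of a single-node vLLM serving recipe.
# Assumptions: model ID and MAX_MODEL_LEN default are hypothetical.
set -euo pipefail

# Hypothetical stand-in for the generated benchmark max-model-len.
MAX_MODEL_LEN="${MAX_MODEL_LEN:-8192}"

# FlashInfer attention backend (standard vLLM environment variable).
export VLLM_ATTENTION_BACKEND=FLASHINFER

ARGS=(
  --model MiniMaxAI/MiniMax-M2.5        # hypothetical model identifier
  --max-model-len "$MAX_MODEL_LEN"
  --kv-cache-dtype fp8                  # FP8 KV cache
)

# Print the assembled command rather than launching the server,
# so the sketch stays runnable without GPUs.
echo "vllm serve ${ARGS[*]}"
```

In a real recipe the final line would exec `vllm serve` directly; echoing the command keeps the sketch inspectable without hardware.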

