Conversation
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request consolidates the Llama benchmarking examples into a single directory and introduces a new configuration for Qwen models. The review feedback highlights a critical issue: an incompatible dataset preset for the Qwen model could lead to incorrect results. Other suggestions include improving documentation clarity for dataset downloads, reverting unnecessary import reordering in the download script to maintain PEP 8 compliance, and removing hardcoded dates from configuration files to improve reusability.
Pull request overview
This PR consolidates the separate Llama2-70B and Llama3.1-8B example folders into a single examples/05_Llama_Examples/ directory to simplify discovery and maintenance of Llama benchmarking configs/docs.
Changes:
- Update `examples/README.md` to point to a unified Llama examples folder.
- Add a consolidated Llama README plus new/relocated YAML configs and helper assets under `examples/05_Llama_Examples/`.
- Remove the old per-model example READMEs.
Reviewed changes
Copilot reviewed 7 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| examples/README.md | Updates the examples index to reference the consolidated Llama examples folder. |
| examples/05_Llama_Examples/README.md | New consolidated instructions for running Llama 3.1-8B and Llama 2-70B benchmarks. |
| examples/05_Llama_Examples/offline_llama3_8b_cnn.yaml | Adds offline Llama 3.1-8B CNN/DailyMail config. |
| examples/05_Llama_Examples/online_llama3_8b_cnn.yaml | Adds online Llama 3.1-8B CNN/DailyMail config. |
| examples/05_Llama_Examples/online_llama2_70b_orca.yaml | Adds online (submission-mode) Llama 2-70B OpenOrca config. |
| examples/05_Llama_Examples/download_cnndm.py | Keeps/relocates CNN/DailyMail download helper script. |
| examples/05_Llama_Examples/calibration-list.txt | Adds a checked-in calibration ID list referenced by the Llama3.1 workflow. |
| examples/05_Llama_Examples/qwen-multiple-concurrency.yaml | Introduces a Qwen/SGLang config (currently schema-invalid and out of PR scope). |
| examples/05_Llama3.1-8B_Example/README.md | Removed in favor of the consolidated Llama README. |
| examples/06_Llama2-70B_Example/README.md | Removed in favor of the consolidated Llama README. |
### Offline mode

```bash
uv run inference-endpoint benchmark from-config -c offline_llama3_8b_cnn.yaml --timeout 600
```

### Online mode

```bash
uv run inference-endpoint benchmark from-config -c online_llama3_8b_cnn.yaml --timeout 600
```
The commands shown for the Llama-3.1 configs will run performance only by default: `benchmark from-config` defaults to `--mode perf` unless `type: submission` is set. Since the YAMLs include an accuracy dataset entry, readers may assume accuracy is being run when it isn't. Either document using `--mode both` (or `--mode acc`), or change these configs to `type: submission` with an appropriate `benchmark_mode` so `from-config` defaults to BOTH.
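The config-side fix suggested in this comment can be sketched as a small YAML change; the field names `type` and `benchmark_mode` are taken from the comment itself, but the exact accepted values are an assumption, not verified against the tool's schema:

```yaml
# Hypothetical fragment: marking the config as submission-type so that
# `benchmark from-config` defaults to running both perf and accuracy.
# Field values are unverified assumptions based on the review comment.
type: submission
benchmark_mode: both
```

Alternatively, leave the configs as-is and document passing `--mode both` (or `--mode acc`) explicitly on the command line.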
The Llama-3.1 YAML configs include an accuracy dataset using `eval_method: rouge`, but this section doesn't mention installing the ROUGE scoring dependencies (`nltk`, `evaluate`, `rouge_score`) that are required when running `--mode acc`/`--mode both` (the tool raises `ImportError` otherwise). Consider adding the same optional accuracy-setup snippet here (or explicitly stating these runs are perf-only).
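A minimal sketch of the fail-fast behavior this comment describes: accuracy runs need the ROUGE scoring packages, and the tool raises `ImportError` when they are absent. The function name and error wording here are hypothetical, not taken from the actual codebase:

```python
# Hypothetical dependency guard for accuracy mode (--mode acc / --mode both):
# check that the ROUGE scoring packages named in the review are importable,
# and raise ImportError with an install hint if any are missing.
def check_rouge_deps() -> None:
    missing = []
    for mod in ("nltk", "evaluate", "rouge_score"):
        try:
            __import__(mod)
        except ImportError:
            missing.append(mod)
    if missing:
        raise ImportError(
            "Accuracy mode requires: " + ", ".join(missing)
            + ". Install them with: uv pip install nltk evaluate rouge_score"
        )
```

Calling such a guard early (before loading the dataset) gives users the install hint up front instead of a deep stack trace mid-run.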
Pull request overview
Copilot reviewed 6 out of 10 changed files in this pull request and generated 1 comment.
For post-training quantization calibration, use the repository-provided [`calibration-list.txt`](./calibration-list.txt), which corresponds to the [cnn-dailymail-calibration-list](https://github.com/mlcommons/inference/blob/v4.0/calibration/CNNDailyMail/calibration-list.txt):

```bash
uv run python download_cnndm.py --save-dir data --calibration-ids-file calibration-list.txt --split train
```
This README instructs users to download calibration-list.txt via curl, but the repo now also includes calibration-list.txt in this directory. That’s redundant and can lead to confusion (or overwriting the checked-in file). Either remove the checked-in list and keep the download step, or update the instructions to use the bundled file and drop the curl command.
What does this PR do?
Consolidates the Llama2-70B and Llama3.1-8B examples into a single folder.
Type of change
Related issues
Testing
Checklist