feat: Consolidate llama examples#296

Open
arekay-nv wants to merge 5 commits into main from arekay/consolidate_llama_examples

Conversation

@arekay-nv
Collaborator

What does this PR do?

Consolidate the llama2-70b and llama3.1-8b examples into a single folder.

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv requested review from a team and Copilot on April 28, 2026 at 01:42
@github-actions

github-actions Bot commented Apr 28, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request consolidates Llama benchmarking examples into a single directory and introduces a new configuration for Qwen models. The review feedback highlights a critical issue regarding an incompatible dataset preset for the Qwen model which could lead to incorrect results. Other suggestions include improving documentation clarity for dataset downloads, reverting unnecessary import reordering in the download script to maintain PEP 8 compliance, and removing hardcoded dates from configuration files to improve reusability.

Comment thread examples/05_Llama_Examples/qwen-multiple-concurrency.yaml Outdated
Comment thread examples/05_Llama_Examples/README.md Outdated
Comment thread examples/05_Llama_Examples/download_cnndm.py Outdated
Comment thread examples/05_Llama_Examples/qwen-multiple-concurrency.yaml Outdated

Copilot AI left a comment


Pull request overview

This PR consolidates the separate Llama2-70B and Llama3.1-8B example folders into a single examples/05_Llama_Examples/ directory to simplify discovery and maintenance of Llama benchmarking configs/docs.

Changes:

  • Update examples/README.md to point to a unified Llama examples folder.
  • Add a consolidated Llama README plus new/relocated YAML configs and helper assets under examples/05_Llama_Examples/.
  • Remove the old per-model example READMEs.

Reviewed changes

Copilot reviewed 7 out of 10 changed files in this pull request and generated 6 comments.

Summary per file:

| File | Description |
| --- | --- |
| examples/README.md | Updates the examples index to reference the consolidated Llama examples folder. |
| examples/05_Llama_Examples/README.md | New consolidated instructions for running Llama 3.1-8B and Llama 2-70B benchmarks. |
| examples/05_Llama_Examples/offline_llama3_8b_cnn.yaml | Adds offline Llama 3.1-8B CNN/DailyMail config. |
| examples/05_Llama_Examples/online_llama3_8b_cnn.yaml | Adds online Llama 3.1-8B CNN/DailyMail config. |
| examples/05_Llama_Examples/online_llama2_70b_orca.yaml | Adds online (submission-mode) Llama 2-70B OpenOrca config. |
| examples/05_Llama_Examples/download_cnndm.py | Keeps/relocates the CNN/DailyMail download helper script. |
| examples/05_Llama_Examples/calibration-list.txt | Adds a checked-in calibration ID list referenced by the Llama 3.1 workflow. |
| examples/05_Llama_Examples/qwen-multiple-concurrency.yaml | Introduces a Qwen/SGLang config (currently schema-invalid and out of PR scope). |
| examples/05_Llama3.1-8B_Example/README.md | Removed in favor of the consolidated Llama README. |
| examples/06_Llama2-70B_Example/README.md | Removed in favor of the consolidated Llama README. |


Comment on lines +45 to +55
### Offline mode

```bash
uv run inference-endpoint benchmark from-config -c offline_llama3_8b_cnn.yaml --timeout 600
```

### Online mode

```bash
uv run inference-endpoint benchmark from-config -c online_llama3_8b_cnn.yaml --timeout 600
```

Copilot AI Apr 28, 2026


The commands shown for the Llama-3.1 configs will run performance only by default: benchmark from-config defaults to --mode perf unless type: submission. Since the YAMLs include an accuracy dataset entry, readers may assume accuracy is being run when it isn’t. Either document using --mode both (or --mode acc) or change these configs to type: submission with an appropriate benchmark_mode so from-config defaults to BOTH.
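As a concrete illustration of the first option suggested above, the README commands could pass the mode flag explicitly (flag names are taken from this comment, not verified against the CLI's help output):

```shell
# Run both the performance and accuracy passes instead of the perf-only
# default. Assumes --mode accepts perf/acc/both as described in the
# review comment above.
uv run inference-endpoint benchmark from-config -c offline_llama3_8b_cnn.yaml --mode both --timeout 600
```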

Comment thread examples/05_Llama_Examples/README.md Outdated
Comment on lines +45 to +55 (the same Offline/Online mode commands quoted above)

Copilot AI Apr 28, 2026


The Llama-3.1 YAML configs include an accuracy dataset using eval_method: rouge, but this section doesn’t mention installing the ROUGE scoring deps (nltk, evaluate, rouge_score) that are required when running --mode acc/--mode both (the tool raises ImportError otherwise). Consider adding the same optional accuracy-setup snippet here (or explicitly stating these runs are perf-only).
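An accuracy-setup snippet along the lines suggested above could look like this (package names are the three dependencies listed in the comment; this is a sketch, not the project's documented setup step):

```shell
# Optional: install the ROUGE scoring dependencies required when running
# with --mode acc or --mode both. Without these the tool raises ImportError.
uv pip install nltk evaluate rouge_score
```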

Comment thread examples/05_Llama_Examples/qwen-multiple-concurrency.yaml Outdated
Comment thread examples/05_Llama_Examples/qwen-multiple-concurrency.yaml Outdated
Comment thread examples/05_Llama_Examples/qwen-multiple-concurrency.yaml Outdated
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings on April 28, 2026 at 02:19
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 10 changed files in this pull request and generated 1 comment.



Comment on lines +16 to +21
For post-training quantization calibration, use the repository-provided [`calibration-list.txt`](./calibration-list.txt), which corresponds to the [cnn-dailymail-calibration-list](https://github.com/mlcommons/inference/blob/v4.0/calibration/CNNDailyMail/calibration-list.txt):

```bash
uv run python download_cnndm.py --save-dir data --calibration-ids-file calibration-list.txt --split train
```
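The filtering step this command performs can be sketched as follows. This is a hypothetical illustration, not code from download_cnndm.py; `load_calibration_ids` and `filter_by_ids` are made-up names:

```python
# Hypothetical sketch of calibration-ID filtering; download_cnndm.py's
# actual implementation may differ.
def load_calibration_ids(path):
    """Read one dataset-record ID per line, skipping blank lines."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def filter_by_ids(rows, ids):
    """Keep only rows whose "id" field appears in the calibration set."""
    return [row for row in rows if row["id"] in ids]
```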


Copilot AI Apr 28, 2026


This README instructs users to download calibration-list.txt via curl, but the repo now also includes calibration-list.txt in this directory. That’s redundant and can lead to confusion (or overwriting the checked-in file). Either remove the checked-in list and keep the download step, or update the instructions to use the bundled file and drop the curl command.

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
