[MLX] Gemma4-31B ondevice sampling by kiymetakdemir · Pull Request #20561 · pytorch/executorch

kiymetakdemir · 2026-06-27T01:59:44Z

Summary

Lets the MLX-exported Gemma 4 31B model sample the next token on-device instead of returning logits for host-side sampling. Sampling is opt-in at export (--sample); temperature, top_p, and seed are runtime inputs, and the runner increments the seed per token.
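To illustrate what sampling with temperature, top_p, and seed means, here is a minimal host-side sketch in pure Python. It is not the code from this PR: the exported model performs this step inside the MLX graph. The function name `sample_top_p`, the list-based logits, and treating `temperature <= 0` as an error are assumptions made for illustration only.

```python
import math
import random


def sample_top_p(logits, temperature, top_p, seed):
    """Sample one token id using temperature scaling and top-p (nucleus) filtering.

    Hypothetical helper for illustration; not part of the PR.
    """
    if temperature <= 0.0:
        # Assumption of this sketch: temperature must be strictly positive.
        raise ValueError("temperature must be positive")
    if not (0.0 < top_p <= 1.0):
        raise ValueError("top_p must be in (0, 1]")

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept set with a generator seeded for this call only,
    # so the same (logits, temperature, top_p, seed) gives the same token.
    rng = random.Random(seed)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Because the generator is re-seeded on every call, repeating a call with the same seed returns the same token, which is why a runner would need to vary the seed from one token to the next.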

Changes

export.py: the --sample flag wraps the model so that forward(tokens, input_pos, temperature, top_p, seed) returns an int64 token, and records a use_sampling constant-method flag. Non-sample export is unchanged.
gemma4_31b_engine.cpp: reads use_sampling from metadata. When it is set, the engine consumes the int64 token directly instead of calling logits_to_token, feeds the scalar inputs in prefill and decode (across the min/max prefill chunking), and manages the per-token seed schedule. top_k is still rejected; top_p is range-checked to (0, 1]; top_p and seed are rejected on non-sample models.
main.cpp: the --top_p and --seed flags are wired into SamplingConfig. An unset seed is randomized only for sampling models; non-sample models keep seed 0, so they don't trip the guard.
tests/test_mlx_pipeline.py: adds test_export_to_pte_with_sampling, a tiny-model MLX export with --sample that asserts the use_sampling flag, the int64 token output, and same-seed reproducibility.
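The validation rules and the per-token seed schedule described above can be sketched as follows. This is a Python illustration under stated assumptions, not the C++ from gemma4_31b_engine.cpp or main.cpp: the field defaults (`top_k = 0`, `top_p = 1.0`, `seed = None` for "unset"), the function names `resolve_config` and `seed_for_token`, the random seed range, and the "base seed plus token index" schedule are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingConfig:
    # All defaults here are assumptions of this sketch.
    temperature: float = 1.0
    top_k: int = 0               # 0 stands for "not requested"
    top_p: float = 1.0
    seed: Optional[int] = None   # None stands for "flag not passed"


def resolve_config(cfg, use_sampling):
    """Validate a config against the model's use_sampling flag and fix the seed."""
    if cfg.top_k != 0:
        raise ValueError("top_k is not supported")
    if not (0.0 < cfg.top_p <= 1.0):
        raise ValueError("top_p must be in (0, 1]")
    if use_sampling:
        # An unset seed is randomized only for sampling models.
        seed = cfg.seed
        if seed is None:
            seed = random.SystemRandom().randrange(2**31)
    else:
        # Non-sample models reject top_p/seed and keep seed 0.
        if cfg.top_p != 1.0 or cfg.seed is not None:
            raise ValueError("top_p and seed require a model exported with --sample")
        seed = 0
    return SamplingConfig(cfg.temperature, cfg.top_k, cfg.top_p, seed)


def seed_for_token(base_seed, token_index):
    """One possible schedule: increment the base seed once per generated token."""
    return base_seed + token_index
```

With a fixed base seed the schedule is deterministic, so two runs with the same seed feed the same seed sequence to the model; this is the property the same-seed reproducibility test relies on.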

pytorch-bot · 2026-06-27T01:59:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20561

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Pending, 2 Unclassified Failures

As of commit 373b79d with merge base 825bd30:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Aarch64 Linux Wheels / pytorch/executorch / build-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
/__w/executorch/executorch/pytorch/executorch/backends/apple/coreml/runtime/inmemoryfs/inmemory_filesystem.cpp:722:48: error: ‘inmemoryfs::InMemoryFileSystem::InMemoryNode::Kind’ has not been declared
Build Aarch64 Linux Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_aarch64

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-27T02:00:41Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Add on-device sampling to the Gemma 4 31B MLX runner

373b79d

meta-cla Bot added the CLA Signed label Jun 27, 2026

kiymetakdemir wants to merge 1 commit into pytorch:main from kiymetakdemir:gemma-ondevice-sampling
