Maximize GPU layer offload by PolynomialDivision · Pull Request #27 · LibreTranslate/LTEngine

PolynomialDivision · 2026-06-18T11:44:10Z

No description provided.

…d failure

When a large model (e.g. Gemma 4 26B-A4B at 18 GB) fails to load because llama.cpp tries to allocate all layers on a small GPU (3.7 GB VRAM), the load itself throws before the probe loop can shed layers. Reduce the layer count until loading succeeds.
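For readers who want a picture of the fallback described in the commit message, here is a minimal sketch of the idea, not the code from this PR. It assumes a hypothetical `load_model(n_gpu_layers)` callable that raises when the requested layers do not fit in VRAM; the names `load_with_layer_backoff`, `fake_loader`, and the `step` parameter are illustrative only and do not come from LTEngine or llama.cpp.

```python
from typing import Callable, Tuple, TypeVar

M = TypeVar("M")


def load_with_layer_backoff(
    load_model: Callable[[int], M],
    max_gpu_layers: int,
    step: int = 1,
) -> Tuple[M, int]:
    """Retry the load with fewer GPU layers until it succeeds.

    `load_model` is a hypothetical loader that raises RuntimeError or
    MemoryError when the requested number of layers does not fit on the GPU.
    Returns the loaded model and the layer count that worked.
    """
    if step < 1:
        raise ValueError("step must be at least 1")

    n_layers = max_gpu_layers
    while True:
        try:
            return load_model(n_layers), n_layers
        except (RuntimeError, MemoryError):
            if n_layers == 0:
                # Even a CPU-only load failed, so there is nothing left to shed.
                raise
            # Shed layers and try again, never going below zero.
            n_layers = max(0, n_layers - step)


def fake_loader(n_gpu_layers: int) -> str:
    """Stand-in loader for illustration: pretends only 7 layers fit in VRAM."""
    if n_gpu_layers > 7:
        raise RuntimeError("out of GPU memory")
    return f"model with {n_gpu_layers} GPU layers"


model, layers = load_with_layer_backoff(fake_loader, max_gpu_layers=30)
```

A step of 1 finds the largest layer count that loads but may retry many times on a big model; a larger step trades some offload for fewer load attempts.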

pierotofy · 2026-06-18T19:16:07Z

Looks great to me, thanks @PolynomialDivision ! 👍

PolynomialDivision force-pushed the layercount-and-gpuoffload branch from 846cda6 to 9890d0d Compare June 18, 2026 17:37

PolynomialDivision force-pushed the layercount-and-gpuoffload branch from 9890d0d to cdc2ba2 Compare June 18, 2026 17:39

PolynomialDivision marked this pull request as draft June 18, 2026 18:02

PolynomialDivision marked this pull request as ready for review June 18, 2026 18:03

pierotofy merged commit 7a4d939 into LibreTranslate:main Jun 18, 2026
1 check passed

pierotofy merged 1 commit into LibreTranslate:main from PolynomialDivision:layercount-and-gpuoffload