
Maximize GPU layer offload #27

Merged
pierotofy merged 1 commit into LibreTranslate:main from PolynomialDivision:layercount-and-gpuoffload
Jun 18, 2026


@PolynomialDivision

Contributor

No description provided.

PolynomialDivision force-pushed the layercount-and-gpuoffload branch from 846cda6 to 9890d0d on June 18, 2026 17:37
…d failure

When a large model (e.g. Gemma 4 26B-A4B at 18 GB) fails to load because
llama.cpp tries to allocate all layers on a small GPU (3.7 GB VRAM), the
load itself throws before the probe loop can shed layers. Reduce the
layer count until loading succeeds.
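The fallback in the commit message amounts to a retry loop: attempt the load with every layer offloaded, and on failure shed layers and try again. Below is a minimal Python sketch of that loop. It assumes a loader that raises on allocation failure and layers of uniform size. `ModelLoadError`, `load_model`, and `load_with_fallback` are hypothetical names, not the code merged in this PR and not part of llama.cpp's API.

```python
class ModelLoadError(RuntimeError):
    """Hypothetical error raised when the model cannot be allocated."""


def load_model(n_gpu_layers: int, layer_size_gb: float, vram_gb: float) -> dict:
    # Hypothetical stand-in for a real loader: it throws at load time if the
    # offloaded layers do not fit in VRAM, mimicking an allocation failure.
    needed_gb = n_gpu_layers * layer_size_gb
    if needed_gb > vram_gb:
        raise ModelLoadError(f"need {needed_gb:.2f} GB, have {vram_gb:.2f} GB")
    return {"n_gpu_layers": n_gpu_layers}


def load_with_fallback(total_layers: int, layer_size_gb: float,
                       vram_gb: float, step: int = 1) -> dict:
    # Start by offloading every layer, then shed `step` layers per failed load.
    n = total_layers
    while True:
        try:
            return load_model(n, layer_size_gb, vram_gb)
        except ModelLoadError:
            if n == 0:
                raise  # even a CPU-only load failed; nothing left to shed
            n = max(0, n - step)  # never skip the CPU-only attempt
```

For example, with a hypothetical 30-layer model at 0.6 GB per layer (18 GB total) and 3.7 GB of VRAM, `load_with_fallback(30, 0.6, 3.7)` settles on 6 offloaded layers. Shedding one layer per attempt finds the largest count that loads, but it can take up to `total_layers` attempts. A larger `step` uses fewer attempts at the cost of possibly offloading fewer layers than would fit.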
PolynomialDivision force-pushed the layercount-and-gpuoffload branch from 9890d0d to cdc2ba2 on June 18, 2026 17:39
PolynomialDivision marked this pull request as draft on June 18, 2026 18:02
PolynomialDivision marked this pull request as ready for review on June 18, 2026 18:03
@pierotofy

Member

Looks great to me, thanks @PolynomialDivision! 👍

pierotofy merged commit 7a4d939 into LibreTranslate:main on Jun 18, 2026
1 check passed
