Skip to content

Strip Gemma 4 thinking tokens from translation output#29

Merged
pierotofy merged 2 commits into
LibreTranslate:mainfrom
PolynomialDivision:strip-gemma4
Jun 18, 2026
Merged

Strip Gemma 4 thinking tokens from translation output#29
pierotofy merged 2 commits into
LibreTranslate:mainfrom
PolynomialDivision:strip-gemma4

Conversation

@PolynomialDivision

Copy link
Copy Markdown
Contributor

I am sorry, the "Big" PR was not supposed to be merged anymore, since I thought splitting it up into several smaller ones would be good for review. Now it got a bit scrambled, and some PRs I squashed differently into the other ones, so the code is now in a weird state. I am going through everything and checking again. Sorry for that. So this PR is still needed since Gemma 4 answers contain this "thought".

@PolynomialDivision

Copy link
Copy Markdown
Contributor Author

For the gemma4 model we need update-llama-cpp-2026-06-17 from llama-cpp-rs. But I guess it is only a matter of days when it is merged. :D It contains an important fix for the chat template. Sorry again, that I screwed up the PRs.

@PolynomialDivision PolynomialDivision marked this pull request as draft June 18, 2026 18:02
Gemma 4 emits thinking content in two forms:
- <|channel>thought\n...<channel|>answer (full block with closing tag)
- <|channel>thought answer (no closing tag, space-separated)

Handle both cases so thinking tokens never leak into the translation result.
Emit a warning when apply_chat_template fails and ltengine falls
back to the hardcoded Gemma prompt format.
@PolynomialDivision PolynomialDivision marked this pull request as ready for review June 18, 2026 18:25
@pierotofy

Copy link
Copy Markdown
Member

This looks OK, although it's a bit of a hack, we can include this, but long term it might be better to explicitly disable thinking mode from certain models, based on https://ai.google.dev/gemma/docs/capabilities/thinking it should be possible.

@pierotofy pierotofy merged commit 594a1f7 into LibreTranslate:main Jun 18, 2026
1 check passed
@PolynomialDivision

Copy link
Copy Markdown
Contributor Author

This looks OK, although it's a bit of a hack, we can include this, but long term it might be better to explicitly disable thinking mode from certain models, based on https://ai.google.dev/gemma/docs/capabilities/thinking it should be possible.

I will look into this. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants