token masking and space_between_special_tokens during finetuning

#36
by Darkhn - opened

Good morning, Gemma team,

After fine-tuning gemma-4-31B-it, I have a few questions, since the way the model reasons has changed.

I make extensive use of asterisks in my dataset as paragraph delimiters. It seems that after training, the model will no longer reason at all without a prefilled asterisk on top of <|channel>thought\n* like so.

My questions are:

  1. Does this require the use of space_between_special_tokens, and should I therefore process my dataset like so?

<|channel>thought\n *
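To make the question concrete, here is a minimal sketch of the preprocessing I'm asking about: inserting a space between the thought marker and the first asterisk of the body. The function name and the idea that this is done as plain string preprocessing are my own assumptions, not anything from the Gemma docs:

```python
# Marker string taken from my prompts above; everything else is hypothetical.
THOUGHT_MARKER = "<|channel>thought\n"

def ensure_space_before_body(sample: str) -> str:
    """If the reasoning body starts immediately after the thought marker,
    insert a single space, mimicking a space_between_special_tokens layout."""
    start = sample.find(THOUGHT_MARKER)
    if start == -1:
        return sample  # no marker in this sample, leave it untouched
    idx = start + len(THOUGHT_MARKER)
    if idx < len(sample) and sample[idx] != " ":
        return sample[:idx] + " " + sample[idx:]
    return sample

# Inserts " " between the newline and the leading "*" of the body.
print(ensure_space_before_body("<|channel>thought\n*step one*"))
```

Is something along these lines the intended dataset format, or should the tokenizer setting handle it instead?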

  2. Since the docs say to add those control tokens to non-reasoning datasets (which I did), must those tokens be masked?

https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
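For clarity, by "masked" I mean excluding the control-token positions from the loss. A minimal sketch of what I have in mind, with made-up token ids (5 and 6 stand in for the control tokens; the real ids would come from the tokenizer):

```python
IGNORE_INDEX = -100  # the index PyTorch's CrossEntropyLoss ignores by default

def mask_control_tokens(input_ids, control_ids):
    """Build labels from input_ids, replacing control-token positions with
    IGNORE_INDEX so they contribute nothing to the training loss."""
    return [IGNORE_INDEX if t in control_ids else t for t in input_ids]

# Hypothetical ids: 5 = thought marker, 6 = end-of-turn.
print(mask_control_tokens([5, 17, 23, 6], {5, 6}))  # → [-100, 17, 23, -100]
```

Is this the right treatment for those tokens, or should they stay in the labels?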


The first two screenshots show processed samples with reasoning, and the third one is from a non-reasoning dataset.


  3. I do see that EOS and EOT are different. Must EOS be added to the end of every sample, or was EOS only used during pre-training? (The chat template does not seem to add it.)
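If EOS does belong at the end of each training sample, I assume it would be appended after the final end-of-turn token, something like the sketch below (both ids are placeholders, not the real Gemma vocabulary ids):

```python
EOS_ID = 1    # hypothetical end-of-sequence id
EOT_ID = 106  # hypothetical end-of-turn id

def append_eos(ids):
    """Append EOS after the last turn if the sample doesn't already end with it."""
    if ids and ids[-1] != EOS_ID:
        return ids + [EOS_ID]
    return ids

print(append_eos([9, 8, EOT_ID]))  # → [9, 8, 106, 1]
```

Or is EOT alone the correct terminator for fine-tuning samples?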

Thanks for your time. The model is just one of the best; hybrid sliding+full attention does seem to lead to great results. Even Qwen3.5 27B hits above its weight, but Gemma 4's world/media knowledge makes it punch way above its weight. I was still running Llama 3.3 70B and Mistral Large 2 123B, since there was no real dense alternative; most models in the 24-32B range were not getting close.
