Model outputting <|im_end|> tokens

#1
by kth8 - opened

I ran this model through my test that replicates heretic's eval by checking the first 100 tokens of each response against its refusal_markers list. I noticed a lot of <|im_end|> tokens in the model's responses. As a sanity check, I then tested mradermacher/Llama-3.3-8B-Instruct-heretic-GGUF from aeon37, and its output looks normal. Any idea what could be causing this? I'm using the latest llama.cpp version: 7641 (da9b8d330). Logs:

mradermacher/Llama-3.3-8B-Opus-Z8-Heretic-GGUF - https://gist.github.com/kth8/c9cdc2bdca8d007b3e77140c8f9f17cb
mradermacher/Llama-3.3-8B-Instruct-heretic-GGUF - https://gist.github.com/kth8/36d891cc832f1cc6eb79cca144d2315d

Owner

Hi,

It looks like the issue is a chat template mismatch carried over from the original model, Daemontatox/Llama-Opus-Z8, where it's been trained to end assistant turns with the <|im_end|> token. That model uses a ChatML format.

If llama.cpp is run with the default Llama-3 template, it doesn't treat <|im_end|> as a stop token, so the model emits it literally and it shows up in your first 100 tokens. The GGUF model you sanity-checked with (quantized from aeon37/Llama-3.3-8B-Instruct-heretic) looks normal because it uses the native Llama-3 formatting (<|eot_id|>), which llama.cpp handles by default.
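To make the mismatch concrete, here is a rough sketch (mine, not taken from either model's actual Jinja template) of how the two prompt formats differ. Note that each ChatML turn ends with <|im_end|>, while each Llama-3 turn ends with <|eot_id|>:

```python
def chatml(messages):
    # ChatML: every turn is wrapped in <|im_start|>...<|im_end|>
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ) + "<|im_start|>assistant\n"

def llama3(messages):
    # Llama-3: header tokens mark the role, <|eot_id|> ends the turn
    out = "<|begin_of_text|>"
    for m in messages:
        out += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    return out + "<|start_header_id|>assistant<|end_header_id|>\n\n"

msgs = [{"role": "user", "content": "Hello"}]
print(chatml(msgs))
print(llama3(msgs))
```

A model fine-tuned on the ChatML format will keep producing <|im_end|> at the end of its turns, so a runtime that only stops on <|eot_id|> will print it verbatim and keep generating.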

Try adding the following to your llama.cpp command:

--chat-template chatml -r "<|im_end|>"
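For example, a full invocation might look like this (I'm assuming llama-server here, and the GGUF filename is a placeholder; substitute your own file and adjust if you're using llama-cli instead):

```shell
llama-server -m Llama-3.3-8B-Opus-Z8-Heretic.Q8_0.gguf \
  --chat-template chatml \
  -r "<|im_end|>"
```

The --chat-template flag makes llama.cpp format the conversation in ChatML, and -r registers <|im_end|> as a reverse prompt so generation stops when the model emits it.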

Alright, adding your suggested arguments worked; the output now stops properly. The model also didn't refuse any prompts; #30 is always a false positive.

https://gist.github.com/kth8/f766fa14e93a2b5d8e6275cb2f8853e5
