Model gets confused?

#1
by ehrrh - opened

Mid-generation the model randomly emits "<|im_start|>user" and then writes the prompt back, or just writes the prompt back. This is on oobabooga's webui with exllamav3-0.0.25; updating to exllamav3-0.0.26 gave the same result. turboderp/Qwen3.5-35B-A3B-exl3:4.09bpw works fine. (I've also been using Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL.gguf, Qwen3-30B-A3B-Instruct-2507-UD-Q5_K_XL.gguf, Qwen3.5-27B-heretic-v2-IQ4_XS.gguf, Qwen3.5-27B-heretic-v2.i1-Q6_K.gguf, Qwen3.5-35B-A3B-heretic-v2.i1-Q4_K_M.gguf, and shisa-v2.1-unphi4-14b_Q8_0.gguf, and they all work fine.)

I've only tested the 5bpw quants, both hb6 and hb8. I can't say it was thorough testing, but I didn't encounter such a problem after generating multiple responses and swipes on an existing ~60k-context chat. I did have some Chinese characters leaking in, though, which the Instruct model didn't. Sorry, I have no idea what might be causing your problem. It sounds like a wrong/broken template, but if you use chat completion, that shouldn't be an issue.
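For context on the broken-template theory: Qwen models use a ChatML-style template, where every turn is wrapped in `<|im_start|>` / `<|im_end|>` markers. A minimal sketch of that format (a hypothetical helper, not oobabooga's actual templating code) shows why a malformed template, such as a missing `<|im_end|>` terminator, could lead the model to generate the `<|im_start|>user` marker itself:

```python
# Minimal sketch of a ChatML-style prompt, the template family Qwen models use.
# The helper name and structure here are illustrative, not any frontend's API.
def build_chatml_prompt(messages):
    """Render a list of {'role', 'content'} dicts into a ChatML string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Open the assistant turn so the model continues from here; if this framing
    # is wrong or missing, the model may emit turn markers like <|im_start|>user.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

With chat completion the backend renders this from the model's own bundled template, which is why a user-side template mistake is unlikely in that mode.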

If it is still relevant, perhaps you could get help on the official Exllama Discord server: https://discord.gg/NSFwVuCjRq

Yeah, I talked with turboderp and it was a bug in the exllamav3 kernel; it will be fixed in 0.0.27. Both presence penalty and frequency penalty triggered the problem.
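For anyone curious what those samplers do: a hedged sketch of how presence and frequency penalties are conventionally applied (OpenAI-style; this is generic illustrative code, not the exllamav3 kernel). Each token's logit is lowered by the frequency penalty times its occurrence count, plus a flat presence penalty once it has appeared at all. A bug in this step can skew the distribution enough to surface tokens like the template markers above.

```python
from collections import Counter

def apply_penalties(logits, generated_ids, presence_penalty, frequency_penalty):
    """Return a new logits list with per-token penalties subtracted.

    Illustrative only: real backends do this on GPU tensors, not Python lists.
    """
    counts = Counter(generated_ids)
    out = list(logits)
    for token_id, count in counts.items():
        out[token_id] -= frequency_penalty * count  # scales with repetition
        out[token_id] -= presence_penalty           # flat, once per seen token
    return out

# Token 2 appeared twice, token 3 once; tokens 0 and 1 are untouched.
logits = [1.0, 2.0, 3.0, 4.0]
penalized = apply_penalties(logits, [2, 2, 3],
                            presence_penalty=0.5, frequency_penalty=0.25)
print(penalized)  # [1.0, 2.0, 2.0, 3.25]
```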

ehrrh changed discussion status to closed
