Model gets confused?
Mid-generation it randomly emits "<|im_start|>user" and writes the prompt back, or just writes the prompt back. This is on oobabooga's webui with exllamav3-0.0.25; updating to exllamav3-0.0.26 gave the same result. turboderp/Qwen3.5-35B-A3B-exl3:4.09bpw works fine. (I've also been using Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL.gguf, Qwen3-30B-A3B-Instruct-2507-UD-Q5_K_XL.gguf, Qwen3.5-27B-heretic-v2-IQ4_XS.gguf, Qwen3.5-27B-heretic-v2.i1-Q6_K.gguf, Qwen3.5-35B-A3B-heretic-v2.i1-Q4_K_M.gguf, and shisa-v2.1-unphi4-14b_Q8_0.gguf, and they all work fine.)
I've only tested 5bpw quants, both hb6 and hb8. I can't say it was thorough testing, but I didn't encounter such a problem after generating multiple responses and swipes on an existing ~60k-context chat. I did have some Chinese characters leaking in, though, which Instruct didn't. Sorry, I have no idea what might be causing your problem. It sounds like a wrong/broken template, but if you're using chat completion, that shouldn't be an issue.
If it's still relevant, perhaps you could get help in the official ExLlama Discord server: https://discord.gg/NSFwVuCjRq
Yeah, I talked with turboderp and it turned out to be a bug in the exllamav3 kernel; both presence penalty and frequency penalty triggered it. It will be fixed in 0.0.27.
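Until 0.0.27 lands, setting both penalties to 0 in the sampler settings should work around it. For anyone unfamiliar with what these two samplers do, here's a minimal sketch of the conventional (OpenAI-style) definition they implement; the function name and dict-based logits here are illustrative, not exllamav3's actual kernel code:

```python
from collections import Counter

def apply_penalties(logits, generated_tokens,
                    presence_penalty=0.0, frequency_penalty=0.0):
    """Return a copy of `logits` (token_id -> score) with penalties applied."""
    counts = Counter(generated_tokens)
    adjusted = dict(logits)
    for token_id, count in counts.items():
        if token_id in adjusted:
            # presence penalty: flat cost if the token appeared at all;
            # frequency penalty: cost scales with how often it appeared
            adjusted[token_id] -= presence_penalty + frequency_penalty * count
    return adjusted

logits = {101: 2.0, 202: 1.5, 303: 0.5}
history = [101, 101, 202]
print(apply_penalties(logits, history,
                      presence_penalty=0.5, frequency_penalty=0.3))
# token 101 seen twice: 2.0 - (0.5 + 0.3*2) = 0.9
# token 202 seen once:  1.5 - (0.5 + 0.3*1) = 0.7
# token 303 unseen: unchanged at 0.5
```

With both penalties at 0.0 the logits pass through unchanged, so that code path in the kernel is never exercised, which is why zeroing them avoids the misbehavior.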