Verbose looping

#28
by islameissa - opened

I tested Qwen3.5 35B A3B Q4 and Q5 quants, in both UD and non-UD variants, with two versions of llama.cpp (b8008 and the newer b8192) and different KV-cache types (f16, bf16, and q8_0). All of them eventually fall into verbose looping. Inference speed is much better on the newer llama.cpp b8192.
I also tried different samplers. Raising the temperature maybe helps a bit, but nothing really eliminates it completely.
I wonder if those who use safetensors have the same problem.
Honestly, it is a big issue that makes the model unusable.
Any advice?
Should I download the chat template and point llama.cpp to it instead of just using --jinja, or maybe use ChatML?
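For reference, pointing llama.cpp at a local template file looks roughly like this. This is a sketch, not a confirmed fix for the looping: the flag names below are from recent llama-server builds, and the model/template paths are placeholders; check `llama-server --help` on your exact build.

```shell
# Option A: let the GGUF's embedded template be rendered with the Jinja engine
llama-server -m ./model.gguf --jinja

# Option B: override with a template file you downloaded yourself
# (e.g. the chat_template from the original safetensors repo)
llama-server -m ./model.gguf --jinja --chat-template-file ./chat_template.jinja
```

If the embedded template in the quant is broken, option B can behave differently from plain `--jinja`, which would at least tell you whether the template is part of the problem.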

I'm wondering the same thing. I might try safetensors directly. People love Qwen3.5, but so far it has not delivered for me. Also, Q8 and Q6 aren't showing up as multimodal.

@teddyspagetti
I tried lots of sampler combinations. This one seems to work best at preventing the looping and the hallucinations. Try it and let me know:
-ctk q8_0
-ctv q8_0
--samplers "penalties;dry;top_k;typ_p;top_p;min_p;xtc;temperature"
--presence-penalty 1.7
--dry-multiplier 0.9
--temp 0.6
--top-p 0.8
--min-p 0.02

The order of the samplers is important.
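To see why `--presence-penalty 1.7` is the key flag against looping, here is a minimal Python sketch of how a presence penalty works in principle. This is a simplified illustration under my own assumptions, not llama.cpp's actual implementation: every token that has already appeared in the output gets a flat penalty subtracted from its logit, so repeated phrases become progressively less likely to be sampled again.

```python
def apply_presence_penalty(logits, generated_tokens, penalty=1.7):
    """Subtract `penalty` from the logit of every token id already generated.

    logits: list of floats, indexed by token id.
    generated_tokens: token ids emitted so far.
    """
    seen = set(generated_tokens)
    return [l - penalty if tok in seen else l
            for tok, l in enumerate(logits)]

# Token 2 has the highest raw logit, but it was already emitted, so after
# the penalty the unseen token 1 becomes the most likely choice.
logits = [1.0, 2.5, 3.0, 0.5]
penalized = apply_presence_penalty(logits, generated_tokens=[2, 3])
```

DRY (`--dry-multiplier`) attacks the same symptom differently: instead of a flat per-token penalty, it penalizes tokens that would extend an already-seen *sequence*, which is why combining both with `penalties;dry` first in the sampler chain helps.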
