Easily falls into infinite loops
It seems that this quantized model falls into infinite loops more easily.
Prompt: "请为我设计一个适合团队玩的趣味游戏,游戏目标是通过合作完成一项挑战。" (English: "Please design a fun team game for me, where the goal is to complete a challenge through cooperation.")
With this AWQ-4bit model, generation falls into an infinite loop during thinking, even with the repetition penalty set to 2.0. The engine is the latest vLLM, with an FP8-quantized KV cache.
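For reference, a sketch of the serving setup being reported (the model name is from this thread; the port and exact request shape are assumed vLLM defaults, so adjust to your environment):

```shell
# Serve the AWQ-4bit model with an FP8-quantized KV cache, as in the report.
vllm serve cyankiwi/Qwen3.5-27B-AWQ-4bit \
  --kv-cache-dtype fp8

# Repetition penalty is passed per request through the OpenAI-compatible API:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "cyankiwi/Qwen3.5-27B-AWQ-4bit",
        "messages": [{"role": "user", "content": "请为我设计一个适合团队玩的趣味游戏,游戏目标是通过合作完成一项挑战。"}],
        "repetition_penalty": 2.0
      }'
```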
The same model on OpenRouter does not have this issue.
Thanks for letting me know. Since this model's linear attention layers are quantized to INT4, it is already prone to errors. I wouldn't recommend using an FP8 KV cache with this model.
I used a non-quantized KV cache and the result is the same: it still falls into an infinite loop very easily.
I switched to the official Qwen/Qwen3.5-27B-FP8 quantization and it didn't have this issue, even with an FP8 KV cache.
After trying several more times, the official Qwen/Qwen3.5-27B-FP8 quant also falls into an infinite loop sometimes.
I'm beginning to suspect it is a model issue.
I would recommend cyankiwi/Qwen3.5-27B-AWQ-BF16-INT4 instead, as it leaves the linear attention parameters at BF16. Linear attention parameters are heavily prone to quantization error, so I always keep them at BF16 in my quantized models.
cyankiwi/Qwen3.5-27B-AWQ-4bit is my only model with linear attention layers at INT4.
Thanks, cyankiwi/Qwen3.5-27B-AWQ-BF16-INT4 is significantly better on the infinite loop issue!
Could you please check whether cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit has a similar issue? It doesn't specifically cause an infinite loop on the previous prompt, but in our local test runs it has shown infinite loops on some of our internal helpdesk chatbot test cases. As before, the infinite loop occurs during the reasoning process.
Thank you very much.
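For anyone running a local test harness like the one described, a rough heuristic for flagging looping outputs is to check whether the tail of a completion is the same token chunk repeated back to back. This is my own sketch, not part of any library; the function name and thresholds are made up:

```python
def has_repetition_loop(text, min_ngram=8, min_repeats=3):
    """Heuristic: flag text whose tail is one n-gram repeated many times.

    Splits on whitespace (a real harness would use the model's tokenizer)
    and walks backwards in n-gram-sized steps counting exact repeats.
    """
    tokens = text.split()
    if len(tokens) < min_ngram * min_repeats:
        return False
    tail = tokens[-min_ngram:]
    repeats = 1
    i = len(tokens) - 2 * min_ngram
    # Keep stepping back while the preceding chunk matches the tail exactly.
    while i >= 0 and tokens[i:i + min_ngram] == tail:
        repeats += 1
        i -= min_ngram
    return repeats >= min_repeats

print(has_repetition_loop("the game goes on " * 20))         # True
print(has_repetition_loop("a normal single sentence here"))  # False
```

Exact-match repetition like this catches the degenerate loops described in the thread, but not near-repeats with small variations, so treat it as a first-pass filter only.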
Confirmed it is not a quantization issue. It is a temperature issue for this model: if the temperature is too low (0.7 in my case) while thinking is enabled and the question itself is rather open-ended (such as a brainstorming task), it tends to fall into an infinite loop, probably because the stop token is never sampled.
I managed to reproduce the issue even with the original BF16 model.
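The mechanism suggested above can be illustrated numerically: lowering the temperature sharpens the softmax toward the highest-logit token, so an already unlikely stop token becomes even less likely to be sampled. A minimal sketch with toy logits (the two-token vocabulary and the specific logit values are invented for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature before softmax; lower temperature
    # sharpens the distribution toward the highest-logit tokens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits: a dominant "repeat" token and a weaker stop token.
logits = [5.0, 3.0]  # [repeat_token, stop_token]

p_stop_low = softmax_with_temperature(logits, 0.7)[1]   # ~0.054
p_stop_high = softmax_with_temperature(logits, 1.0)[1]  # ~0.119

# At the lower temperature the stop token is roughly half as likely,
# so a long repetitive stretch is harder to escape.
print(p_stop_low < p_stop_high)  # True
```

This is consistent with the observation that raising the temperature (or the model's recommended sampling settings) reduces the looping on open-ended prompts.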