Qwen3.5-27B-UD-Q4_K_XL.gguf is overthinking like crazy
Qwen3.5-27B-UD-Q4_K_XL.gguf on the latest LM Studio (0.4.5 build 2) with 32 GB of VRAM thinks itself into prompt failures when the parameters are set to the Unsloth coder-thinking recommendations. This was in response to a simple "Hi" prompt. If you say "Hello, friend," it'll just crash out thinking until the context is full.
Update: I noticed that if I leave all parameters at their defaults instead of applying the Unsloth ones, it thinks less. Still a bit, but it's functional.
In llama.cpp, --reasoning-budget 0 seems to work.
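A minimal launch sketch for that flag (the model path, context size, and GPU-offload values below are illustrative, not a recommendation; only --reasoning-budget 0 comes from the post):

```shell
# Start llama-server with the model's thinking phase suppressed.
# --reasoning-budget 0 disables reasoning output; -1 (the default) leaves it unlimited.
./llama-server \
  -m Qwen3.5-27B-UD-Q4_K_XL.gguf \
  --reasoning-budget 0 \
  -c 16384 \
  -ngl 99
```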
Both the 35B-A3B and this one overthink like crazy. I asked the carwash question and both of them got it right, but they thought so much that they went over the same points ten times before being sure. Like: "Is it a riddle? I'm not sure. It could be a riddle. Wait, but it says..." It thinks too much and I don't know how to disable its thinking, tbh. Even with "don't think" it gives me: "Self-Correction on 'Don't think': I cannot actually stop thinking. I must interpret this as 'Do not output your reasoning/thought process. Just give the answer.'" But it still thinks like crazy.
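Untested suggestion: earlier Qwen3-family models document a "/no_think" soft switch appended to the user turn to suppress the thinking block; it's an assumption that this carries over to these models. A sketch (the helper name is made up for illustration):

```python
# Sketch, assuming the Qwen3-style "/no_think" soft switch applies here.
# The helper name with_no_think is hypothetical; it just tags the user turn.
def with_no_think(prompt: str) -> str:
    """Append the /no_think soft switch to a user prompt."""
    return f"{prompt} /no_think"

# Example chat message using the tagged prompt.
messages = [{"role": "user", "content": with_no_think("Hi")}]
```

If the model honors the switch, the response should skip (or empty out) the thinking block instead of reasoning in circles.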
I have not personally tested it, but this may be worth trying:
https://www.reddit.com/r/LocalLLaMA/s/qdWvd4EZoL
Thanks for the suggestions, all. For my use case, someone on Reddit mentioned the model behaves and thinks much more efficiently when harnessed for coding, i.e. Roo Code / Kilo Code / Qwen Code companion in VS Code. I confirmed it, and it's totally fine for my use case (though this is with the Bartowski version of a similar quant; not 100% sure, but it should be similar for the Unsloth one).