Image-Text-to-Text
Transformers
GGUF
text-generation-inference
unsloth
qwen3_5
reasoning
chain-of-thought
lora
sft
agent
tool-use
function-calling
coder
conversational

Reasoning loop in llamacpp

#2
by Grandys - opened

running with the llama server and stuck in a reasoning loop. never-ending reasoning. Are there any tips for inference settings? Now I am using this command

.\llamacpp\llama-server --model "D:\ai-tools\llm\Jackrong\Qwopus3.5-9B-Coder-GGUF\Qwopus3.5-9B-coder-Exp-Q5_K_M.gguf" --mmproj "D:\ai-tools\llm\Jackrong\Qwopus3.5-9B-Coder-GGUF\mmproj.gguf" --ctx-size 131072 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 --port 8001 --reasoning on -fa on --fit on --no-mmap -ctk q8_0 -ctv q8_0 --no-warmup -np 1 --prio 2 --mlock --jinja

The reasoning loop happens when I do the car wash test. Before using kv at q8, I use the default kv cache, and the reasoning loop still happens.

Have you tried this? https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates

Not yet ill try it. Thank you! btw which chat template i need to download that suitable with this model? There is a lot of variety available.

Have you tried this? https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates

Not yet ill try it. Thank you! btw which chat template i need to download that suitable with this model? There is a lot of variety available.

https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates/resolve/main/chat_template.jinja?download=true

https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates/resolve/main/chat_template.jinja?download=true

already tried it. still loop. Even funnier, the think tag got leaked outside the reasoning block, and the loop happened in the chat/response block

Yep, it's loopy

Sign up or log in to comment