Is it possible to turn off thinking mode on Llama.cpp?
#8
by alexaione - opened
Is there any additional parameter we can provide while calling llama-server to turn off thinking mode for this model?
In the previous “Qwen3” series, adding “/no_think” turned off Thinking mode, but I wonder how it works with this model...
OK, I think I partially figured it out: using --reasoning-budget 0
I say partially because it feels like it just hides the thinking output rather than actually stopping it. (Not confirmed, testing ongoing...)
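For reference, a minimal launch sketch combining the flags mentioned in this thread (the model path is a placeholder):

```shell
# Suppress reasoning output via the reasoning budget:
llama-server -m ./model.gguf --reasoning-budget 0

# Or disable thinking through the chat template (note the escaped quotes):
llama-server -m ./model.gguf --chat-template-kwargs "{\"enable_thinking\": false}"
```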
You can use --chat-template-kwargs "{\"enable_thinking\": false}" with llama-server (the inner quotes must be escaped).
It can also be passed as a parameter in an OpenAI-compatible request.
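As a sketch of the request-side variant: the same setting can go into the JSON body sent to llama-server's OpenAI-compatible /v1/chat/completions endpoint via a chat_template_kwargs field (the model name and server URL below are placeholders):

```python
import json

# Request body for llama-server's OpenAI-compatible chat endpoint.
# "chat_template_kwargs" mirrors the --chat-template-kwargs CLI flag
# and is forwarded to the chat template when rendering the prompt.
payload = {
    "model": "qwen3",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
print(body)
# POST this body to e.g. http://localhost:8080/v1/chat/completions
```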
We wrote it in our guide here: https://unsloth.ai/docs/models/qwen3.5#qwen3.5-397b-a17b-tutorial
Yes, --chat-template-kwargs "{\"enable_thinking\": false}"