Is it possible to turn off thinking mode on Llama.cpp?

#8
by alexaione - opened

Is there any additional parameter we can provide while calling llama-server to turn off thinking mode for this model?

In the previous “Qwen3” series, adding “/no_think” turned off Thinking mode, but I wonder how it works with this model...

OK, I think I partially figured it out: use --reasoning-budget 0

I say partially, as it feels like it just hides the thinking output rather than actually stopping it. (Not confirmed, testing ongoing...)

You can use --chat-template-kwargs "{\"enable_thinking\": false}" with llama-server (note the escaped quotes inside the JSON).
Can also be used as a parameter in an openai request.
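For the per-request variant, a sketch of what that might look like against llama-server's OpenAI-compatible endpoint (assumptions: server listening on localhost:8080, model name is a placeholder, and `chat_template_kwargs` as the request-body field name):

```shell
# Sketch: per-request override, assuming llama-server is already running
# on localhost:8080. Adjust host/port and model name to your setup.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```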

Unsloth AI org

We wrote it in our guide here: https://unsloth.ai/docs/models/qwen3.5#qwen3.5-397b-a17b-tutorial

Yes, --chat-template-kwargs "{\"enable_thinking\": false}"
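For anyone landing here later, a full launch command might look like this (the model path is a placeholder, not an actual filename from the guide):

```shell
# Sketch: launch llama-server with thinking disabled via the chat template.
# Model path is a placeholder; note the escaped quotes inside the JSON,
# which are required when the argument is wrapped in double quotes.
./llama-server -m ./model.gguf \
  --chat-template-kwargs "{\"enable_thinking\": false}"
```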
