Is it possible to turn off thinking mode on Llama.cpp?

#8
by alexaione - opened

Is there any additional parameter we can provide while calling llama-server to turn off thinking mode for this model?

In the previous “Qwen3” series, adding “/no_think” turned off Thinking mode, but I wonder how it works with this model...

OK, I think I partially figured it out: use --reasoning-budget 0

I say partially, as it feels like it just hides the thinking output rather than actually stopping it. (Not confirmed, testing ongoing...)

You can use --chat-template-kwargs "{\"enable_thinking\": false}" with llama-server (note the escaped quotes inside the JSON).
Can also be used as a parameter in an openai request.
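For the per-request variant, a sketch of what that might look like against llama-server's OpenAI-compatible endpoint (assumptions: server listening on localhost:8080, model name is a placeholder, and `chat_template_kwargs` as the request-body field name):

```shell
# Sketch: per-request override, assuming llama-server is already running
# on localhost:8080. Adjust host/port and model name to your setup.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```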

Unsloth AI org

We wrote it in our guide here: https://unsloth.ai/docs/models/qwen3.5#qwen3.5-397b-a17b-tutorial

Yes, --chat-template-kwargs "{\"enable_thinking\": false}"
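For anyone landing here later, a full launch command might look like this (the model path is a placeholder, not an actual filename from the guide):

```shell
# Sketch: launch llama-server with thinking disabled via the chat template.
# Model path is a placeholder; note the escaped quotes inside the JSON,
# which are required when the argument is wrapped in double quotes.
./llama-server -m ./model.gguf \
  --chat-template-kwargs "{\"enable_thinking\": false}"
```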
