Modified chat template in llama.cpp
The model shows very good results, but there is one thing I am fighting with while testing. I can't get it running cleanly in llama-cli or llama-server, because the opening token is already written into the template: it always starts by displaying the reasoning trace.
I think it is the modified chat template. Are there any known solutions to this?
EDIT: I already edited the chat template and got it displaying well, but I still cannot switch off reasoning.
Put {%- set enable_thinking = false %} at the top of the Jinja template to disable the reasoning trace. That is the standard approach for all Qwen3.5 models.
{# Final Generation Prompt #}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- else %}
{{- '<think>\n' }}
{%- endif %}
{%- endif %}
You should see:
<think>
</think>
Response...
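To make the branching explicit, here is the generation-prompt logic from the Jinja snippet above rewritten as plain Python (a sketch for illustration; the function name is my own, not part of llama.cpp):

```python
# Mirrors the {%- if add_generation_prompt %} block of the chat template.
def generation_prompt(add_generation_prompt: bool, enable_thinking=None) -> str:
    if not add_generation_prompt:
        return ""
    prompt = "<|im_start|>assistant\n"
    # Matches `enable_thinking is defined and enable_thinking is false`:
    # only an explicit False disables reasoning.
    if enable_thinking is False:
        # Emit a pre-closed, empty think block so the model skips reasoning.
        prompt += "<think>\n\n</think>\n\n"
    else:
        # Leave the think block open; the model fills in a reasoning trace.
        prompt += "<think>\n"
    return prompt

print(repr(generation_prompt(True, enable_thinking=False)))
print(repr(generation_prompt(True)))
```

Note that leaving enable_thinking unset behaves the same as setting it to true, which is why the template must be edited (or the variable passed in) to get the empty think block.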
This is expected; it is the approach officially provided by the Qwen team.
See line 150 at: https://huggingface.co/Qwen/Qwen3.5-35B-A3B/blob/main/chat_template.jinja
That worked! Thanks!
That surprises me, because
--chat-template-kwargs '{"enable_thinking":false}'
failed to switch off thinking in my environment.
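In case it helps others: the edited template can be applied without rebuilding the GGUF by pointing the server at a template file (a sketch; the model path and template filename are placeholders):

```shell
# Load the model with the edited Jinja template that sets
# enable_thinking = false at the top.
llama-server -m ./model.gguf \
  --chat-template-file ./chat_template_no_think.jinja
```

This sidesteps any dependence on whether the build honors --chat-template-kwargs.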