Modified chat template in llama.cpp
The model shows very good results, but there is one thing I am fighting with while testing. I can't get it running cleanly in llama-cli or llama-server, because the opening token is already written into the template: it always starts by displaying the reasoning trace.
I think it is the modified chat template. Are there any known solutions to this?
EDIT: I already edited the chat template and got it displaying well, but I still cannot switch off reasoning.
Put {%- set enable_thinking = false %} at the top of the Jinja template to disable the reasoning trace. That is the standard approach for all Qwen3.5 models.
{# Final Generation Prompt #}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- else %}
{{- '<think>\n' }}
{%- endif %}
{%- endif %}
You should see:
<think>
</think>
Response...
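To make the branching explicit, here is the generation-prompt logic from the Jinja snippet above rewritten as plain Python (a sketch for illustration; the function name is my own, not part of llama.cpp):

```python
# Mirrors the {%- if add_generation_prompt %} block of the chat template.
def generation_prompt(add_generation_prompt: bool, enable_thinking=None) -> str:
    if not add_generation_prompt:
        return ""
    prompt = "<|im_start|>assistant\n"
    # Matches `enable_thinking is defined and enable_thinking is false`:
    # only an explicit False disables reasoning.
    if enable_thinking is False:
        # Emit a pre-closed, empty think block so the model skips reasoning.
        prompt += "<think>\n\n</think>\n\n"
    else:
        # Leave the think block open; the model fills in a reasoning trace.
        prompt += "<think>\n"
    return prompt

print(repr(generation_prompt(True, enable_thinking=False)))
print(repr(generation_prompt(True)))
```

Note that leaving enable_thinking unset behaves the same as setting it to true, which is why the template must be edited (or the variable passed in) to get the empty think block.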
This is expected; it is the approach officially provided by the Qwen team.
See line 150 at: https://huggingface.co/Qwen/Qwen3.5-35B-A3B/blob/main/chat_template.jinja
That worked! Thanks!
That surprises me, because
--chat-template-kwargs '{"enable_thinking":false}'
failed to switch off thinking in my environment.
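In case it helps others: the edited template can be applied without rebuilding the GGUF by pointing the server at a template file (a sketch; the model path and template filename are placeholders):

```shell
# Load the model with the edited Jinja template that sets
# enable_thinking = false at the top.
llama-server -m ./model.gguf \
  --chat-template-file ./chat_template_no_think.jinja
```

This sidesteps any dependence on whether the build honors --chat-template-kwargs.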