Qwen 3.6 template has unnecessary cache invalidations
This happens when running with {"preserve_thinking": true}; I'm not sure yet whether it also occurs in other situations.
During agentic workflows with tool use, llama.cpp regularly has to fall back to a much earlier cache checkpoint even though the context is only growing.
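One way to confirm this kind of invalidation (a hypothetical diagnostic, not part of llama.cpp) is to render the chat template for two consecutive turns and measure the longest common prefix. If earlier turns re-render differently, e.g. think blocks stripped or re-inserted, the shared prefix shrinks and the KV cache has to be rebuilt from that point:

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared leading characters of two rendered prompts."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

# Hypothetical renders of the same conversation at turn N and N+1;
# in the second render the earlier assistant turn lost its think block.
turn_n = ("<|im_start|>user\nhi<|im_end|>\n"
          "<|im_start|>assistant\n<think>\nplan\n</think>\n\nok<|im_end|>\n")
turn_n1 = ("<|im_start|>user\nhi<|im_end|>\n"
           "<|im_start|>assistant\nok<|im_end|>\n"
           "<|im_start|>user\nnext<|im_end|>\n")

shared = common_prefix_len(turn_n, turn_n1)
# The cache is only reusable up to `shared`; everything after it
# must be re-evaluated even though the conversation "only grew".
print(shared, len(turn_n))
```

If the shared prefix is much shorter than the previous prompt, the template (or the preserve_thinking handling) is rewriting history between turns.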
Also as a side-note, sometimes the model is outputting tool calls as part of thinking blocks.
I'll add more information here once I understand the behaviour better 😊
Compared to the official or Unsloth chat templates, the Qwen3.5 template fixed both of these issues for me.
Also as a side-note, sometimes the model is outputting tool calls as part of thinking blocks.
This seems to be caused by the quantization. Q8_0 doesn't have this, Q4_K_XL does with temperature 0.6 (as recommended) but not with 0.2.
So unrelated to the template.
Thank you for clarifying. I do remember from Qwen 3.5 that this is indeed a possibility, and it is one of the things that breaks LM Studio, which is why it is recommended to turn off thinking in those apps if you plan to use tools.
I would be interested to see whether harnesses like Claude Code or OpenCode are able to manage those properly.
Personally though, I do not see anything wrong with tool calls during the thinking phase, and that is actually something I frequently see when using GLM in Claude Code. It makes for better researched reasoning.
The problem is when the model believes the tool call will actually be executed and waits for results that never arrive. It seems Q5_K_XL also does not show this behaviour, FWIW.
Also the tool calls are in XML (Anthropic) format, not the expected JSON (OpenAI) format, which seems to violate the constraints of the chat template.
But like I said, this seems to be quantization / temperature related so not really a problem. The actual problem with the cache invalidation is something I'll continue checking.
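For reference, the XML (Anthropic-style) shape the fixed template emits for assistant tool calls can be produced from an OpenAI-style call roughly like this (a hypothetical helper, mirroring the template's `<function=...>` / `<parameter=...>` layout and its `tojson` branch for non-string values):

```python
import json

def to_qwen_xml(name: str, arguments: dict) -> str:
    """Render an OpenAI-style tool call (name + arguments mapping) as the
    <function=...> XML block the fixed template emits (sketch only)."""
    lines = [f"<tool_call>\n<function={name}>"]
    for key, value in arguments.items():
        # Non-string values are serialised as JSON, like the template's
        # `args_value | tojson` branch.
        rendered = value if isinstance(value, str) else json.dumps(value)
        lines.append(f"<parameter={key}>\n{rendered}\n</parameter>")
    lines.append("</function>\n</tool_call>")
    return "\n".join(lines)

print(to_qwen_xml("read_file", {"path": "src/main.py", "limit": 100}))
```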
When I'm using this template with Codex:
■ {"error":{"message":"System message must be at the beginning.","type":"BadRequestError","param":null,"code":400}}
I have just updated both templates with a new fix that auto-closes an open think block before tool calls. This should solve the problem of "sometimes the model is outputting tool calls as part of thinking blocks."
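A Python sketch of that auto-close logic (the template itself does this with Jinja string splitting; this is just an illustration, not the actual implementation):

```python
def autoclose_think(content: str) -> tuple[str, str]:
    """Split an assistant turn where the model emitted a tool call inside
    an unclosed <think> block: everything after '<think>' and before the
    first '<tool_call>' becomes reasoning; the rest stays as content."""
    if "<think>" in content and "</think>" not in content:
        think_part = content.split("<think>")[-1]
        if "<tool_call>" in think_part:
            reasoning = think_part.split("<tool_call>")[0]
            rest = "<tool_call>" + "<tool_call>".join(
                think_part.split("<tool_call>")[1:]
            )
            return reasoning.strip(), rest
        # Unclosed think block with no tool call: treat it all as reasoning.
        return think_part.strip(), ""
    return "", content

r, c = autoclose_think("<think>need to read the file<tool_call>...</tool_call>")
print(r)  # need to read the file
print(c)  # <tool_call>...</tool_call>
```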
Indeed it does in my testing so far, since you added that fix a few hours ago. But this still seems to be a problem only (?) with lower quantizations. The model gets these things right much more reliably with Q6_K_XL than with Q4_K_XL (as in: I have never seen this happen with Q6_K_XL so far).
Thanks for all your work on providing fixed up templates 😊
Hey, thanks for these fixes - I thought I was going insane noticing a few of these issues. I will say using the most recent template to catch the unclosed blocks, I get the "Undefined" error when I'm using Qwen3.6-27b via LM-Studio as the runtime, and OpenCode as the harness (but it does not surface in LM-Studio chats):
"Error rendering prompt with jinja template: \"Cannot call something that is not a function: got UndefinedValue\".\n\nThis is usually an issue with the model's prompt template. If you are using a popular model, you can try to search the model under lmstudio-community, which will have fixed prompt templates. If you cannot find one, you are welcome to post this issue to our discord or issue tracker on GitHub. Alternatively, if you know how to write jinja templates, you can override the prompt template in My Models > model settings > Prompt Template."
I asked ChatGPT 5.5 and it suspects the issue is that Python string helpers are not available across all runtimes. It recommended using this instead of your unclosed-think-block handling, for LM Studio specifically:
{#- Simpler compatibility-safe handling: do not try to splice content -#}
{%- if '<think>' in content and '</think>' not in content %}
{%- set content = content + '</think>' %}
{%- endif %}
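A quick way to check that snippet outside LM Studio (assuming its engine broadly matches Python's jinja2, which is an assumption; LM Studio uses its own Jinja implementation) is to render it directly:

```python
from jinja2 import Template

# The compatibility-safe snippet, plus an output statement to see the result.
tmpl = Template(
    "{%- if '<think>' in content and '</think>' not in content %}"
    "{%- set content = content + '</think>' %}"
    "{%- endif %}"
    "{{ content }}"
)

# An unclosed think block gets auto-closed (at the very end of the turn)...
print(tmpl.render(content="<think>reasoning<tool_call>...</tool_call>"))
# ...while an already well-formed one is left alone.
print(tmpl.render(content="<think>reasoning</think>answer"))
```

Note this simpler variant appends `</think>` after the tool call rather than before it, so it repairs well-formedness but does not separate reasoning from content.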
ChatGPT 5.5 also flagged other "brittle" things, such as null checking and calling .strip() instead of |trim, but so far the template seems fine without further changes, and I acknowledge it is used across multiple runtimes.
After a compaction and using the modified template, an OpenCode agent has been running for more than 20 minutes without failing on a malformed think block containing a tool call 🎉. This is luxury compared to my experience thus far (runs of 1-4 minutes before tripping up on a random tool call, then a coin toss on whether it repeats the mistake after I told it that it put a tool call in the think blocks). It still slips up on occasion, but this is more runtime than I've ever gotten from it before.
Thank you for your effort.
When I'm using this template with Codex:
■ {"error":{"message":"System message must be at the beginning.","type":"BadRequestError","param":null,"code":400}}
This is not a bug in the fixed template. The raise_exception('System message must be at the beginning.') check also happens in the official Qwen template, same code, same position.
This is caused by oh-my-pi and Codex sending multiple developer/system messages mid-conversation instead of only at the beginning. The template only accepts the first one and rejects all others.
Maybe I can work out a "fix" for it and incorporate it into the fixed template, although it would basically be a fix for accommodating bad client behaviour.
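Until such a fix lands, a client-side workaround (a hypothetical sketch, not part of Codex or oh-my-pi) is to re-tag any mid-conversation system/developer messages before sending them:

```python
def demote_extra_system_messages(messages: list[dict]) -> list[dict]:
    """Re-tag system/developer messages that appear after the first position
    as user messages, so templates that only accept a leading system message
    don't raise 'System message must be at the beginning.'"""
    return [
        msg if i == 0 or msg.get("role") not in ("system", "developer")
        else {**msg, "role": "user"}
        for i, msg in enumerate(messages)
    ]

msgs = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "hi"},
    {"role": "developer", "content": "new instructions"},  # mid-conversation
]
print(demote_extra_system_messages(msgs))
```

Whether demoting to a user message is semantically acceptable depends on the harness; some may prefer merging the extra content into the first system message instead.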
I have made 2 updates to the templates, addressing the issues in this conversation:
- reviewed the think tag closing before tool calls, with a simpler, more robust and hopefully 100% compatible version; the ChatGPT solution was a good start, but not quite there yet
- addressed the "system message must be at the beginning" exception by allowing system and developer messages anywhere, with a few more edge cases handled (no images allowed in system messages, and no wasteful empty block when the message contains only a think tag); again, focused on compatibility
I would appreciate it if you could test and report before I make it official. The files and readme are suffixed with "-v8".
To be honest, it still has trouble on occasion, but could be my quantization at that point. The only issue with the template is the occasional thinking tool_call, but it seems to be more of a model/quant quirk than a template issue.
ChatGPT has theories that seem sound to me:
1. The template tells the model to always lead with <think> each turn, so if it only wanted to generate <tool_call>, its turn would start as <think><tool_call>. A 100% fix for this would be to disable thinking, which is unacceptable.
2. Any conversational context that demonstrates the bad <think><tool_call></tool_call> behaviour can further steer the model to repeat it.
3. The tool call instructions don't give a proper example of a valid tool call in the context of <think> tags, so they are not "teaching" the model how to generate them properly structured.
So I figure #2 and #3 are the leading causes. I started a new session with the template I'm using now, which gives an explicit example of <think>Explanation</think> <tool_call>... in its instructions. I gave it a prompt to analyze a codebase using a minimum of 50 tool calls, and it did not fail a single one; delegations also seem to happen flawlessly. Even so, I randomly get a "thinking" tool_call during arbitrary glob/grep/read operations. I fully acknowledge that the quantization I'm running also contributes to malformed outputs, so this template may work perfectly on Q4 or Q8.
Qwen3.6-27b Q3_K_P (do note that I set preserve_thinking at the top of this template; that seems to be a personal preference thing).
I would have uploaded a .txt file, but Hugging Face only allows images/videos to be attached.
Lightly modified v8 template
{%- set preserve_thinking = true %}
{%- set image_count = namespace(value=0) %}
{%- set video_count = namespace(value=0) %}
{%- macro render_content(content, do_vision_count, is_system_content=false) %}
{%- if content is string %}
{{- content }}
{%- elif content is iterable and content is not mapping %}
{%- for item in content %}
{%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
{%- if is_system_content %}
{{- raise_exception('System message cannot contain images.') }}
{%- endif %}
{%- if do_vision_count %}
{%- set image_count.value = image_count.value + 1 %}
{%- endif %}
{%- if add_vision_id is defined and add_vision_id %}
{{- 'Picture ' ~ image_count.value ~ ': ' }}
{%- endif %}
{{- '<|vision_start|><|image_pad|><|vision_end|>' }}
{%- elif 'video' in item or item.type == 'video' %}
{%- if is_system_content %}
{{- raise_exception('System message cannot contain videos.') }}
{%- endif %}
{%- if do_vision_count %}
{%- set video_count.value = video_count.value + 1 %}
{%- endif %}
{%- if add_vision_id is defined and add_vision_id %}
{{- 'Video ' ~ video_count.value ~ ': ' }}
{%- endif %}
{{- '<|vision_start|><|video_pad|><|vision_end|>' }}
{%- elif 'text' in item %}
{{- item.text }}
{%- else %}
{{- raise_exception('Unexpected item type in content.') }}
{%- endif %}
{%- endfor %}
{%- elif content is none or content is undefined %}
{{- '' }}
{%- else %}
{{- raise_exception('Unexpected content type.') }}
{%- endif %}
{%- endmacro %}
{%- set ns_flags = namespace(enable_thinking=true) %}
{%- if enable_thinking is defined %}
{%- set ns_flags.enable_thinking = enable_thinking %}
{%- endif %}
{%- if not messages %}
{{- raise_exception('No messages provided.') }}
{%- endif %}
{%- if tools and tools is iterable and tools is not mapping %}
{{- '<|im_start|>system\n' }}
{{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>" }}
{{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<think>Brief explanation of tool call</think>\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
{%- if messages[0].role == 'system' or messages[0].role == 'developer' %}
{%- set content = render_content(messages[0].content, false, true)|trim %}
{%- if '<|think_off|>' in content %}
{%- set ns_flags.enable_thinking = false %}
{%- set content = content | replace('<|think_off|>', '') %}
{%- endif %}
{%- if '<|think_on|>' in content %}
{%- set ns_flags.enable_thinking = true %}
{%- set content = content | replace('<|think_on|>', '') %}
{%- endif %}
{%- set content = content | trim %}
{%- if content %}
{{- '\n\n' + content }}
{%- endif %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- else %}
{%- if messages[0].role == 'system' or messages[0].role == 'developer' %}
{%- set content = render_content(messages[0].content, false, true)|trim %}
{%- if '<|think_off|>' in content %}
{%- set ns_flags.enable_thinking = false %}
{%- set content = content | replace('<|think_off|>', '') %}
{%- endif %}
{%- if '<|think_on|>' in content %}
{%- set ns_flags.enable_thinking = true %}
{%- set content = content | replace('<|think_on|>', '') %}
{%- endif %}
{%- set content = content | trim %}
{{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" %}
{%- set content = render_content(message.content, false)|trim %}
{%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if ns.multi_step_tool %}
{%- set ns.last_query_index = messages|length - 1 %}
{%- endif %}
{%- for message in messages %}
{%- set is_system = (message.role == "system" or message.role == "developer") %}
{%- set content = render_content(message.content, true, is_system)|trim %}
{%- if '<|think_off|>' in content %}
{%- set ns_flags.enable_thinking = false %}
{%- set content = content | replace('<|think_off|>', '') %}
{%- endif %}
{%- if '<|think_on|>' in content %}
{%- set ns_flags.enable_thinking = true %}
{%- set content = content | replace('<|think_on|>', '') %}
{%- endif %}
{%- set content = content | trim %}
{%- if is_system %}
{%- if not loop.first and content %}
{{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
{%- endif %}
{%- elif message.role == "user" %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- set think_end_token = '' %}
{%- if '</think>' in content %}
{%- set think_end_token = '</think>' %}
{%- elif '</thinking>' in content %}
{%- set think_end_token = '</thinking>' %}
{%- elif '<think>' in content %}
{#- Auto-close unclosed think before tool_call (compatibility-safe: no rfind/slice) -#}
{%- set think_part = content.split('<think>')[-1] %}
{%- if '<tool_call>' in think_part %}
{%- set reasoning_content = think_part.split('<tool_call>')[0] %}
{%- set content = '<tool_call>' ~ think_part.split('<tool_call>')[1:] | join('<tool_call>') %}
{%- else %}
{%- set reasoning_content = think_part %}
{%- set content = '' %}
{%- endif %}
{#- Ensure reasoning_content doesn't have leading whitespace like newlines -#}
{%- if reasoning_content.startswith('\n') %}
{%- set reasoning_content = reasoning_content[1:] %}
{%- endif %}
{%- endif %}
{%- if think_end_token %}
{%- set reasoning_content = content.split(think_end_token)[0].split('<think>')[-1] %}
{%- set content = content.split(think_end_token)[-1] %}
{%- if reasoning_content.endswith('\n') %}
{%- set reasoning_content = reasoning_content[:-1] %}
{%- endif %}
{%- if reasoning_content.startswith('\n') %}
{%- set reasoning_content = reasoning_content[1:] %}
{%- endif %}
{%- if content.startswith('\n') %}
{%- set content = content[1:] %}
{%- endif %}
{%- endif %}
{%- endif %}
{%- set reasoning_content = reasoning_content|trim %}
{%- set show_think = false %}
{%- if loop.index0 > ns.last_query_index %}
{%- set show_think = true %}
{%- elif ns_flags.enable_thinking and (preserve_thinking is undefined or preserve_thinking is true) and reasoning_content|length > 0 %}
{%- set show_think = true %}
{%- endif %}
{%- if show_think %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{%- if loop.first %}
{%- if content|trim %}
{{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- else %}
{{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- endif %}
{%- else %}
{{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- endif %}
{%- if tool_call.arguments is defined and tool_call.arguments is mapping %}
{%- if tool_call.arguments|length > 0 %}
{%- for args_name in tool_call.arguments %}
{%- set args_value = tool_call.arguments[args_name] %}
{{- '<parameter=' + args_name + '>\n' }}
{%- set args_value = args_value | string if args_value is string else args_value | tojson %}
{{- args_value }}
{{- '\n</parameter>\n' }}
{%- endfor %}
{%- endif %}
{%- elif tool_call.arguments is defined and tool_call.arguments is string %}
{%- if tool_call.arguments|trim|length > 0 %}
{{- tool_call.arguments }}
{{- '\n' }}
{%- endif %}
{%- endif %}
{{- '</function>\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.previtem and loop.previtem.role != "tool" %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if not loop.last and loop.nextitem.role != "tool" %}
{{- '<|im_end|>\n' }}
{%- elif loop.last %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- else %}
{{- raise_exception('Unexpected message role.') }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if ns_flags.enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- else %}
{{- '<think>\n' }}
{%- endif %}
{%- endif %}
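To sanity-check modifications like the template above before loading them into a runtime, one can render them locally with Python's jinja2 (a sketch under the assumption that the runtime's engine behaves like jinja2; `raise_exception` is not a Jinja built-in, so it must be registered by hand, as the real runtimes do internally). The stand-in template here is deliberately trivial:

```python
from jinja2 import Environment, BaseLoader

def raise_exception(message):
    # Chat templates call this helper on invalid input; Jinja doesn't
    # provide it, so we supply our own.
    raise ValueError(message)

env = Environment(loader=BaseLoader())
env.globals["raise_exception"] = raise_exception

# Trivial stand-in; in practice read the real file, e.g.:
# template_text = open("qwen-fixed-v8.jinja").read()  # hypothetical filename
template_text = (
    "{% for m in messages %}"
    "<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

template = env.from_string(template_text)
prompt = template.render(
    messages=[{"role": "user", "content": "hello"}],
    add_generation_prompt=True,
)
print(prompt)
```

Rendering the same message list with and without an assistant turn's reasoning also makes prefix-cache regressions like the one reported above easy to spot by eye.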
@0x4tomic Thank you for your suggestion, this makes sense. I have analysed it carefully and implemented it in the v9 templates. Please test.
In parallel, I have been testing the v8 template in a long coding session with parallel subagents, reasoning, and tool use, and I have not found any problems (apart from a timeout, but that is a different issue).