File size: 10,648 Bytes
1ed82f4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
58eb7af
c5c9387
f65806c
1ed82f4
 
 
 
 
 
 
f65806c
 
1ed82f4
 
 
d06172c
c5c9387
1ed82f4
c5c9387
1ed82f4
 
 
 
 
 
 
 
 
 
 
 
 
81ec3f0
1ed82f4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
262eff7
1ed82f4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
01225b6
 
 
 
 
 
1ed82f4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f65806c
1ed82f4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f65806c
1ed82f4
c5c9387
 
 
 
d06172c
 
 
 
 
 
 
 
 
 
 
1ed82f4
 
 
 
 
 
 
f65806c
 
1ed82f4
 
 
 
 
d06172c
c5c9387
1ed82f4
 
 
 
 
 
 
 
f65806c
 
1ed82f4
 
 
 
 
c5c9387
d06172c
1ed82f4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
---
license: apache-2.0
tags:
  - jinja
  - chat-template
  - qwen
  - qwen3.5
  - qwen3.6
  - lm-studio
  - mlx
  - llama.cpp
  - tool-calling
  - thinking
---

# Fixed Chat Templates for Qwen 3.5 & 3.6

> **2026-05-05** β€” Reviewed against community merged templates (allanchan339, fakezeta). Confirmed all useful features already present; `from_json` string-arg parsing not portable to C++ engines. Added auto-close unclosed thinking, thanks to allanchan339.

Drop-in Jinja templates that fix rendering errors, token waste, and missing features in the official Qwen chat templates. Works in LM Studio, llama.cpp, vLLM, MLX, oMLX, and any engine that supports HuggingFace Jinja templates.

## Why you need this

The official Qwen templates have bugs that break real usage:

| Problem | Impact |
|---------|--------|
| Tool calls fail on C++ engines | `|items` filter doesn't exist in LM Studio, llama.cpp, MLX, oMLX β€” tool calls produce a template error |
| `developer` role rejected | Modern APIs send it; the official template raises an error |
| Empty thinking blocks spam context | Every past turn gets wrapped in tags, even with nothing inside |
| No way to toggle thinking | You're stuck with whatever the model defaults to |
| Qwen 3.6: `</thinking>` hallucination | Model sometimes generates the wrong closing tag; parser fails |
| No-user-query exception breaks tool calling | `raise_exception` crashes agentic loops and resets in OpenClaw and similar runtimes |
| Unclosed thinking before tool call | Model starts reasoning then calls a tool without closing thinking block β€” malformed output |

All seven are fixed here, plus a clean `<|think_on|>` / `<|think_off|>` toggle you can drop into any message.

## Quick install

### LM Studio

1. Open your Qwen model in the right-side panel
2. Scroll to **Prompt Template**
3. Replace the template with the contents of `qwen3.5/chat_template.jinja` or `qwen3.6/chat_template.jinja`
4. Save

### llama.cpp / koboldcpp

```bash
--jinja --chat-template-file qwen3.6/chat_template.jinja
```

### vLLM / TextGen

Replace the `chat_template` string in your `tokenizer_config.json` with the file contents.

### oMLX

Overwrite `chat_template.jinja` in your local model directory. Load with `--jinja`. Remove any `chat_template_kwargs` overrides β€” the template handles everything internally.

## Which file do I use?

| File | For models |
|------|-----------|
| [`qwen3.5/chat_template.jinja`](qwen3.5/chat_template.jinja) | Qwen3.5-35B-A3B, Qwen3.5-32B, Qwen3.5-14B, and all Qwen 3.5 variants |
| [`qwen3.6/chat_template.jinja`](qwen3.6/chat_template.jinja) | Qwen3.6-27B, Qwen3.6-35B-A3B, and all Qwen 3.6 variants |

The 3.6 template is a superset β€” it additionally handles `preserve_thinking`, `</thinking>` hallucination recovery, and interrupted thought streams. If you're on 3.6, use the 3.6 file.

## Thinking toggle

Drop `<|think_on|>` or `<|think_off|>` anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.

**Fast answer, no reasoning:**
```
System: You are a coding assistant. <|think_off|>
User: What's 2+2?
```

**Deep reasoning:**
```
System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.
```

The tag syntax (`<|think_on|>`, `<|think_off|>`) uses Qwen's control-token delimiters, so it will never collide with real text. Earlier community templates used `/think`, which broke legitimate paths like `cd /mnt/project/think`.

## Pre-installed models

These templates are already bundled with:

- [froggeric/Qwen3.6-27B-MLX-8bit](https://huggingface.co/froggeric/Qwen3.6-27B-MLX-8bit)
- [froggeric/Qwen3.6-27B-MLX-4bit](https://huggingface.co/froggeric/Qwen3.6-27B-MLX-4bit)
- [froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit](https://huggingface.co/froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit)
- [froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-4bit](https://huggingface.co/froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-4bit)
- [froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-8bit](https://huggingface.co/froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-8bit)
- [froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-6bit](https://huggingface.co/froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-6bit)
- [froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-4bit](https://huggingface.co/froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-4bit)
- [froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-8bit](https://huggingface.co/froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-8bit)
- [froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-6bit](https://huggingface.co/froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-6bit)
- [froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit](https://huggingface.co/froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit)

If you're using one of those, you already have the template. This repo is for everyone else.

---

<details>
<summary>Technical details β€” what exactly was fixed</summary>

## Tool calls on C++ engines

The official template iterates tool call arguments with `|items`:

```jinja
{%- for key, value in tool_call.arguments|items %}
```

Python's Jinja supports `|items`. C++ runtimes (LM Studio, llama.cpp, MLX) do not β€” the template produces a rendering error instead of output. This template uses direct dictionary key lookups instead:

```jinja
{%- for args_name in tool_call.arguments %}
    {%- set args_value = tool_call.arguments[args_name] %}
```

It also replaces `is sequence` with `is iterable` (stricter C++ runtimes require it), removes `|safe` wrappers (also Python-only), and handles arguments returned as raw strings instead of objects.

## `developer` role

The OpenAI-compatible API spec sends `message.role == "developer"` for system-level instructions. The official Qwen template only checks for `"system"` and throws on anything else. Both templates here accept `"developer"` and map it to the system role.

## Empty thinking blocks

The official template wraps every past assistant turn in thinking tags:

```
<|im_start|>assistant
<think/>
</think >

Here is the answer...
```

When there's no reasoning content, those tags are dead weight β€” they waste context tokens and break prefix caching. The Qwen 3.5 template checks `reasoning_content` before emitting. The Qwen 3.6 template goes further: it respects the `preserve_thinking` kwarg, checks `reasoning_content|trim|length > 0`, and ties history visibility to the `<|think_off|>` override.

## `</thinking>` hallucination (Qwen 3.6 only)

The Qwen 3.6 model sometimes generates `</thinking>` instead of the expected closing tag. The official parser splits on `</think >` only and fails. The 3.6 template detects which closing tag was actually used and splits on that:

```jinja
{%- if '</think >' in content %}
    {%- set think_end_token = '</think >' %}
{%- elif '</thinking>' in content %}
    {%- set think_end_token = '</thinking>' %}
```

It also handles interrupted generation (max tokens hit mid-thought) by rescuing incomplete streams instead of injecting broken tag pairs.

## Arguments serialization

The official template serializes argument values with `|tojson` unconditionally, which turns Python `True` into JSON `true` correctly but fails when the value is already a string. The fixed templates check the type first β€” strings pass through as-is, everything else goes through `|tojson`.

## Auto-close unclosed thinking before tool calls

The model sometimes starts a thinking block and then immediately calls a tool without emitting the closing tag. The official template doesn't handle this β€” the unclosed thinking tag bleeds into the tool call, producing malformed output. Both fixed templates detect this pattern and auto-inject the closing tag before the tool call boundary.

## No-user-query exception

The official template scans the message list in reverse to find the last "real" user query (skipping tool-result wrappers). If all user messages are tool results β€” or there are no user messages at all β€” it fires `raise_exception('No user query found in messages.')` and the template **hard-crashes**.

This breaks real usage:
- **Agentic tool-calling chains** where the conversation ends with tool results and no fresh user query
- **After `/reset` or `/new`** in runtimes like OpenClaw, where tool results from a prior session persist without a new user message
- **System-only contexts** with no user messages

The fix replaces the exception with a graceful fallback: `{%- set ns.last_query_index = messages|length - 1 %}`. The thinking display logic then degrades naturally β€” assistant turns with reasoning content still show thinking tags when `preserve_thinking` is enabled.

</details>

<details>
<summary>Comparison β€” Qwen 3.5 templates</summary>

| Feature | Official | LuffyTheFox | mod-ellary | Pneuny | **This** |
|---------|----------|-------------|------------|--------|----------|
| Tool arguments | Fails | Fixed | Missing | Fixed | **Fixed** |
| `\|safe` removed | Fails | Fixed | Missing | Fixed | **Fixed** |
| `developer` role | Missing | Missing | Missing | Missing | **Added** |
| Thinking toggle | None | None | `/think` (system only) | None | **`<\|think_off\|>` anywhere** |
| Empty think in history | Broken | Broken | Tags omitted | Broken | **Fixed** |
| Text safety | N/A | N/A | Breaks on `/think` in paths | N/A | **Safe** |
| Clean instructions | Yes | Yes | Yes | Injects "I cannot call a tool" | **Yes** |
| No-user-query crash | Crashes | Crashes | Crashes | Crashes | **Graceful fallback** |
| Auto-close thinking before tool | Not handled | Not handled | Not handled | Not handled | **Auto-injects close tag** |

</details>

<details>
<summary>Comparison β€” Qwen 3.6 template</summary>

| Feature | Official | **This** |
|---------|----------|----------|
| Tool arguments | Fails (`\|items`) | **Fixed** |
| `\|safe` removed | Fails | **Fixed** |
| `developer` role | Missing | **Added** |
| Thinking toggle | None | **`<\|think_off\|>` anywhere** |
| `preserve_thinking` | Spams empty blocks | **Dynamic length checks** |
| `</thinking>` hallucination | Fails | **Detected and handled** |
| Interrupted streams | Broken tags | **Rescued** |
| Auto-close thinking before tool | Not handled | **Auto-injects close tag** |
| No-user-query crash | Crashes | **Graceful fallback** |

</details>

---

## Authorship

| Role | Author |
|------|--------|
| Original models | Alibaba Cloud (Qwen team) |
| Template fixes | [froggeric](https://huggingface.co/froggeric) |

## License

Apache-2.0, inherited from Qwen.