froggeric commited on
Commit
7efa9ee
·
verified ·
1 Parent(s): c6d4325

Upload README-v8.md

Browse files
Files changed (1) hide show
  1. README-v8.md +192 -0
README-v8.md ADDED
@@ -0,0 +1,192 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - jinja
5
+ - chat-template
6
+ - qwen
7
+ - qwen3.5
8
+ - qwen3.6
9
+ - lm-studio
10
+ - mlx
11
+ - llama.cpp
12
+ - tool-calling
13
+ - thinking
14
+ ---
15
+
16
+ # Fixed jinja chat templates for Qwen 3.5 & 3.6
17
+
18
+ > **2026-05-07 Update:** Fixed 8th bug: Mid-conversation system messages no longer crash the template. Compatibility restored for agent frameworks (OpenCode, Docker Agent, oh-my-pi). Re-engineered Jinja string parsing for C++ engine stability.
19
+
20
+ These are drop-in Jinja templates that fix rendering errors, token waste, and missing features in the official Qwen chat templates.
21
+
22
+ They are tested to work across LM Studio, llama.cpp, vLLM, MLX, oMLX, and any engine that supports HuggingFace Jinja templates.
23
+
24
+ ---
25
+
26
+ ## Why you need this
27
+ The official Qwen templates contain restrictions and Python-specific Jinja logic that break usage on many inference engines and agent frameworks.
28
+
29
+ Here are the 8 bugs this template fixes:
30
+
31
+ | Problem | Impact | Fix |
32
+ |---|---|---|
33
+ | **1. Tool calls fail on C++ engines** | The `\|items` filter doesn't exist in `minijinja` (LM Studio, llama.cpp, MLX). Tool calls instantly crash the template. | Rewritten for strict C++ engine compatibility. |
34
+ | **2. Mid-conversation system crash** | Frameworks injecting mid-conversation steering instructions trigger a hard crash. | Native, chronological rendering for system messages anywhere. |
35
+ | **3. `developer` role rejected** | Modern APIs send the developer role; the official template rejects it. | Added full support for `"developer"`. |
36
+ | **4. Empty thinking blocks spam** | Every past turn gets wrapped in empty `<think></think>` tags, wasting context and breaking caching. | Dynamic length checks and history visibility logic. |
37
+ | **5. No way to toggle thinking** | The user is restricted to the model defaults. | Intercepts `<\|think_off\|>` and `<\|think_on\|>` tags natively. |
38
+ | **6. Qwen 3.6 `</thinking>` hallucination** | Model sometimes generates `</thinking>` instead of `</think>`, permanently breaking the parser. | Advanced tag detection and stream recovery. |
39
+ | **7. No-user-query crash** | `raise_exception` crashes agentic loops, system-only contexts, or `/reset` flows. | Graceful fallback scanning mechanism. |
40
+ | **8. Unclosed thinking before tool call** | Model calls a tool without closing its reasoning, bleeding XML tags into tool parsers. | Auto-injects closing tags before tool boundaries. |
41
+
42
+ ---
43
+
44
+ ## Quick install
45
+
46
+ Choose your environment and update the template:
47
+
48
+ ### LM Studio
49
+ 1. Open your Qwen model in the right-side panel.
50
+ 2. Scroll down to **Prompt Template**.
51
+ 3. Replace the template with the contents of `qwen3.5/chat_template.jinja` or `qwen3.6/chat_template.jinja`.
52
+ 4. Click **Save**.
53
+
54
+ ### llama.cpp / koboldcpp
55
+ ```bash
56
+ --jinja --chat-template-file qwen3.6/chat_template.jinja
57
+ ```
58
+
59
+ ### vLLM / TextGen
60
+ Replace the `"chat_template"` string in your `tokenizer_config.json` with the raw file contents.
61
+
62
+ ### oMLX
63
+ Overwrite `chat_template.jinja` in your local model directory. Load with `--jinja`. Remove any `chat_template_kwargs` overrides because the template handles everything internally.
64
+
65
+ ---
66
+
67
+ ## Which file do I use?
68
+
69
+ | Template File | Supported Models |
70
+ |------|-----------|
71
+ | [`qwen3.5/chat_template.jinja`](qwen3.5/chat_template.jinja) | Qwen3.5-35B-A3B, Qwen3.5-32B, Qwen3.5-14B, and all Qwen 3.5 variants. |
72
+ | [`qwen3.6/chat_template.jinja`](qwen3.6/chat_template.jinja) | Qwen3.6-27B, Qwen3.6-35B-A3B, and all Qwen 3.6 variants. |
73
+
74
+ > **Note:** The 3.6 template is a superset. It additionally handles `preserve_thinking`, `</thinking>` hallucination recovery, and interrupted thought streams. If you are on 3.6, always use the 3.6 file.
75
+
76
+ ---
77
+
78
+ ## The thinking toggle
79
+ You can control the model reasoning behavior. Insert `<|think_on|>` or `<|think_off|>` anywhere in your system or user prompt.
80
+
81
+ The template natively intercepts the tag, removes it from the final context so the model never sees it, and flips the reasoning mode instantly.
82
+
83
+ **Fast answer, no reasoning:**
84
+ ```text
85
+ System: You are a coding assistant. <|think_off|>
86
+ User: What's 2+2?
87
+ ```
88
+
89
+ **Deep reasoning:**
90
+ ```text
91
+ System: You are a coding assistant. <|think_on|>
92
+ User: Implement a red-black tree in Rust.
93
+ ```
94
+ *(The tag syntax uses Qwen's control-token delimiters to guarantee it will never collide with legitimate text or file paths, unlike earlier community templates that used `/think`)*
95
+
96
+ ---
97
+
98
+ ## Pre-installed models
99
+
100
+ If you are using one of the following models, you already have an older version of this template installed.
101
+
102
+ - [froggeric/Qwen3.6-27B-MLX-8bit](https://huggingface.co/froggeric/Qwen3.6-27B-MLX-8bit)
103
+ - [froggeric/Qwen3.6-27B-MLX-4bit](https://huggingface.co/froggeric/Qwen3.6-27B-MLX-4bit)
104
+ - [froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit](https://huggingface.co/froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit)
105
+ - [froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-4bit](https://huggingface.co/froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-4bit)
106
+ - [froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-8bit](https://huggingface.co/froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-8bit)
107
+ - [froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-6bit](https://huggingface.co/froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-6bit)
108
+ - [froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-4bit](https://huggingface.co/froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-4bit)
109
+ - [froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-8bit](https://huggingface.co/froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-8bit)
110
+ - [froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-6bit](https://huggingface.co/froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-6bit)
111
+ - [froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit](https://huggingface.co/froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit)
112
+
113
+ ---
114
+
115
+ <details>
116
+ <summary>Technical Details of the 8 Fixes</summary>
117
+
118
+ ### 1. Tool calls on C++ engines
119
+ The official template iterates tool call arguments with `|items`:
120
+ `{%- for key, value in tool_call.arguments|items %}`
121
+
122
+ Python's Jinja supports `|items`. C++ runtimes (LM Studio, llama.cpp, MLX) do not, which produces a rendering error. This template uses direct dictionary key lookups instead. It also replaces `is sequence` with `is iterable`, removes Python-only `|safe` wrappers, and handles arguments returned as raw strings.
123
+
124
+ ### 2. Mid-conversation system messages crash
125
+ The official template hard-crashes if a `system` or `developer` message appears anywhere except the first position. This breaks agentic frameworks (Codex CLI, Docker Agent, oh-my-pi, OpenCode) that inject steering instructions mid-conversation. The fix natively renders these messages chronologically to preserve LLM recency bias while enforcing strict image-blocking checks.
126
+
127
+ ### 3. `developer` role
128
+ The OpenAI-compatible API spec sends `message.role == "developer"` for system-level instructions. The official Qwen template throws an exception. Both templates here accept `"developer"` and map it properly.
129
+
130
+ ### 4. Empty thinking blocks
131
+ The official template wraps every past assistant turn in thinking tags, even when empty. When there is no reasoning content, those tags waste context tokens and break prefix caching. The 3.5 template checks `reasoning_content` before emitting. The 3.6 template checks `reasoning_content|trim|length > 0` and ties history visibility to the `<|think_off|>` override.
132
+
133
+ ### 5. `</thinking>` hallucination (Qwen 3.6 only)
134
+ The Qwen 3.6 model sometimes generates `</thinking>` instead of the expected `</think>`. The official parser splits on `</think >` only and fails. The 3.6 template detects which closing tag was actually used and splits dynamically. It also handles interrupted generation by rescuing incomplete streams.
135
+
136
+ ### 6. Arguments serialization
137
+ The official template serializes argument values with `|tojson` unconditionally, failing when the value is already a string. The fixed templates check the type first. Strings pass through as-is, and everything else goes through `|tojson`.
138
+
139
+ ### 7. Auto-close unclosed thinking before tool calls
140
+ The model sometimes starts a thinking block and immediately calls a tool without emitting the closing tag. The official template lets the unclosed thinking tag bleed into the tool call. The fixed templates detect this pattern and safely auto-inject the closing tag using standard Jinja `split` operations to guarantee 100% C++ compatibility.
141
+
142
+ ### 8. No-user-query exception
143
+ The official template scans the message list in reverse. If all messages are tool results, or there are no user messages, it fires `raise_exception('No user query found...')` and hard-crashes. The fix replaces the exception with a graceful fallback `{%- set ns.last_query_index = messages|length - 1 %}`, enabling agentic tool-calling chains to function perfectly.
144
+ </details>
145
+
146
+ <details>
147
+ <summary>Comparison: Qwen 3.5 templates</summary>
148
+
149
+ | Feature | Official | LuffyTheFox | mod-ellary | Pneuny | **This** |
150
+ |---------|----------|-------------|------------|--------|----------|
151
+ | Tool arguments | Fails | Fixed | Missing | Fixed | **Fixed** |
152
+ | `\|safe` removed | Fails | Fixed | Missing | Fixed | **Fixed** |
153
+ | `developer` role | Missing | Missing | Missing | Missing | **Added** |
154
+ | Thinking toggle | None | None | `/think` (system only) | None | **`<\|think_off\|>` anywhere** |
155
+ | Empty think in history | Broken | Broken | Tags omitted | Broken | **Fixed** |
156
+ | Mid-conversation system | Crashes | Crashes | Crashes | Crashes | **Fixed** |
157
+ | Clean instructions | Yes | Yes | Yes | Injects text | **Yes** |
158
+ | No-user-query crash | Crashes | Crashes | Crashes | Crashes | **Graceful fallback** |
159
+ | Auto-close thinking | Not handled | Not handled | Not handled | Not handled | **Auto-injects close tag** |
160
+
161
+ </details>
162
+
163
+ <details>
164
+ <summary>Comparison: Qwen 3.6 template</summary>
165
+
166
+ | Feature | Official | **This** |
167
+ |---------|----------|----------|
168
+ | Tool arguments | Fails (`\|items`) | **Fixed** |
169
+ | `\|safe` removed | Fails | **Fixed** |
170
+ | `developer` role | Missing | **Added** |
171
+ | Thinking toggle | None | **`<\|think_off\|>` anywhere** |
172
+ | `preserve_thinking` | Spams empty blocks | **Dynamic length checks** |
173
+ | Mid-conversation system | Crashes | **Fixed** |
174
+ | `</thinking>` hallucination | Fails | **Detected and handled** |
175
+ | Interrupted streams | Broken tags | **Rescued** |
176
+ | Auto-close thinking before tool | Not handled | **Auto-injects close tag** |
177
+ | No-user-query crash | Crashes | **Graceful fallback** |
178
+
179
+ </details>
180
+
181
+ ---
182
+
183
+ ## Authorship
184
+
185
+ | Role | Author |
186
+ |------|--------|
187
+ | Original models | Alibaba Cloud (Qwen team) |
188
+ | Template fixes | [froggeric](https://huggingface.co/froggeric) |
189
+
190
+ ## License
191
+
192
+ Apache-2.0, inherited from Qwen.