Upload README-v9.md
Browse files- README-v9.md +200 -0
README-v9.md
ADDED
|
@@ -0,0 +1,200 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
tags:
|
| 4 |
+
- jinja
|
| 5 |
+
- chat-template
|
| 6 |
+
- qwen
|
| 7 |
+
- qwen3.5
|
| 8 |
+
- qwen3.6
|
| 9 |
+
- lm-studio
|
| 10 |
+
- mlx
|
| 11 |
+
- llama.cpp
|
| 12 |
+
- tool-calling
|
| 13 |
+
- thinking
|
| 14 |
+
---
|
| 15 |
+
|
| 16 |
+
# Fixed jinja chat templates for Qwen 3.5 & 3.6
|
| 17 |
+
|
| 18 |
+
> **2026-05-08 Update:** Fixed 9th bug: Thinking-tool-call hallucination. Refactored system prompt parsing to enable dynamic tool instructions. The template now actively teaches the model how to safely combine `<think>` blocks and `<tool_call>` boundaries.
|
| 19 |
+
>
|
| 20 |
+
> **2026-05-07 Update:** Fixed 8th bug: Mid-conversation system messages no longer crash the template. Compatibility restored for agent frameworks (OpenCode, Docker Agent, oh-my-pi). Re-engineered Jinja string parsing for C++ engine stability.
|
| 21 |
+
|
| 22 |
+
These are drop-in Jinja templates that fix rendering errors, token waste, and missing features in the official Qwen chat templates.
|
| 23 |
+
|
| 24 |
+
They are tested to work across LM Studio, llama.cpp, vLLM, MLX, oMLX, and any engine that supports HuggingFace Jinja templates.
|
| 25 |
+
|
| 26 |
+
---
|
| 27 |
+
|
| 28 |
+
## Why you need this
|
| 29 |
+
The official Qwen templates contain restrictions and Python-specific Jinja logic that break usage on many inference engines and agent frameworks.
|
| 30 |
+
|
| 31 |
+
Here are the 9 bugs this template fixes:
|
| 32 |
+
|
| 33 |
+
| Problem | Impact | Fix |
|
| 34 |
+
|---|---|---|
|
| 35 |
+
| **1. Tool calls fail on C++ engines** | The `\|items` filter doesn't exist in `minijinja` (LM Studio, llama.cpp, MLX). Tool calls instantly crash the template. | Rewritten for strict C++ engine compatibility. |
|
| 36 |
+
| **2. Mid-conversation system crash** | Frameworks injecting mid-conversation steering instructions trigger a hard crash. | Native, chronological rendering for system messages anywhere. |
|
| 37 |
+
| **3. `developer` role rejected** | Modern APIs send the developer role; the official template rejects it. | Added full support for `"developer"`. |
|
| 38 |
+
| **4. Empty thinking blocks spam** | Every past turn gets wrapped in empty `<think></think>` tags, wasting context and breaking caching. | Dynamic length checks and history visibility logic. |
|
| 39 |
+
| **5. No way to toggle thinking** | The user is restricted to the model defaults. | Intercepts `<\|think_off\|>` and `<\|think_on\|>` tags natively. |
|
| 40 |
+
| **6. Qwen 3.6 `</thinking>` hallucination** | Model sometimes generates `</thinking>` instead of `</think>`, permanently breaking the parser. | Advanced tag detection and stream recovery. |
|
| 41 |
+
| **7. No-user-query crash** | `raise_exception` crashes agentic loops, system-only contexts, or `/reset` flows. | Graceful fallback scanning mechanism. |
|
| 42 |
+
| **8. Unclosed thinking before tool call** | Model calls a tool without closing its reasoning, bleeding XML tags into tool parsers. | Auto-injects closing tags before tool boundaries. |
|
| 43 |
+
| **9. Thinking tool_call hallucination** | Model places `<tool_call>` inside `<think>` block because prompt forces `<think>\n` before a strict tool instruction. | Hoists system toggle to inject `<think>` natively into tool instructions. |
|
| 44 |
+
|
| 45 |
+
---
|
| 46 |
+
|
| 47 |
+
## Quick install
|
| 48 |
+
|
| 49 |
+
Choose your environment and update the template:
|
| 50 |
+
|
| 51 |
+
### LM Studio
|
| 52 |
+
1. Open your Qwen model in the right-side panel.
|
| 53 |
+
2. Scroll down to **Prompt Template**.
|
| 54 |
+
3. Replace the template with the contents of `qwen3.5/chat_template.jinja` or `qwen3.6/chat_template.jinja`.
|
| 55 |
+
4. Click **Save**.
|
| 56 |
+
|
| 57 |
+
### llama.cpp / koboldcpp
|
| 58 |
+
```bash
|
| 59 |
+
--jinja --chat-template-file qwen3.6/chat_template.jinja
|
| 60 |
+
```
|
| 61 |
+
|
| 62 |
+
### vLLM / TextGen
|
| 63 |
+
Replace the `"chat_template"` string in your `tokenizer_config.json` with the raw file contents.
|
| 64 |
+
|
| 65 |
+
### oMLX
|
| 66 |
+
Overwrite `chat_template.jinja` in your local model directory. Load with `--jinja`. Remove any `chat_template_kwargs` overrides because the template handles everything internally.
|
| 67 |
+
|
| 68 |
+
---
|
| 69 |
+
|
| 70 |
+
## Which file do I use?
|
| 71 |
+
|
| 72 |
+
| Template File | Supported Models |
|
| 73 |
+
|------|-----------|
|
| 74 |
+
| [`qwen3.5/chat_template.jinja`](qwen3.5/chat_template.jinja) | Qwen3.5-35B-A3B, Qwen3.5-32B, Qwen3.5-14B, and all Qwen 3.5 variants. |
|
| 75 |
+
| [`qwen3.6/chat_template.jinja`](qwen3.6/chat_template.jinja) | Qwen3.6-27B, Qwen3.6-35B-A3B, and all Qwen 3.6 variants. |
|
| 76 |
+
|
| 77 |
+
> **Note:** The 3.6 template is a superset. It additionally handles `preserve_thinking`, `</thinking>` hallucination recovery, and interrupted thought streams. If you are on 3.6, always use the 3.6 file.
|
| 78 |
+
|
| 79 |
+
---
|
| 80 |
+
|
| 81 |
+
## The thinking toggle
|
| 82 |
+
You can control the model reasoning behavior. Insert `<|think_on|>` or `<|think_off|>` anywhere in your system or user prompt.
|
| 83 |
+
|
| 84 |
+
The template natively intercepts the tag, removes it from the final context so the model never sees it, and flips the reasoning mode instantly.
|
| 85 |
+
|
| 86 |
+
**Fast answer, no reasoning:**
|
| 87 |
+
```text
|
| 88 |
+
System: You are a coding assistant. <|think_off|>
|
| 89 |
+
User: What's 2+2?
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
**Deep reasoning:**
|
| 93 |
+
```text
|
| 94 |
+
System: You are a coding assistant. <|think_on|>
|
| 95 |
+
User: Implement a red-black tree in Rust.
|
| 96 |
+
```
|
| 97 |
+
*(The tag syntax uses Qwen's control-token delimiters to guarantee it will never collide with legitimate text or file paths, unlike earlier community templates that used `/think`)*
|
| 98 |
+
|
| 99 |
+
---
|
| 100 |
+
|
| 101 |
+
## Pre-installed models
|
| 102 |
+
|
| 103 |
+
If you are using one of the following models, you already have an older version of this template installed.
|
| 104 |
+
|
| 105 |
+
- [froggeric/Qwen3.6-27B-MLX-8bit](https://huggingface.co/froggeric/Qwen3.6-27B-MLX-8bit)
|
| 106 |
+
- [froggeric/Qwen3.6-27B-MLX-4bit](https://huggingface.co/froggeric/Qwen3.6-27B-MLX-4bit)
|
| 107 |
+
- [froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit](https://huggingface.co/froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit)
|
| 108 |
+
- [froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-4bit](https://huggingface.co/froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-4bit)
|
| 109 |
+
- [froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-8bit](https://huggingface.co/froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-8bit)
|
| 110 |
+
- [froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-6bit](https://huggingface.co/froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-6bit)
|
| 111 |
+
- [froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-4bit](https://huggingface.co/froggeric/Qwen3.6-27B-Uncensored-Heretic-v2-MLX-4bit)
|
| 112 |
+
- [froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-8bit](https://huggingface.co/froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-8bit)
|
| 113 |
+
- [froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-6bit](https://huggingface.co/froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-6bit)
|
| 114 |
+
- [froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit](https://huggingface.co/froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit)
|
| 115 |
+
|
| 116 |
+
---
|
| 117 |
+
|
| 118 |
+
<details>
|
| 119 |
+
<summary>Technical Details of the 9 Fixes</summary>
|
| 120 |
+
|
| 121 |
+
### 1. Tool calls on C++ engines
|
| 122 |
+
The official template iterates tool call arguments with `|items`:
|
| 123 |
+
`{%- for key, value in tool_call.arguments|items %}`
|
| 124 |
+
|
| 125 |
+
Python's Jinja supports `|items`. C++ runtimes (LM Studio, llama.cpp, MLX) do not, which produces a rendering error. This template uses direct dictionary key lookups instead. It also replaces `is sequence` with `is iterable`, removes Python-only `|safe` wrappers, and handles arguments returned as raw strings.
|
| 126 |
+
|
| 127 |
+
### 2. Mid-conversation system messages crash
|
| 128 |
+
The official template hard-crashes if a `system` or `developer` message appears anywhere except the first position. This breaks agentic frameworks (Codex CLI, Docker Agent, oh-my-pi, OpenCode) that inject steering instructions mid-conversation. The fix natively renders these messages chronologically to preserve LLM recency bias while enforcing strict image-blocking checks.
|
| 129 |
+
|
| 130 |
+
### 3. `developer` role
|
| 131 |
+
The OpenAI-compatible API spec sends `message.role == "developer"` for system-level instructions. The official Qwen template throws an exception. Both templates here accept `"developer"` and map it properly.
|
| 132 |
+
|
| 133 |
+
### 4. Empty thinking blocks
|
| 134 |
+
The official template wraps every past assistant turn in thinking tags, even when empty. When there is no reasoning content, those tags waste context tokens and break prefix caching. The 3.5 template checks `reasoning_content` before emitting. The 3.6 template checks `reasoning_content|trim|length > 0` and ties history visibility to the `<|think_off|>` override.
|
| 135 |
+
|
| 136 |
+
### 5. `</thinking>` hallucination (Qwen 3.6 only)
|
| 137 |
+
The Qwen 3.6 model sometimes generates `</thinking>` instead of the expected `</think>`. The official parser splits on `</think >` only and fails. The 3.6 template detects which closing tag was actually used and splits dynamically. It also handles interrupted generation by rescuing incomplete streams.
|
| 138 |
+
|
| 139 |
+
### 6. Arguments serialization
|
| 140 |
+
The official template serializes argument values with `|tojson` unconditionally, failing when the value is already a string. The fixed templates check the type first. Strings pass through as-is, and everything else goes through `|tojson`.
|
| 141 |
+
|
| 142 |
+
### 7. Auto-close unclosed thinking before tool calls
|
| 143 |
+
The model sometimes starts a thinking block and immediately calls a tool without emitting the closing tag. The official template lets the unclosed thinking tag bleed into the tool call. The fixed templates detect this pattern and safely auto-inject the closing tag using standard Jinja `split` operations to guarantee 100% C++ compatibility.
|
| 144 |
+
|
| 145 |
+
### 8. No-user-query exception
|
| 146 |
+
The official template scans the message list in reverse. If all messages are tool results, or there are no user messages, it fires `raise_exception('No user query found...')` and hard-crashes. The fix replaces the exception with a graceful fallback `{%- set ns.last_query_index = messages|length - 1 %}`, enabling agentic tool-calling chains to function perfectly.
|
| 147 |
+
|
| 148 |
+
### 9. Thinking tool_call hallucination
|
| 149 |
+
The official template appends `<think>\n` to the end of the generation prompt to initiate reasoning. However, its system instructions rigidly demand the model to output *only* `<tool_call>` with no suffix. This contradictory state causes the model to improperly nest its tool call inside the thinking block. This template utilizes a global pre-scan to evaluate the final `enable_thinking` state across the entire conversation history, guaranteeing it can dynamically inject a proper `<think>...</think>` usage example into the tool instructions exactly when reasoning is enabled.
|
| 150 |
+
</details>
|
| 151 |
+
|
| 152 |
+
<details>
|
| 153 |
+
<summary>Comparison: Qwen 3.5 templates</summary>
|
| 154 |
+
|
| 155 |
+
| Feature | Official | LuffyTheFox | mod-ellary | Pneuny | **This** |
|
| 156 |
+
|---------|----------|-------------|------------|--------|----------|
|
| 157 |
+
| Tool arguments | Fails | Fixed | Missing | Fixed | **Fixed** |
|
| 158 |
+
| `\|safe` removed | Fails | Fixed | Missing | Fixed | **Fixed** |
|
| 159 |
+
| `developer` role | Missing | Missing | Missing | Missing | **Added** |
|
| 160 |
+
| Thinking toggle | None | None | `/think` (system only) | None | **`<\|think_off\|>` anywhere** |
|
| 161 |
+
| Empty think in history | Broken | Broken | Tags omitted | Broken | **Fixed** |
|
| 162 |
+
| Mid-conversation system | Crashes | Crashes | Crashes | Crashes | **Fixed** |
|
| 163 |
+
| Clean instructions | Yes | Yes | Yes | Injects text | **Yes** |
|
| 164 |
+
| No-user-query crash | Crashes | Crashes | Crashes | Crashes | **Graceful fallback** |
|
| 165 |
+
| Auto-close thinking | Not handled | Not handled | Not handled | Not handled | **Auto-injects close tag** |
|
| 166 |
+
| Dynamic tool format | Static | Static | Static | Static | **Yes** |
|
| 167 |
+
|
| 168 |
+
</details>
|
| 169 |
+
|
| 170 |
+
<details>
|
| 171 |
+
<summary>Comparison: Qwen 3.6 template</summary>
|
| 172 |
+
|
| 173 |
+
| Feature | Official | **This** |
|
| 174 |
+
|---------|----------|----------|
|
| 175 |
+
| Tool arguments | Fails (`\|items`) | **Fixed** |
|
| 176 |
+
| `\|safe` removed | Fails | **Fixed** |
|
| 177 |
+
| `developer` role | Missing | **Added** |
|
| 178 |
+
| Thinking toggle | None | **`<\|think_off\|>` anywhere** |
|
| 179 |
+
| `preserve_thinking` | Spams empty blocks | **Dynamic length checks** |
|
| 180 |
+
| Mid-conversation system | Crashes | **Fixed** |
|
| 181 |
+
| `</thinking>` hallucination | Fails | **Detected and handled** |
|
| 182 |
+
| Interrupted streams | Broken tags | **Rescued** |
|
| 183 |
+
| Auto-close thinking before tool | Not handled | **Auto-injects close tag** |
|
| 184 |
+
| No-user-query crash | Crashes | **Graceful fallback** |
|
| 185 |
+
| Dynamic tool format | Static | **Yes** |
|
| 186 |
+
|
| 187 |
+
</details>
|
| 188 |
+
|
| 189 |
+
---
|
| 190 |
+
|
| 191 |
+
## Authorship
|
| 192 |
+
|
| 193 |
+
| Role | Author |
|
| 194 |
+
|------|--------|
|
| 195 |
+
| Original models | Alibaba Cloud (Qwen team) |
|
| 196 |
+
| Template fixes | [froggeric](https://huggingface.co/froggeric) |
|
| 197 |
+
|
| 198 |
+
## License
|
| 199 |
+
|
| 200 |
+
Apache-2.0, inherited from Qwen.
|