Upload 2 files
- v1.0_rebuild_qwen3.5_and_3.6_template.jinja +229 -0
- v1.0_writeup.md +646 -0
v1.0_rebuild_qwen3.5_and_3.6_template.jinja
ADDED
|
@@ -0,0 +1,229 @@
| 1 |
+
{#- ===== SECTION 1: MACRO render_content =====
|
| 2 |
+
Handles string, list (image/video/text items), or None/undefined.
|
| 3 |
+
count_vision=true: increments ns.image_count / ns.video_count.
|
| 4 |
+
-#}
|
| 5 |
+
{%- macro render_content(content, count_vision=false) -%}
|
| 6 |
+
{%- if content is string -%}
|
| 7 |
+
{{- content -}}
|
| 8 |
+
{%- elif content is iterable and content is not mapping -%}
|
| 9 |
+
{%- for item in content -%}
|
| 10 |
+
{%- if item.type == 'image' or 'image' in item or 'image_url' in item -%}
|
| 11 |
+
{%- if count_vision -%}{%- set ns.image_count = ns.image_count + 1 -%}{%- endif -%}
|
| 12 |
+
{%- if add_vision_id is defined and add_vision_id -%}
|
| 13 |
+
{{- 'Picture ' ~ ns.image_count ~ ': ' -}}
|
| 14 |
+
{%- endif -%}
|
| 15 |
+
{{- '<|vision_start|><|image_pad|><|vision_end|>' -}}
|
| 16 |
+
{%- elif item.type == 'video' or 'video' in item -%}
|
| 17 |
+
{%- if count_vision -%}{%- set ns.video_count = ns.video_count + 1 -%}{%- endif -%}
|
| 18 |
+
{%- if add_vision_id is defined and add_vision_id -%}
|
| 19 |
+
{{- 'Video ' ~ ns.video_count ~ ': ' -}}
|
| 20 |
+
{%- endif -%}
|
| 21 |
+
{{- '<|vision_start|><|video_pad|><|vision_end|>' -}}
|
| 22 |
+
{%- elif item.type == 'text' or 'text' in item -%}
|
| 23 |
+
{{- item.text -}}
|
| 24 |
+
{%- endif -%}
|
| 25 |
+
{%- endfor -%}
|
| 26 |
+
{%- endif -%}
|
| 27 |
+
{%- endmacro -%}
|
| 28 |
+
|
| 29 |
+
{#- ===== SECTION 2: NAMESPACE INITIALISATION =====
|
| 30 |
+
Single ns object for all mutable state.
|
| 31 |
+
enable_thinking default=true; overridden by template parameter (BUG-003 fix).
|
| 32 |
+
-#}
|
| 33 |
+
{%- set ns = namespace(
|
| 34 |
+
enable_thinking=true,
|
| 35 |
+
image_count=0,
|
| 36 |
+
video_count=0
|
| 37 |
+
) -%}
|
| 38 |
+
{%- if enable_thinking is defined -%}
|
| 39 |
+
{%- if enable_thinking -%}
|
| 40 |
+
{%- set ns.enable_thinking = true -%}
|
| 41 |
+
{%- else -%}
|
| 42 |
+
{%- set ns.enable_thinking = false -%}
|
| 43 |
+
{%- endif -%}
|
| 44 |
+
{%- endif -%}
|
| 45 |
+
|
| 46 |
+
{#- ===== SECTION 3: PRE-SCAN =====
|
| 47 |
+
Track last /no_think or /think flag in user messages.
|
| 48 |
+
The model follows the last flag encountered in multi-turn conversations.
|
| 49 |
+
-#}
|
| 50 |
+
{%- for i in range(messages | length) -%}
|
| 51 |
+
{%- if messages[i].role == 'user' -%}
|
| 52 |
+
{%- set _u = messages[i].content if messages[i].content is string else '' -%}
|
| 53 |
+
{%- if _u.rstrip().endswith('/no_think') -%}
|
| 54 |
+
{%- set ns.enable_thinking = false -%}
|
| 55 |
+
{%- elif _u.rstrip().endswith('/think') -%}
|
| 56 |
+
{%- set ns.enable_thinking = true -%}
|
| 57 |
+
{%- endif -%}
|
| 58 |
+
{%- endif -%}
|
| 59 |
+
{%- endfor -%}
|
| 60 |
+
|
| 61 |
+
{#- ===== SECTION 4: COLLECT SYSTEM CONTENT =====
|
| 62 |
+
Merge all system/developer messages with \n\n separator (BUG-004 fix).
|
| 63 |
+
-#}
|
| 64 |
+
{%- set ns_sys = namespace(content='') -%}
|
| 65 |
+
{%- for msg in messages -%}
|
| 66 |
+
{%- if msg.role == 'system' or msg.role == 'developer' -%}
|
| 67 |
+
{%- set _c = render_content(msg.content | default('')) | trim -%}
|
| 68 |
+
{%- if _c -%}
|
| 69 |
+
{%- if ns_sys.content == '' -%}
|
| 70 |
+
{%- set ns_sys.content = _c -%}
|
| 71 |
+
{%- else -%}
|
| 72 |
+
{%- set ns_sys.content = ns_sys.content + '\n\n' + _c -%}
|
| 73 |
+
{%- endif -%}
|
| 74 |
+
{%- endif -%}
|
| 75 |
+
{%- endif -%}
|
| 76 |
+
{%- endfor -%}
|
| 77 |
+
|
| 78 |
+
{#- ===== SECTION 5: BUILD TOOLS LIST =====
|
| 79 |
+
Normalise each tool to {"type":"function","function":{...}} format.
|
| 80 |
+
Serialisation happens later at output time (avoids Markup + str escaping bugs).
|
| 81 |
+
-#}
|
| 82 |
+
{%- set _has_tools = tools is defined and tools -%}
|
| 83 |
+
{%- if _has_tools -%}
|
| 84 |
+
{%- set ns_tb = namespace(list=[]) -%}
|
| 85 |
+
{%- for tool in tools -%}
|
| 86 |
+
{%- if tool.function is defined -%}
|
| 87 |
+
{%- set ns_tb.list = ns_tb.list + [tool] -%}
|
| 88 |
+
{%- else -%}
|
| 89 |
+
{%- set ns_tb.list = ns_tb.list + [{"type": "function", "function": tool}] -%}
|
| 90 |
+
{%- endif -%}
|
| 91 |
+
{%- endfor -%}
|
| 92 |
+
{%- endif -%}
|
| 93 |
+
|
| 94 |
+
{#- ===== SECTION 6: OUTPUT SYSTEM TURN =====
|
| 95 |
+
Each fragment output via its own {{ }} block so tojson Markup objects are
|
| 96 |
+
never Python-concatenated with plain strings (would trigger HTML-escaping).
|
| 97 |
+
User system content appears BEFORE the tools block (correct ordering).
|
| 98 |
+
No default system prompt injected.
|
| 99 |
+
-#}
|
| 100 |
+
{%- if ns_sys.content or _has_tools -%}
|
| 101 |
+
{{- '<|im_start|>system\n' -}}
|
| 102 |
+
{%- if ns_sys.content -%}
|
| 103 |
+
{{- ns_sys.content -}}
|
| 104 |
+
{%- if _has_tools -%}{{- '\n\n' -}}{%- endif -%}
|
| 105 |
+
{%- endif -%}
|
| 106 |
+
{%- if _has_tools -%}
|
| 107 |
+
{{- '# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n' -}}
|
| 108 |
+
{%- for tool in ns_tb.list -%}
|
| 109 |
+
{{- tool | tojson -}}
|
| 110 |
+
{%- if not loop.last -%}{{- '\n' -}}{%- endif -%}
|
| 111 |
+
{%- endfor -%}
|
| 112 |
+
{{- '\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{"name": <function-name>, "arguments": <args-json-object>}\n</tool_call>' -}}
|
| 113 |
+
{%- endif -%}
|
| 114 |
+
{{- '<|im_end|>\n' -}}
|
| 115 |
+
{%- endif -%}
|
| 116 |
+
|
| 117 |
+
{#- ===== SECTION 7: MAIN MESSAGE LOOP ===== -#}
|
| 118 |
+
{%- for message in messages -%}
|
| 119 |
+
|
| 120 |
+
{#- 7a: System / Developer: already rendered above, skip -#}
|
| 121 |
+
{%- if message.role == 'system' or message.role == 'developer' -%}
|
| 122 |
+
|
| 123 |
+
{#- 7b: User messages -#}
|
| 124 |
+
{%- elif message.role == 'user' -%}
|
| 125 |
+
{%- set _uc = render_content(message.content | default(''), true) -%}
|
| 126 |
+
{{- '<|im_start|>user\n' + _uc + '<|im_end|>\n' -}}
|
| 127 |
+
|
| 128 |
+
{#- 7c: Assistant messages -#}
|
| 129 |
+
{%- elif message.role == 'assistant' -%}
|
| 130 |
+
{#- Safely extract content as string; guard against absent key (BUG-002 fix) -#}
|
| 131 |
+
{%- if message.content is defined and message.content is string -%}
|
| 132 |
+
{%- set _ac = message.content -%}
|
| 133 |
+
{%- elif message.content is defined and message.content is iterable and message.content is not mapping -%}
|
| 134 |
+
{%- set _ac = render_content(message.content) -%}
|
| 135 |
+
{%- else -%}
|
| 136 |
+
{%- set _ac = '' -%}
|
| 137 |
+
{%- endif -%}
|
| 138 |
+
|
| 139 |
+
{#- Collect tool_calls if present -#}
|
| 140 |
+
{%- set _tc = message.tool_calls if message.tool_calls is defined and message.tool_calls else [] -%}
|
| 141 |
+
|
| 142 |
+
{#- Strip <tool_call> prefix from content when tool_calls also present
|
| 143 |
+
(some frameworks duplicate the data in both fields) -#}
|
| 144 |
+
{%- if _tc and '<tool_call>' in _ac -%}
|
| 145 |
+
{%- set _ac = _ac.split('<tool_call>')[0] | trim -%}
|
| 146 |
+
{%- endif -%}
|
| 147 |
+
|
| 148 |
+
{#- Determine if this is the last-in-history assistant turn.
|
| 149 |
+
When add_generation_prompt=False and this is the last message, think blocks
|
| 150 |
+
must be preserved (and non-thinking prefill applied if needed).
|
| 151 |
+
All other turns have their think blocks stripped. -#}
|
| 152 |
+
{%- set _is_last_hist = loop.last and not (add_generation_prompt | default(false)) -%}
|
| 153 |
+
|
| 154 |
+
{#- Think-block handling (BUG-001 fix + last-turn preservation):
|
| 155 |
+
- Tool-call turns : never strip (think block is part of the tool-call format)
|
| 156 |
+
- Last-history turn : preserve; inject non-thinking prefill when absent
|
| 157 |
+
- Historical turns : strip the think block -#}
|
| 158 |
+
{%- if not _tc -%}
|
| 159 |
+
{%- if _is_last_hist -%}
|
| 160 |
+
{%- if '<think>' not in _ac and not ns.enable_thinking -%}
|
| 161 |
+
{%- set _ac = '<think>\n\n</think>\n\n' + _ac -%}
|
| 162 |
+
{%- endif -%}
|
| 163 |
+
{%- else -%}
|
| 164 |
+
{%- if '</think>' in _ac -%}
|
| 165 |
+
{%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
|
| 166 |
+
{%- endif -%}
|
| 167 |
+
{%- endif -%}
|
| 168 |
+
{%- endif -%}
|
| 169 |
+
|
| 170 |
+
{#- Emit the assistant turn -#}
|
| 171 |
+
{{- '<|im_start|>assistant\n' -}}
|
| 172 |
+
{%- if _ac -%}
|
| 173 |
+
{{- _ac -}}
|
| 174 |
+
{%- if _tc -%}{{- '\n' -}}{%- endif -%}
|
| 175 |
+
{%- endif -%}
|
| 176 |
+
|
| 177 |
+
{#- Render tool calls in Hermes format (BUG-006 fix: arguments as-is or tojson).
|
| 178 |
+
Each value output via its own {{ }} block, never concatenated with plain strings
|
| 179 |
+
in Python, which would trigger Markup HTML-escaping (BUG-003/markup fix). -#}
|
| 180 |
+
{%- if _tc -%}
|
| 181 |
+
{%- for tc in _tc -%}
|
| 182 |
+
{{- '<tool_call>\n' -}}
|
| 183 |
+
{{- '{"name": ' -}}{{- tc.function.name | tojson -}}
|
| 184 |
+
{%- if tc.function.arguments is string -%}
|
| 185 |
+
{{- ', "arguments": ' + tc.function.arguments -}}
|
| 186 |
+
{%- else -%}
|
| 187 |
+
{{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
|
| 188 |
+
{%- endif -%}
|
| 189 |
+
{{- '}' -}}
|
| 190 |
+
{%- if not loop.last -%}
|
| 191 |
+
{{- '\n</tool_call>\n' -}}
|
| 192 |
+
{%- else -%}
|
| 193 |
+
{{- '\n</tool_call>' -}}
|
| 194 |
+
{%- endif -%}
|
| 195 |
+
{%- endfor -%}
|
| 196 |
+
{%- endif -%}
|
| 197 |
+
{{- '<|im_end|>\n' -}}
|
| 198 |
+
|
| 199 |
+
{#- 7d: Tool results: group consecutive tool messages into one user turn -#}
|
| 200 |
+
{%- elif message.role == 'tool' -%}
|
| 201 |
+
{%- set _prev_role = messages[loop.index0 - 1].role if loop.index0 > 0 else '' -%}
|
| 202 |
+
{%- set _next_role = messages[loop.index0 + 1].role if not loop.last else '' -%}
|
| 203 |
+
{%- if _prev_role != 'tool' -%}
|
| 204 |
+
{{- '<|im_start|>user\n' -}}
|
| 205 |
+
{%- endif -%}
|
| 206 |
+
{{- '<tool_response>\n' -}}
|
| 207 |
+
{{- message.content | default('') -}}
|
| 208 |
+
{{- '\n</tool_response>' -}}
|
| 209 |
+
{%- if _next_role != 'tool' -%}
|
| 210 |
+
{{- '<|im_end|>\n' -}}
|
| 211 |
+
{%- endif -%}
|
| 212 |
+
|
| 213 |
+
{#- 7e: Unknown role -#}
|
| 214 |
+
{%- else -%}
|
| 215 |
+
{{- raise_exception('Unexpected message role: ' + message.role) -}}
|
| 216 |
+
{%- endif -%}
|
| 217 |
+
|
| 218 |
+
{%- endfor -%}
|
| 219 |
+
|
| 220 |
+
{#- ===== SECTION 8: GENERATION PROMPT =====
|
| 221 |
+
enable_thinking=True → no prefill (model generates <think> itself)
|
| 222 |
+
enable_thinking=False → exact 19-char non-thinking prefill (BUG-005 fix)
|
| 223 |
+
-#}
|
| 224 |
+
{%- if add_generation_prompt -%}
|
| 225 |
+
{{- '<|im_start|>assistant\n' -}}
|
| 226 |
+
{%- if not ns.enable_thinking -%}
|
| 227 |
+
{{- '<think>\n\n</think>\n\n' -}}
|
| 228 |
+
{%- endif -%}
|
| 229 |
+
{%- endif -%}
|
v1.0_writeup.md
ADDED
|
@@ -0,0 +1,646 @@
| 1 |
+
# Qwen3.5 / Qwen3.6 Jinja2 Chat Template: Implementation Writeup
|
| 2 |
+
|
| 3 |
+
**File:** `qwen3_5-template.jinja`
|
| 4 |
+
**Validation:** `validate_template.py` (17 fixtures, 0 failures)
|
| 5 |
+
**Bugs fixed:** BUG-001 through BUG-006
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## Table of Contents
|
| 10 |
+
|
| 11 |
+
1. [Why a New Template?](#1-why-a-new-template)
|
| 12 |
+
2. [Research Basis](#2-research-basis)
|
| 13 |
+
3. [Model Format Fundamentals](#3-model-format-fundamentals)
|
| 14 |
+
4. [Implementation Premises](#4-implementation-premises)
|
| 15 |
+
5. [enable_thinking Behavior](#5-enable_thinking-behavior)
|
| 16 |
+
6. [Tool Call Rendering](#6-tool-call-rendering)
|
| 17 |
+
7. [Bug Analysis and Fixes](#7-bug-analysis-and-fixes)
|
| 18 |
+
8. [Template Architecture](#8-template-architecture)
|
| 19 |
+
9. [Test Coverage](#9-test-coverage)
|
| 20 |
+
10. [Tool Ecosystem Compatibility](#10-tool-ecosystem-compatibility)
|
| 21 |
+
|
| 22 |
+
---
|
| 23 |
+
|
| 24 |
+
## 1. Why a New Template?
|
| 25 |
+
|
| 26 |
+
The official Qwen3.5/3.6 chat template (as shipped with the HuggingFace model
|
| 27 |
+
checkpoints) contains at least six correctness bugs that cause silent failures in
|
| 28 |
+
production agent loops. These bugs were independently reported across GitHub
|
| 29 |
+
issues, HuggingFace discussions, Reddit threads, and llama.cpp/vLLM bug trackers
|
| 30 |
+
between early 2025 and mid-2026.
|
| 31 |
+
|
| 32 |
+
An analysis of approximately five widely-used community replacement templates
|
| 33 |
+
showed that each one fixed a different subset of the bugs while introducing new
|
| 34 |
+
ones. None were derived systematically from the model's training format as
|
| 35 |
+
documented in the official technical report.
|
| 36 |
+
|
| 37 |
+
This template was written from scratch, grounded in:
|
| 38 |
+
|
| 39 |
+
- **Qwen3 Technical Report** (arXiv:2505.09388): authoritative description of
|
| 40 |
+
the training format, thinking mechanism, and tool-calling protocol.
|
| 41 |
+
- **Mid-Think Paper** (arXiv:2601.07036): phase structure of reasoning chains and
|
| 42 |
+
budget-stop format.
|
| 43 |
+
- **Hermes tool-call format spec** (Nous Research / NousHermes): the XML-based
|
| 44 |
+
tool-call format on which Qwen3 tool-calling is modelled.
|
| 45 |
+
- Community bug reports and vLLM/llama.cpp/Ollama source code analysis.
|
| 46 |
+
|
| 47 |
+
---
|
| 48 |
+
|
| 49 |
+
## 2. Research Basis
|
| 50 |
+
|
| 51 |
+
### 2.1 Qwen3 Technical Report (arXiv:2505.09388)
|
| 52 |
+
|
| 53 |
+
Key facts extracted for template construction:
|
| 54 |
+
|
| 55 |
+
- No BOS token. The model was trained without one; inserting one degrades output.
|
| 56 |
+
- `<think>` and `</think>` are **regular BPE text tokens**, not special tokens.
|
| 57 |
+
Tokenizer ID 151644 = `<|im_start|>`, 151645 = `<|im_end|>`.
|
| 58 |
+
- Non-thinking mode is implemented by prepending an **empty think block** to the
|
| 59 |
+
assistant generation: `<think>\n\n</think>\n\n`. The report states explicitly:
|
| 60 |
+
*"For non-thinking mode samples, we retain an empty thinking block in the
|
| 61 |
+
assistant's response. This design ensures internal format consistency."*
|
| 62 |
+
- `/think` and `/no_think` are plain text suffixes in user messages, not special
|
| 63 |
+
tokens. The model was fine-tuned to follow the **last** such flag encountered in
|
| 64 |
+
a multi-turn conversation.
|
| 65 |
+
|
| 66 |
+
### 2.2 Vocab and Tokenizer Notes
|
| 67 |
+
|
| 68 |
+
```
|
| 69 |
+
Token ID Note
|
| 70 |
+
<|endoftext|> 151643 End-of-document / pad fallback
|
| 71 |
+
<|im_start|> 151644 Begin-of-turn
|
| 72 |
+
<|im_end|> 151645 End-of-turn, eos_token
|
| 73 |
+
```
|
| 74 |
+
|
| 75 |
+
Qwen3.5/3.6 both use a padded vocabulary of 248,320 entries; tokens above 151,646
|
| 76 |
+
are padding with no semantics. The tokenizer class is `Qwen2Tokenizer` (BBPE,
|
| 77 |
+
no `<unk>`).
|
| 78 |
+
|
| 79 |
+
### 2.3 Tool-Call Format Origin
|
| 80 |
+
|
| 81 |
+
Qwen3 tool-calling uses the **Hermes-2 XML format** (NousResearch):
|
| 82 |
+
|
| 83 |
+
```
|
| 84 |
+
<tool_call>
|
| 85 |
+
{"name": "function_name", "arguments": {"key": "value"}}
|
| 86 |
+
</tool_call>
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
This is identical to vLLM's `hermes` parser target and is the format recognised
|
| 90 |
+
by Ollama's `parseTag()` heuristic (first text node following `.ToolCalls`).
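
As a rough illustration of how a consumer might read this format back, the sketch below pulls the JSON payload out of each `<tool_call>` block. The names `TOOL_CALL_RE` and `extract_tool_calls` are hypothetical and not taken from vLLM or Ollama; it also assumes the JSON payload never contains the literal text `</tool_call>`.

```python
import json
import re

# Matches one <tool_call>...</tool_call> block; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return the decoded JSON payload of every tool-call block in `text`."""
    return [json.loads(payload) for payload in TOOL_CALL_RE.findall(text)]


sample = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(sample)
```

Production parsers generally need to cope with malformed or truncated JSON as well; this sketch simply lets `json.loads` raise.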
|
| 91 |
+
|
| 92 |
+
---
|
| 93 |
+
|
| 94 |
+
## 3. Model Format Fundamentals
|
| 95 |
+
|
| 96 |
+
### 3.1 ChatML Base Structure
|
| 97 |
+
|
| 98 |
+
Every conversation is encoded as a sequence of turns delimited by im-start/end
|
| 99 |
+
control tokens. No newline appears before `<|im_end|>`.
|
| 100 |
+
|
| 101 |
+
```
|
| 102 |
+
<|im_start|>system
|
| 103 |
+
{system_content}<|im_end|>
|
| 104 |
+
<|im_start|>user
|
| 105 |
+
{user_content}<|im_end|>
|
| 106 |
+
<|im_start|>assistant
|
| 107 |
+
<think>
|
| 108 |
+
{thinking}
|
| 109 |
+
</think>
|
| 110 |
+
|
| 111 |
+
{response}<|im_end|>
|
| 112 |
+
```
|
| 113 |
+
|
| 114 |
+
The blank line between `</think>` and the response is mandatory. The model was
|
| 115 |
+
trained on this exact whitespace layout.
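
The layout above can be sketched in a few lines of Python. This is only an illustration of the whitespace rules; `render_turn` and `render_assistant` are hypothetical helper names, not part of the template.

```python
def render_turn(role: str, content: str) -> str:
    # No newline before <|im_end|>; exactly one newline after it.
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"


def render_assistant(thinking: str, response: str) -> str:
    # The blank line between </think> and the response is kept explicitly.
    body = f"<think>\n{thinking}\n</think>\n\n{response}"
    return render_turn("assistant", body)


prompt = (
    render_turn("system", "Be brief.")
    + render_turn("user", "Hi")
    + render_assistant("Greet the user.", "Hello!")
)
```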
|
| 116 |
+
|
| 117 |
+
### 3.2 Non-Thinking Prefill (Character-Exact)
|
| 118 |
+
|
| 119 |
+
The non-thinking generation prefix is exactly 19 characters:
|
| 120 |
+
|
| 121 |
+
```
|
| 122 |
+
<think>\n\n</think>\n\n
|
| 123 |
+
```
|
| 124 |
+
|
| 125 |
+
Decomposed: `<think>` (7) + `\n` (1) + `\n` (1) + `</think>` (8) + `\n` (1) +
|
| 126 |
+
`\n` (1) = 19. Any deviation (extra space, missing newline) moves the model off
|
| 127 |
+
its training distribution.
|
| 128 |
+
|
| 129 |
+
### 3.3 Think-Block Scope Rules
|
| 130 |
+
|
| 131 |
+
| Turn type | Think-block treatment |
|---|---|
| Historical assistant turn (non-last, no tool_calls) | **Strip entirely**: `split('</think>')[-1].lstrip('\n')` |
| Historical assistant turn (has tool_calls) | **Preserve**: think block is part of the tool-call format |
| Last assistant turn in history (`add_generation_prompt=False`) | **Preserve verbatim** |
| Last assistant turn, no existing think, `enable_thinking=False` | **Inject** `<think>\n\n</think>\n\n` prefix |
| Generation prompt, `enable_thinking=True` | **No prefix**: model generates its own `<think>` |
| Generation prompt, `enable_thinking=False` | **Inject** `<think>\n\n</think>\n\n` prefix |
|
| 139 |
+
|
| 140 |
+
---
|
| 141 |
+
|
| 142 |
+
## 4. Implementation Premises
|
| 143 |
+
|
| 144 |
+
### 4.1 Single Namespace Object
|
| 145 |
+
|
| 146 |
+
All mutable template state lives in one `ns` namespace object, avoiding
|
| 147 |
+
Jinja2's scoping trap (variables set inside `{% for %}` blocks are not visible
|
| 148 |
+
outside without a namespace):
|
| 149 |
+
|
| 150 |
+
```jinja2
|
| 151 |
+
{%- set ns = namespace(
|
| 152 |
+
enable_thinking=true,
|
| 153 |
+
image_count=0,
|
| 154 |
+
video_count=0
|
| 155 |
+
) -%}
|
| 156 |
+
```
|
| 157 |
+
|
| 158 |
+
### 4.2 Pre-Scan Before Rendering
|
| 159 |
+
|
| 160 |
+
The template performs a full pre-scan of all messages before emitting any output.
|
| 161 |
+
This is necessary because `/no_think` or `/think` can appear in any user message,
|
| 162 |
+
and the final flag determines the generation prompt behaviour. A single-pass loop
|
| 163 |
+
that both renders and tracks flags would have to look ahead, which Jinja2 cannot
|
| 164 |
+
do.
|
| 165 |
+
|
| 166 |
+
```jinja2
|
| 167 |
+
{%- for i in range(messages | length) -%}
|
| 168 |
+
{%- if messages[i].role == 'user' -%}
|
| 169 |
+
{%- set _u = messages[i].content if messages[i].content is string else '' -%}
|
| 170 |
+
{%- if _u.rstrip().endswith('/no_think') -%}
|
| 171 |
+
{%- set ns.enable_thinking = false -%}
|
| 172 |
+
{%- elif _u.rstrip().endswith('/think') -%}
|
| 173 |
+
{%- set ns.enable_thinking = true -%}
|
| 174 |
+
{%- endif -%}
|
| 175 |
+
{%- endif -%}
|
| 176 |
+
{%- endfor -%}
|
| 177 |
+
```
|
| 178 |
+
|
| 179 |
+
### 4.3 Separate `{{ }}` Blocks for `tojson` Output
|
| 180 |
+
|
| 181 |
+
Jinja2's `tojson` filter returns a `Markup` object (already HTML-safe). When a
|
| 182 |
+
`Markup` value is Python-concatenated with a plain string using `+`, Jinja2
|
| 183 |
+
auto-escapes the plain string and produces HTML-escaped output (`&#34;`,
`&quot;`, etc.). This is BUG-003.
|
| 185 |
+
|
| 186 |
+
The fix is to never concatenate `tojson` output with plain strings inside a
|
| 187 |
+
Jinja2 expression. Each fragment is emitted through its own `{{ }}` block:
|
| 188 |
+
|
| 189 |
+
```jinja2
|
| 190 |
+
{# WRONG: triggers HTML-escaping of the plain string #}
|
| 191 |
+
{{- '{"name": ' + tc.function.name | tojson + '}' -}}
|
| 192 |
+
|
| 193 |
+
{# CORRECT: separate blocks, no Python concatenation #}
|
| 194 |
+
{{- '{"name": ' -}}{{- tc.function.name | tojson -}}{{- '}' -}}
|
| 195 |
+
```
|
| 196 |
+
|
| 197 |
+
### 4.4 System Message Collection Phase
|
| 198 |
+
|
| 199 |
+
Multiple system messages are merged into a single `<|im_start|>system` turn
|
| 200 |
+
with `\n\n` as separator (BUG-004 fix). This is done as a separate pre-pass
|
| 201 |
+
(Section 4 in the template), so the main loop can unconditionally skip all
|
| 202 |
+
`system` and `developer` messages.
|
| 203 |
+
|
| 204 |
+
The user's system content always appears **before** the tools block in the
|
| 205 |
+
system turn, matching the training format.
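
A minimal Python sketch of the collection phase, assuming string-only message content (the template itself also accepts list content via `render_content`). The function name `collect_system_content` is hypothetical.

```python
def collect_system_content(messages: list[dict]) -> str:
    """Merge all system/developer messages into one string, joined by a blank line."""
    parts = []
    for msg in messages:
        if msg.get("role") in ("system", "developer"):
            content = msg.get("content") or ""
            # Assumption: only plain-string content is handled in this sketch.
            if isinstance(content, str) and content.strip():
                parts.append(content.strip())
    return "\n\n".join(parts)


merged = collect_system_content(
    [
        {"role": "system", "content": "You are helpful. "},
        {"role": "user", "content": "hi"},
        {"role": "developer", "content": "Answer in English."},
    ]
)
```

Empty or whitespace-only messages are dropped, so they never contribute a stray separator.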
|
| 206 |
+
|
| 207 |
+
### 4.5 Tool Normalisation
|
| 208 |
+
|
| 209 |
+
Some frameworks pass tool definitions with a top-level `function` key
|
| 210 |
+
(`{"type": "function", "function": {...}}`), while others pass the function
|
| 211 |
+
schema directly (`{"name": ..., "parameters": ...}`). The template normalises
|
| 212 |
+
all entries to the canonical form before serialisation:
|
| 213 |
+
|
| 214 |
+
```jinja2
|
| 215 |
+
{%- if tool.function is defined -%}
|
| 216 |
+
{%- set ns_tb.list = ns_tb.list + [tool] -%}
|
| 217 |
+
{%- else -%}
|
| 218 |
+
{%- set ns_tb.list = ns_tb.list + [{"type": "function", "function": tool}] -%}
|
| 219 |
+
{%- endif -%}
|
| 220 |
+
```
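
The same normalisation, written as a Python sketch for readers who want to pre-process tool lists before rendering. `normalise_tools` is a hypothetical name; the logic mirrors the Jinja2 branch above.

```python
def normalise_tools(tools: list[dict]) -> list[dict]:
    """Wrap bare function schemas so every entry has the canonical shape."""
    normalised = []
    for tool in tools:
        if "function" in tool:
            # Already in {"type": "function", "function": {...}} form.
            normalised.append(tool)
        else:
            normalised.append({"type": "function", "function": tool})
    return normalised


wrapped = {"type": "function", "function": {"name": "a", "parameters": {}}}
bare = {"name": "b", "parameters": {}}
result = normalise_tools([wrapped, bare])
```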
|
| 221 |
+
|
| 222 |
+
---
|
| 223 |
+
|
| 224 |
+
## 5. `enable_thinking` Behavior
|
| 225 |
+
|
| 226 |
+
### 5.1 Resolution Priority (Highest to Lowest)
|
| 227 |
+
|
| 228 |
+
1. **`/no_think` or `/think` text suffix** in the last user message that contains
|
| 229 |
+
one. This is the highest priority because it represents the most recent
|
| 230 |
+
explicit user intent and mirrors the model's fine-tuning data.
|
| 231 |
+
2. **`enable_thinking` template variable** passed at render time (e.g., via
|
| 232 |
+
`tokenizer.apply_chat_template(..., enable_thinking=False)`).
|
| 233 |
+
3. **Default value** of `true` (thinking on by default, consistent with the model's
|
| 234 |
+
training distribution).
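
The priority order can be sketched as a single function. This is an illustration under the assumption that only string user content is scanned (as in the template's pre-scan); `resolve_enable_thinking` is a hypothetical name.

```python
from typing import Optional


def resolve_enable_thinking(
    messages: list[dict], enable_thinking: Optional[bool] = None
) -> bool:
    # Priority 3: default is thinking on. Priority 2: explicit template variable.
    resolved = True if enable_thinking is None else bool(enable_thinking)
    # Priority 1: the last /think or /no_think suffix in any user message wins.
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") != "user" or not isinstance(content, str):
            continue
        text = content.rstrip()
        if text.endswith("/no_think"):
            resolved = False
        elif text.endswith("/think"):
            resolved = True
    return resolved
```

Because the loop overwrites `resolved` on every flagged message, a later turn without a flag leaves the most recent flag in effect.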
|
| 235 |
+
|
| 236 |
+
### 5.2 Generation Prompt Behaviour
|
| 237 |
+
|
| 238 |
+
When `add_generation_prompt=True`:
|
| 239 |
+
|
| 240 |
+
```
|
| 241 |
+
enable_thinking=True → <|im_start|>assistant\n
|
| 242 |
+
(model generates <think> itself)
|
| 243 |
+
|
| 244 |
+
enable_thinking=False → <|im_start|>assistant\n<think>\n\n</think>\n\n
|
| 245 |
+
(forces non-thinking mode by pre-filling empty block)
|
| 246 |
+
```
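
A short Python sketch of the two cases, which also checks the 19-character length of the prefill. `NON_THINKING_PREFILL` and `generation_prompt` are hypothetical names used only for illustration.

```python
NON_THINKING_PREFILL = "<think>\n\n</think>\n\n"


def generation_prompt(enable_thinking: bool) -> str:
    """Return the text appended when add_generation_prompt is true."""
    prompt = "<|im_start|>assistant\n"
    if not enable_thinking:
        # Pre-fill an empty think block to force non-thinking mode.
        prompt += NON_THINKING_PREFILL
    return prompt


# 7 + 1 + 1 + 8 + 1 + 1 characters
assert len(NON_THINKING_PREFILL) == 19
```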
|
| 247 |
+
|
| 248 |
+
### 5.3 Last-History-Turn Behaviour (add_generation_prompt=False)
|
| 249 |
+
|
| 250 |
+
When the conversation ends with an assistant message and no generation prompt
|
| 251 |
+
is requested, which is typical when scoring a complete conversation or when the
|
| 252 |
+
assistant message is being appended to the prompt for continuation:
|
| 253 |
+
|
| 254 |
+
- **Think block present:** preserved verbatim regardless of `enable_thinking`.
|
| 255 |
+
- **No think block, `enable_thinking=True`:** content left as-is (historical turns
|
| 256 |
+
are already stripped; the last one is the current generation context).
|
| 257 |
+
- **No think block, `enable_thinking=False`:** inject `<think>\n\n</think>\n\n`
|
| 258 |
+
before the content.
|
| 259 |
+
|
| 260 |
+
### 5.4 Historical Think-Block Stripping (BUG-001)
|
| 261 |
+
|
| 262 |
+
The official template collapses think blocks in historical turns to
|
| 263 |
+
`<think>\n\n</think>` instead of removing them. In a long agentic loop this
|
| 264 |
+
produces an ever-growing sequence of empty think blocks that degrades prompt
|
| 265 |
+
quality ("prompt poisoning").
|
| 266 |
+
|
| 267 |
+
The correct operation is full removal:
|
| 268 |
+
|
| 269 |
+
```python
|
| 270 |
+
# Python equivalent
|
| 271 |
+
content = content.split('</think>')[-1].lstrip('\n') if '</think>' in content else content
|
| 272 |
+
```
|
| 273 |
+
|
| 274 |
+
```jinja2
|
| 275 |
+
{# Jinja2 equivalent #}
|
| 276 |
+
{%- if '</think>' in _ac -%}
|
| 277 |
+
{%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
|
| 278 |
+
{%- endif -%}
|
| 279 |
+
```
|
| 280 |
+
|
| 281 |
+
**Exception:** turns that also carry `tool_calls` keep their think block intact.
|
| 282 |
+
The model is trained to produce thinking before tool invocations, and stripping
|
| 283 |
+
the think block from a historical tool-call turn would misrepresent the prompt.
|
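A minimal Python sketch of the stripping rule together with the tool-call exception follows. The helper name `strip_historical_think` and the dict-shaped message are assumptions made for illustration; the template itself works on the Jinja2 variable `_ac`.

```python
def strip_historical_think(message: dict) -> str:
    """Return the content to render for a historical assistant turn."""
    content = message.get("content") or ""
    if message.get("tool_calls"):
        # Exception: tool-call turns keep their think block intact.
        return content
    if "</think>" in content:
        # Drop everything up to and including the last </think>,
        # then trim the newlines that followed the block.
        content = content.split("</think>")[-1].lstrip("\n")
    return content
```

A turn without `tool_calls` loses its whole think block, whereas the same content on a tool-call turn is returned as-is.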
| 284 |
+
|
| 285 |
+
---
|
| 286 |
+
|
| 287 |
+
## 6. Tool Call Rendering
|
| 288 |
+
|
| 289 |
+
### 6.1 System Turn Tool Block Format
|
| 290 |
+
|
| 291 |
+
The exact text injected into the system message when tools are present matches
|
| 292 |
+
the Qwen3 Hermes training format:
|
| 293 |
+
|
| 294 |
+
```
|
| 295 |
+
# Tools
|
| 296 |
+
|
| 297 |
+
You may call one or more functions to assist with the user query.
|
| 298 |
+
|
| 299 |
+
You are provided with function signatures within <tools></tools> XML tags:
|
| 300 |
+
<tools>
|
| 301 |
+
{"type": "function", "function": {"name": "...", ...}}
|
| 302 |
+
</tools>
|
| 303 |
+
|
| 304 |
+
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
|
| 305 |
+
<tool_call>
|
| 306 |
+
{"name": <function-name>, "arguments": <args-json-object>}
|
| 307 |
+
</tool_call>
|
| 308 |
+
```
|
| 309 |
+
|
| 310 |
+
All text, including the instruction sentences, is literal and must not be
|
| 311 |
+
modified. The model was trained on this exact phrasing.
|
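For readers who want to reproduce the block outside Jinja2, the sketch below assembles the same literal text in Python. The names `TOOLS_HEADER`, `TOOLS_FOOTER`, and `build_tools_block` are hypothetical, and `json.dumps` is used as a stand-in for the template's `tojson` filter, so the escaping of some characters may differ from the template's output.

```python
import json

# Literal instruction text copied from the block above; do not reword it.
TOOLS_HEADER = (
    "# Tools\n\n"
    "You may call one or more functions to assist with the user query.\n\n"
    "You are provided with function signatures within <tools></tools> XML tags:\n"
    "<tools>"
)
TOOLS_FOOTER = (
    "</tools>\n\n"
    "For each function call, return a json object with function name and "
    "arguments within <tool_call></tool_call> XML tags:\n"
    "<tool_call>\n"
    '{"name": <function-name>, "arguments": <args-json-object>}\n'
    "</tool_call>"
)


def build_tools_block(tools: list) -> str:
    """Render the tools block with one JSON tool definition per line."""
    lines = [TOOLS_HEADER]
    lines.extend(json.dumps(tool, ensure_ascii=False) for tool in tools)
    lines.append(TOOLS_FOOTER)
    return "\n".join(lines)
```

Keeping the instruction sentences in constants makes accidental rewording less likely than building them from smaller fragments.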
| 312 |
+
|
| 313 |
+
### 6.2 Assistant Tool-Call Block
|
| 314 |
+
|
| 315 |
+
Each tool call is rendered as:
|
| 316 |
+
|
| 317 |
+
```
|
| 318 |
+
<tool_call>
|
| 319 |
+
{"name": "function_name", "arguments": {JSON_OBJECT}}
|
| 320 |
+
</tool_call>
|
| 321 |
+
```
|
| 322 |
+
|
| 323 |
+
Multiple parallel calls appear as consecutive blocks separated by `\n`:
|
| 324 |
+
|
| 325 |
+
```
|
| 326 |
+
<tool_call>
|
| 327 |
+
{"name": "f1", "arguments": {...}}
|
| 328 |
+
</tool_call>
|
| 329 |
+
<tool_call>
|
| 330 |
+
{"name": "f2", "arguments": {...}}
|
| 331 |
+
</tool_call><|im_end|>
|
| 332 |
+
```
|
| 333 |
+
|
| 334 |
+
Note: the final `</tool_call>` is immediately followed by `<|im_end|>` with no
|
| 335 |
+
intervening newline. This matches the training format.
|
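The joining rule can be illustrated with a short Python sketch. `render_tool_calls` is a hypothetical helper, it assumes the arguments are already dict objects (the string case is covered in Section 6.3), and `json.dumps` stands in for the template's `tojson` filter.

```python
import json


def render_tool_calls(tool_calls: list) -> str:
    """Render tool calls as newline-separated <tool_call> blocks plus <|im_end|>."""
    blocks = []
    for call in tool_calls:
        fn = call["function"]
        payload = json.dumps({"name": fn["name"], "arguments": fn["arguments"]})
        blocks.append("<tool_call>\n" + payload + "\n</tool_call>")
    # Consecutive blocks are separated by a single newline; <|im_end|> follows
    # the final </tool_call> directly, with no newline in between.
    return "\n".join(blocks) + "<|im_end|>"
```

Because the separator is applied with `join`, no trailing newline can slip in between the last `</tool_call>` and `<|im_end|>`.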
| 336 |
+
|
| 337 |
+
### 6.3 Arguments: String vs Object (BUG-006)
|
| 338 |
+
|
| 339 |
+
Some frameworks (notably older OpenAI-compatible clients and some streaming
|
| 340 |
+
implementations) serialise tool-call arguments as a JSON string
|
| 341 |
+
(`"{\"location\": \"Berlin\"}"`) rather than as an object
|
| 342 |
+
(`{"location": "Berlin"}`). The template handles both:
|
| 343 |
+
|
| 344 |
+
```jinja2
|
| 345 |
+
{%- if tc.function.arguments is string -%}
|
| 346 |
+
{{- ', "arguments": ' + tc.function.arguments -}}
|
| 347 |
+
{%- else -%}
|
| 348 |
+
{{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
|
| 349 |
+
{%- endif -%}
|
| 350 |
+
```
|
| 351 |
+
|
| 352 |
+
When arguments are already a string they are passed through as-is (the caller
|
| 353 |
+
is responsible for valid JSON). When they are a dict/object, `tojson` serialises
|
| 354 |
+
them correctly including Unicode escaping and quote escaping.
|
| 355 |
+
|
| 356 |
+
This arrangement also prevents the `"""` crash (BUG-006): Python triple-quoted
|
| 357 |
+
strings inside Jinja2 template strings would crash the Jinja2 parser if the
|
| 358 |
+
arguments dict happened to contain a value like `"""`. By using `tojson`
|
| 359 |
+
(which produces a proper JSON string literal) the crash cannot occur.
|
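The dual-path handling can be mirrored in Python as a rough sketch. `render_arguments` is a hypothetical name, and `json.dumps` is only an approximation of Jinja2's `tojson`, which may escape a few additional characters.

```python
import json


def render_arguments(arguments) -> str:
    """Return the JSON text to place after '"arguments": '."""
    if isinstance(arguments, str):
        # Already serialised by the caller: pass through unchanged.
        return arguments
    # Dict/object: serialise here so quotes and backslashes are escaped.
    return json.dumps(arguments, ensure_ascii=False)
```

Both `render_arguments('{"location": "Berlin"}')` and `render_arguments({"location": "Berlin"})` produce the same text, and a value made of three double quotes survives a round trip through `json.loads`.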
| 360 |
+
|
| 361 |
+
### 6.4 Tool Results
|
| 362 |
+
|
| 363 |
+
Tool results are wrapped in a user turn using `<tool_response>`:
|
| 364 |
+
|
| 365 |
+
```
|
| 366 |
+
<|im_start|>user
|
| 367 |
+
<tool_response>
|
| 368 |
+
{result_content}
|
| 369 |
+
</tool_response><|im_end|>
|
| 370 |
+
```
|
| 371 |
+
|
| 372 |
+
Consecutive tool-response messages are merged into a single user turn: the
|
| 373 |
+
template checks whether the previous message's role was also `tool` and
|
| 374 |
+
suppresses the `<|im_start|>user\n` header if so.
|
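The merging behaviour might look like the following in Python. This is a sketch under assumptions: `render_tool_messages` is a hypothetical helper, and both the single `\n` between adjacent `<tool_response>` blocks and the newline after `<|im_end|>` are guesses that this section does not spell out.

```python
def render_tool_messages(messages: list) -> str:
    """Render role="tool" messages, merging consecutive ones into one user turn."""
    out = []
    for i, msg in enumerate(messages):
        if msg["role"] != "tool":
            continue  # other roles are outside the scope of this sketch
        prev_is_tool = i > 0 and messages[i - 1]["role"] == "tool"
        next_is_tool = i + 1 < len(messages) and messages[i + 1]["role"] == "tool"
        if not prev_is_tool:
            out.append("<|im_start|>user")  # header only for the first response
        out.append("\n<tool_response>\n" + msg["content"] + "\n</tool_response>")
        if not next_is_tool:
            out.append("<|im_end|>\n")  # close the turn after the last response
    return "".join(out)
```

Two adjacent tool messages therefore share one `<|im_start|>user` header and one `<|im_end|>`.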
| 375 |
+
|
| 376 |
+
---
|
| 377 |
+
|
| 378 |
+
## 7. Bug Analysis and Fixes
|
| 379 |
+
|
| 380 |
+
### BUG-001: Historical Think Blocks Leaked (CRITICAL)
|
| 381 |
+
|
| 382 |
+
**Symptom:** In multi-turn conversations with `enable_thinking=True`, every
|
| 383 |
+
historical assistant message retains a collapsed `<think>\n\n</think>` block.
|
| 384 |
+
Over many turns the prompt accumulates dozens of empty think blocks, degrading
|
| 385 |
+
model performance.
|
| 386 |
+
|
| 387 |
+
**Root cause:** The official template strips think content but leaves the surrounding
|
| 388 |
+
`<think>` tags.
|
| 389 |
+
|
| 390 |
+
**Fix:** Strip the entire block by splitting on `</think>` and taking the tail:
|
| 391 |
+
|
| 392 |
+
```jinja2
|
| 393 |
+
{%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
|
| 394 |
+
```
|
| 395 |
+
|
| 396 |
+
**Tests:** T10, T13, T16
|
| 397 |
+
|
| 398 |
+
---
|
| 399 |
+
|
| 400 |
+
### BUG-002: KeyError on content=None / Missing content Key (HIGH)
|
| 401 |
+
|
| 402 |
+
**Symptom:** When an assistant message contains only `tool_calls` and no `content`
|
| 403 |
+
(or `content=None`, which is the OpenAI convention for pure tool-call responses),
|
| 404 |
+
the template throws `UndefinedError` or `KeyError`.
|
| 405 |
+
|
| 406 |
+
**Root cause:** The official template accesses `message.content` directly.
|
| 407 |
+
|
| 408 |
+
**Fix:** Guard the access:
|
| 409 |
+
|
| 410 |
+
```jinja2
|
| 411 |
+
{%- if message.content is defined and message.content is string -%}
|
| 412 |
+
{%- set _ac = message.content -%}
|
| 413 |
+
{%- elif message.content is defined and message.content is iterable ... -%}
|
| 414 |
+
{%- set _ac = render_content(message.content) -%}
|
| 415 |
+
{%- else -%}
|
| 416 |
+
{%- set _ac = '' -%}
|
| 417 |
+
{%- endif -%}
|
| 418 |
+
```
|
| 419 |
+
|
| 420 |
+
**Tests:** T04, T11
|
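In Python terms the guard corresponds roughly to the sketch below. `assistant_content` is a hypothetical helper, and the list branch is a simplified stand-in for the `render_content` macro that keeps only text parts.

```python
def assistant_content(message: dict) -> str:
    """Return assistant content as text, tolerating a missing or None value."""
    content = message.get("content")  # None if the key is absent
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Stand-in for render_content: keep only the text parts.
        return "".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    return ""  # content=None or key missing
```

A pure tool-call message such as `{"role": "assistant", "tool_calls": [...]}` then resolves to an empty string instead of raising.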
| 421 |
+
|
| 422 |
+
---
|
| 423 |
+
|
| 424 |
+
### BUG-003: Markup HTML-Escaping in Tool JSON (MEDIUM)
|
| 425 |
+
|
| 426 |
+
**Symptom:** Tool definitions or tool-call arguments with characters like `<`, `>`,
|
| 427 |
+
`&`, or `"` appear HTML-escaped in the rendered prompt (`&lt;`, `&gt;`, `&amp;`,
|
| 428 |
+
`&#34;`). This causes the model to misread the tool schema.
|
| 429 |
+
|
| 430 |
+
**Root cause:** `tojson` returns a Jinja2 `Markup` object. When `Markup` is
|
| 431 |
+
concatenated with a plain Python string using `+` inside a Jinja2 expression,
|
| 432 |
+
the plain string is auto-escaped and then concatenated with the already-safe
|
| 433 |
+
`Markup` value.
|
| 434 |
+
|
| 435 |
+
**Fix:** Never use `+` to join `tojson` output with plain strings. Emit each
|
| 436 |
+
fragment through a separate `{{ }}` block:
|
| 437 |
+
|
| 438 |
+
```jinja2
|
| 439 |
+
{# Every fragment in its own block #}
|
| 440 |
+
{{- '{"name": ' -}}{{- tc.function.name | tojson -}}
|
| 441 |
+
```
|
| 442 |
+
|
| 443 |
+
**Tests:** T03, T04, T12
|
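The escaping mechanism can be imitated with a toy model that needs only the standard library. `SafeText` and `_escape` are hypothetical names and are not part of Jinja2 or MarkupSafe. The sketch relies on `html.escape`, which writes a double quote as `&quot;`, whereas MarkupSafe writes `&#34;`.

```python
import html


class SafeText(str):
    """Toy stand-in for a Markup-like string: text already marked as safe."""

    def __add__(self, other):
        # safe + plain: the plain operand is escaped before joining.
        return SafeText(str(self) + _escape(other))

    def __radd__(self, other):
        # plain + safe: the plain operand is escaped before joining.
        return SafeText(_escape(other) + str(self))


def _escape(value) -> str:
    """Escape plain text; leave text already marked safe untouched."""
    if isinstance(value, SafeText):
        return str(value)
    return html.escape(str(value), quote=True)


name_json = SafeText('"get_weather"')  # plays the role of `name | tojson`

joined = '{"name": ' + name_json         # '+' escapes the plain literal
separate = '{"name": ' + str(name_json)  # like two separate {{ }} blocks
```

Here `joined` starts with `{&quot;name&quot;: `, while `separate` keeps the literal quotes, which is the difference the fix relies on.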
| 444 |
+
|
| 445 |
+
---
|
| 446 |
+
|
| 447 |
+
### BUG-004: Multiple System Messages Not Handled (MEDIUM)
|
| 448 |
+
|
| 449 |
+
**Symptom:** Frameworks such as Open WebUI send more than one `role: system`
|
| 450 |
+
message. The official template either crashes or emits multiple system turns,
|
| 451 |
+
and the extra system turns confuse the model.
|
| 452 |
+
|
| 453 |
+
**Root cause:** No merging logic for multiple system messages.
|
| 454 |
+
|
| 455 |
+
**Fix:** Pre-scan all messages and concatenate system content with `\n\n`:
|
| 456 |
+
|
| 457 |
+
```jinja2
|
| 458 |
+
{%- if ns_sys.content == '' -%}
|
| 459 |
+
{%- set ns_sys.content = _c -%}
|
| 460 |
+
{%- else -%}
|
| 461 |
+
{%- set ns_sys.content = ns_sys.content + '\n\n' + _c -%}
|
| 462 |
+
{%- endif -%}
|
| 463 |
+
```
|
| 464 |
+
|
| 465 |
+
**Tests:** T02, T14
|
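The same pre-scan can be written in Python as a small sketch. `merge_system_content` is a hypothetical helper, and treating `developer` like `system` follows the architecture overview in Section 8.

```python
def merge_system_content(messages: list) -> str:
    """Concatenate the content of all system/developer messages with a blank line."""
    merged = ""
    for msg in messages:
        if msg.get("role") not in ("system", "developer"):
            continue
        text = msg.get("content") or ""
        if merged == "":
            merged = text  # first system message: take it as-is
        else:
            merged = merged + "\n\n" + text  # later ones: join with a blank line
    return merged
```

Two system messages with contents `A` and `B` are merged into the single string `A\n\nB`, regardless of where they appear in the conversation.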
| 466 |
+
|
| 467 |
+
---
|
| 468 |
+
|
| 469 |
+
### BUG-005: Wrong Non-Thinking Prefill Whitespace (LOW-MEDIUM)
|
| 470 |
+
|
| 471 |
+
**Symptom:** Non-thinking mode produces a think block with incorrect whitespace,
|
| 472 |
+
moving the model off its training distribution and causing output quality
|
| 473 |
+
degradation or refusal to honour the non-thinking instruction.
|
| 474 |
+
|
| 475 |
+
**Root cause:** The official template uses `<think>\n</think>\n\n` (missing the
|
| 476 |
+
second newline inside the block), which does not match the format described in
|
| 477 |
+
the technical report.
|
| 478 |
+
|
| 479 |
+
**Fix:** Use the exact 19-character sequence:
|
| 480 |
+
|
| 481 |
+
```
|
| 482 |
+
<think>\n\n</think>\n\n
|
| 483 |
+
```
|
| 484 |
+
|
| 485 |
+
**Tests:** T08, T17
|
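A quick way to guard against whitespace drift is to pin the prefill in a constant and check its length. The sketch below does this in Python; the names `NON_THINKING_PREFILL` and `generation_prompt` are hypothetical.

```python
NON_THINKING_PREFILL = "<think>\n\n</think>\n\n"


def generation_prompt(enable_thinking: bool) -> str:
    """Return the assistant header, with the empty think block when disabled."""
    header = "<|im_start|>assistant\n"
    return header if enable_thinking else header + NON_THINKING_PREFILL


# 7 (<think>) + 2 (newlines) + 8 (</think>) + 2 (newlines) = 19 characters
assert len(NON_THINKING_PREFILL) == 19
```

The single-newline variant `<think>\n</think>\n\n` has 18 characters, so the length check alone is enough to catch it.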
| 486 |
+
|
| 487 |
+
---
|
| 488 |
+
|
| 489 |
+
### BUG-006: Triple-Quote Crash on Python String Arguments (MEDIUM)
|
| 490 |
+
|
| 491 |
+
**Symptom:** Jinja2 raises a `TemplateSyntaxError` or produces garbled output when
|
| 492 |
+
tool-call arguments contain triple-quote sequences (`"""` or `'''`) because the
|
| 493 |
+
template previously embedded argument values using Python string literal
|
| 494 |
+
concatenation.
|
| 495 |
+
|
| 496 |
+
**Root cause:** Some community templates build the tool-call JSON via string
|
| 497 |
+
interpolation (`'{"arguments": "' + args + '"}'`), which breaks for argument
|
| 498 |
+
values containing quote characters.
|
| 499 |
+
|
| 500 |
+
**Fix:** Use `tojson` for all non-string arguments (produces well-formed JSON) and
|
| 501 |
+
pass string arguments through unchanged (caller provides valid JSON strings):
|
| 502 |
+
|
| 503 |
+
```jinja2
|
| 504 |
+
{%- if tc.function.arguments is string -%}
|
| 505 |
+
{{- ', "arguments": ' + tc.function.arguments -}}
|
| 506 |
+
{%- else -%}
|
| 507 |
+
{{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
|
| 508 |
+
{%- endif -%}
|
| 509 |
+
```
|
| 510 |
+
|
| 511 |
+
**Tests:** T12
|
| 512 |
+
|
| 513 |
+
---
|
| 514 |
+
|
| 515 |
+
## 8. Template Architecture
|
| 516 |
+
|
| 517 |
+
The template is divided into eight clearly delimited sections, each with a
|
| 518 |
+
comment header:
|
| 519 |
+
|
| 520 |
+
```
|
| 521 |
+
Section 1 render_content macro
|
| 522 |
+
Handles str / list (image/video/text) / None → plain text.
|
| 523 |
+
Increments ns.image_count / ns.video_count for vision tokens.
|
| 524 |
+
|
| 525 |
+
Section 2 Namespace initialisation
|
| 526 |
+
Single ns object; enable_thinking defaults to true.
|
| 527 |
+
|
| 528 |
+
Section 3 Pre-scan
|
| 529 |
+
Walk all user messages; last /no_think or /think wins.
|
| 530 |
+
|
| 531 |
+
Section 4 Collect system content
|
| 532 |
+
Merge all system / developer messages with \n\n.
|
| 533 |
+
|
| 534 |
+
Section 5 Build tools list
|
| 535 |
+
Normalise every tool to {"type":"function","function":{...}}.
|
| 536 |
+
|
| 537 |
+
Section 6 Output system turn
|
| 538 |
+
Emit one <|im_start|>system turn (merged system content + tools block).
|
| 539 |
+
|
| 540 |
+
Section 7 Main message loop
|
| 541 |
+
7a system/developer → skip (already emitted)
|
| 542 |
+
7b user → render with vision support
|
| 543 |
+
7c assistant → render with think-block logic + tool_calls
|
| 544 |
+
7d tool → group into user turns
|
| 545 |
+
7e unknown role → raise_exception
|
| 546 |
+
|
| 547 |
+
Section 8 Generation prompt
|
| 548 |
+
enable_thinking=True → bare <|im_start|>assistant\n
|
| 549 |
+
enable_thinking=False → add <think>\n\n</think>\n\n prefix
|
| 550 |
+
```
|
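The pre-scan in Section 3 of the outline ("last /no_think or /think wins") can be sketched in Python as follows. `resolve_enable_thinking` is a hypothetical helper. Matching the flags anywhere in the message text, and letting them override the passed-in default, are assumptions not confirmed by the outline.

```python
def resolve_enable_thinking(messages: list, enable_thinking: bool = True) -> bool:
    """Pre-scan user messages; the last /think or /no_think flag wins."""
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") != "user" or not isinstance(content, str):
            continue
        # Position of the last occurrence of each flag (-1 if absent).
        # "/think" never matches inside "/no_think", so the two are independent.
        think_at = content.rfind("/think")
        no_think_at = content.rfind("/no_think")
        if think_at == -1 and no_think_at == -1:
            continue  # no flag in this message: keep the current setting
        enable_thinking = think_at > no_think_at
    return enable_thinking
```

With an earlier `/no_think` and a later `/think` in separate user messages, the function returns `True`, since the later flag overrides the earlier one.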
| 551 |
+
|
| 552 |
+
### Design Decisions
|
| 553 |
+
|
| 554 |
+
**No default system prompt.** Unlike some community templates, this template does
|
| 555 |
+
not inject a default system prompt when none is provided. The model performs well
|
| 556 |
+
without one, and injecting one would cause conflicts for applications that rely on
|
| 557 |
+
the system prompt being exactly what they set.
|
| 558 |
+
|
| 559 |
+
**No BOS token.** The Qwen3 family was trained without a BOS token. Adding one
|
| 560 |
+
would consume a context window slot unnecessarily and may harm performance.
|
| 561 |
+
|
| 562 |
+
**No `<|endoftext|>` in conversation.** This token is reserved for
|
| 563 |
+
end-of-document signalling in the pre-training phase, not for conversation
|
| 564 |
+
boundaries.
|
| 565 |
+
|
| 566 |
+
---
|
| 567 |
+
|
| 568 |
+
## 9. Test Coverage
|
| 569 |
+
|
| 570 |
+
The 17 test fixtures in `validate_template.py` cover:
|
| 571 |
+
|
| 572 |
+
| ID | Scenario | Key assertion |
|
| 573 |
+
|---|---|---|
|
| 574 |
+
| T01 | Simple user/assistant, no system, no tools | Exact ChatML output |
|
| 575 |
+
| T02 | System message | System turn before user turn |
|
| 576 |
+
| T03 | Tools defined, `enable_thinking=True` | Tools block in system; no prefill |
|
| 577 |
+
| T04 | Tool call, `content=None` | No crash; `<tool_call>` present |
|
| 578 |
+
| T05 | Parallel tool calls | `</tool_call>\n<tool_call>` separator |
|
| 579 |
+
| T06 | Tool result (role=tool) | `<\|im_start\|>user\n<tool_response>` |
|
| 580 |
+
| T07 | `enable_thinking=True` generation prompt | No think prefix emitted |
|
| 581 |
+
| T08 | `enable_thinking=False` generation prompt | Exact 19-char prefill |
|
| 582 |
+
| T09 | `/no_think` flag in user message | Non-thinking prefill applied |
|
| 583 |
+
| T10 | Historical think blocks | Fully stripped, not collapsed |
|
| 584 |
+
| T11 | Missing `content` key on assistant | No KeyError / UndefinedError |
|
| 585 |
+
| T12 | Special chars in arguments | Correctly JSON-escaped |
|
| 586 |
+
| T13 | Historical tool-call turn with think | Think block preserved |
|
| 587 |
+
| T14 | Multiple system messages | Merged with `\n\n`; single system turn |
|
| 588 |
+
| T15 | Parallel tool responses | Both inside single user turn |
|
| 589 |
+
| T16 | Last history turn with existing think | Preserved verbatim |
|
| 590 |
+
| T17 | Last history turn, no think, `enable_thinking=False` | Prefill injected |
|
| 591 |
+
|
| 592 |
+
Run the suite:
|
| 593 |
+
|
| 594 |
+
```bash
|
| 595 |
+
cd /workspace/project/qwen3_5-template
|
| 596 |
+
python validate_template.py
|
| 597 |
+
# Expected: 17 passed, 0 failed
|
| 598 |
+
```
|
| 599 |
+
|
| 600 |
+
---
|
| 601 |
+
|
| 602 |
+
## 10. Tool Ecosystem Compatibility
|
| 603 |
+
|
| 604 |
+
An analysis of 51 tool-calling frameworks and inference backends was conducted to
|
| 605 |
+
verify that the template's output is consumable by the broadest possible set of
|
| 606 |
+
tools. Key findings:
|
| 607 |
+
|
| 608 |
+
### 10.1 OpenAI JSON Format Dominance
|
| 609 |
+
|
| 610 |
+
31 of the 51 analysed tools use the **OpenAI-compatible JSON function-call API**
|
| 611 |
+
(Group A). These tools pass tool definitions as a `tools` array and receive tool
|
| 612 |
+
calls back as `message.tool_calls` objects. The template's input format is fully
|
| 613 |
+
compatible with this convention.
|
| 614 |
+
|
| 615 |
+
Notable Group A members: OpenHands, LangChain, LangGraph, LiteLLM, CrewAI,
|
| 616 |
+
Pydantic AI, Open WebUI, LibreChat, LM Studio, LlamaIndex, AutoGen.
|
| 617 |
+
|
| 618 |
+
### 10.2 Inference Server Compatibility
|
| 619 |
+
|
| 620 |
+
| Backend | Compatibility note |
|
| 621 |
+
|---|---|
|
| 622 |
+
| **vLLM** | Uses the `hermes` tool parser for Qwen models, matching this template's `<tool_call>` format exactly. |
|
| 623 |
+
| **llama.cpp** | Recognises `<tool_call>` via the `--jinja` flag + chat template loading. Note: `--jinja` disables GBNF grammar (Issue #12204). |
|
| 624 |
+
| **Ollama** | Auto-detects the tool-call tag via `parseTag()`, which reads the first text node after `.ToolCalls` in the Go template tree; `<tool_call>` is one of the three known tags. |
|
| 625 |
+
| **LM Studio** | Passes tool definitions as the `tools` API field; receives tool calls in `message.tool_calls`. |
|
| 626 |
+
| **TabbyAPI** | Full OpenAI-compatible API; correct chat template is the only requirement. |
|
| 627 |
+
|
| 628 |
+
### 10.3 Non-Native Tool-Calling Frameworks
|
| 629 |
+
|
| 630 |
+
Three framework groups (Cline/Roo Code XML, OpenCode `<parameter>`, Aider
|
| 631 |
+
SEARCH/REPLACE) do not use the OpenAI tool-calling API at all. They inject their
|
| 632 |
+
own tool descriptions into the system prompt and parse the model's text output
|
| 633 |
+
directly. These frameworks do not interact with the chat template's tool-calling
|
| 634 |
+
sections: they send no `tools` array and the template therefore emits no tool
|
| 635 |
+
block.
|
| 636 |
+
|
| 637 |
+
### 10.4 Arguments as JSON String
|
| 638 |
+
|
| 639 |
+
Several frameworks (notably some streaming clients and older OpenAI SDK versions)
|
| 640 |
+
serialise `tool_calls[].function.arguments` as a JSON string rather than a parsed
|
| 641 |
+
object. The template's dual-path arguments handling (Section 6.3) accommodates
|
| 642 |
+
both cases transparently.
|
| 643 |
+
|
| 644 |
+
---
|
| 645 |
+
|
| 646 |
+
*Generated as part of the `fix/qwen3-template-bugs` implementation.*
|