Instructions to use StableQuant/Qwen-Templates-Rebuild-Project with libraries, inference providers, notebooks, and local apps.

Libraries

How to use StableQuant/Qwen-Templates-Rebuild-Project with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="StableQuant/Qwen-Templates-Rebuild-Project")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("StableQuant/Qwen-Templates-Rebuild-Project", dtype="auto")

Local Apps

vLLM

How to use StableQuant/Qwen-Templates-Rebuild-Project with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "StableQuant/Qwen-Templates-Rebuild-Project"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StableQuant/Qwen-Templates-Rebuild-Project",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/StableQuant/Qwen-Templates-Rebuild-Project

SGLang

How to use StableQuant/Qwen-Templates-Rebuild-Project with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "StableQuant/Qwen-Templates-Rebuild-Project" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StableQuant/Qwen-Templates-Rebuild-Project",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "StableQuant/Qwen-Templates-Rebuild-Project" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StableQuant/Qwen-Templates-Rebuild-Project",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use StableQuant/Qwen-Templates-Rebuild-Project with Docker Model Runner:
```
docker model run hf.co/StableQuant/Qwen-Templates-Rebuild-Project
```

StableQuant committed 6 days ago

Commit 1f2e278 (verified) · 1 parent: 28b9567

Upload 2 files

Files changed (2)

v1.0_rebuild_qwen3.5_and_3.6_template.jinja +229 -0
v1.0_writeup.md +646 -0

v1.0_rebuild_qwen3.5_and_3.6_template.jinja ADDED

	@@ -0,0 +1,229 @@

+{#- ===== SECTION 1: MACRO render_content =====
+     Handles string, list (image/video/text items), or None/undefined.
+     count_vision=true: increments ns.image_count / ns.video_count.
+-#}
+{%- macro render_content(content, count_vision=false) -%}
+  {%- if content is string -%}
+    {{- content -}}
+  {%- elif content is iterable and content is not mapping -%}
+    {%- for item in content -%}
+      {%- if item.type == 'image' or 'image' in item or 'image_url' in item -%}
+        {%- if count_vision -%}{%- set ns.image_count = ns.image_count + 1 -%}{%- endif -%}
+        {%- if add_vision_id is defined and add_vision_id -%}
+          {{- 'Picture ' ~ ns.image_count ~ ': ' -}}
+        {%- endif -%}
+        {{- '<|vision_start|><|image_pad|><|vision_end|>' -}}
+      {%- elif item.type == 'video' or 'video' in item -%}
+        {%- if count_vision -%}{%- set ns.video_count = ns.video_count + 1 -%}{%- endif -%}
+        {%- if add_vision_id is defined and add_vision_id -%}
+          {{- 'Video ' ~ ns.video_count ~ ': ' -}}
+        {%- endif -%}
+        {{- '<|vision_start|><|video_pad|><|vision_end|>' -}}
+      {%- elif item.type == 'text' or 'text' in item -%}
+        {{- item.text -}}
+      {%- endif -%}
+    {%- endfor -%}
+  {%- endif -%}
+{%- endmacro -%}
+{#- ===== SECTION 2: NAMESPACE INITIALISATION =====
+     Main ns object for conversation-level state; ns_sys and ns_tb (below) hold the system/tool accumulators.
+     enable_thinking defaults to true; a render-time enable_thinking parameter overrides it.
+-#}
+{%- set ns = namespace(
+    enable_thinking=true,
+    image_count=0,
+    video_count=0
+) -%}
+{%- if enable_thinking is defined -%}
+  {%- if enable_thinking -%}
+    {%- set ns.enable_thinking = true -%}
+  {%- else -%}
+    {%- set ns.enable_thinking = false -%}
+  {%- endif -%}
+{%- endif -%}
+{#- ===== SECTION 3: PRE-SCAN =====
+     Track last /no_think or /think flag in user messages.
+     The model follows the last flag encountered in multi-turn conversations.
+-#}
+{%- for i in range(messages | length) -%}
+  {%- if messages[i].role == 'user' -%}
+    {%- set _u = messages[i].content if messages[i].content is string else '' -%}
+    {%- if _u.rstrip().endswith('/no_think') -%}
+      {%- set ns.enable_thinking = false -%}
+    {%- elif _u.rstrip().endswith('/think') -%}
+      {%- set ns.enable_thinking = true -%}
+    {%- endif -%}
+  {%- endif -%}
+{%- endfor -%}
+{#- ===== SECTION 4: COLLECT SYSTEM CONTENT =====
+     Merge all system/developer messages with \n\n separator (BUG-004 fix).
+-#}
+{%- set ns_sys = namespace(content='') -%}
+{%- for msg in messages -%}
+  {%- if msg.role == 'system' or msg.role == 'developer' -%}
+    {%- set _c = render_content(msg.content | default('')) | trim -%}
+    {%- if _c -%}
+      {%- if ns_sys.content == '' -%}
+        {%- set ns_sys.content = _c -%}
+      {%- else -%}
+        {%- set ns_sys.content = ns_sys.content + '\n\n' + _c -%}
+      {%- endif -%}
+    {%- endif -%}
+  {%- endif -%}
+{%- endfor -%}
+{#- ===== SECTION 5: BUILD TOOLS LIST =====
+     Normalise each tool to {"type":"function","function":{...}} format.
+     Serialisation happens later at output time (avoids Markup + str escaping bugs).
+-#}
+{%- set _has_tools = tools is defined and tools -%}
+{%- if _has_tools -%}
+  {%- set ns_tb = namespace(list=[]) -%}
+  {%- for tool in tools -%}
+    {%- if tool.function is defined -%}
+      {%- set ns_tb.list = ns_tb.list + [tool] -%}
+    {%- else -%}
+      {%- set ns_tb.list = ns_tb.list + [{"type": "function", "function": tool}] -%}
+    {%- endif -%}
+  {%- endfor -%}
+{%- endif -%}
+{#- ===== SECTION 6: OUTPUT SYSTEM TURN =====
+     Each fragment output via its own {{ }} block so tojson Markup objects are
+     never Python-concatenated with plain strings (would trigger HTML-escaping).
+     User system content appears BEFORE the tools block (correct ordering).
+     No default system prompt injected.
+-#}
+{%- if ns_sys.content or _has_tools -%}
+  {{- '<|im_start|>system\n' -}}
+  {%- if ns_sys.content -%}
+    {{- ns_sys.content -}}
+    {%- if _has_tools -%}{{- '\n\n' -}}{%- endif -%}
+  {%- endif -%}
+  {%- if _has_tools -%}
+    {{- '# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n' -}}
+    {%- for tool in ns_tb.list -%}
+      {{- tool | tojson -}}
+      {%- if not loop.last -%}{{- '\n' -}}{%- endif -%}
+    {%- endfor -%}
+    {{- '\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{"name": <function-name>, "arguments": <args-json-object>}\n</tool_call>' -}}
+  {%- endif -%}
+  {{- '<|im_end|>\n' -}}
+{%- endif -%}
+{#- ===== SECTION 7: MAIN MESSAGE LOOP ===== -#}
+{%- for message in messages -%}
+  {#- 7a: System / Developer — already rendered above, skip -#}
+  {%- if message.role == 'system' or message.role == 'developer' -%}
+  {#- 7b: User messages -#}
+  {%- elif message.role == 'user' -%}
+    {%- set _uc = render_content(message.content | default(''), true) -%}
+    {{- '<|im_start|>user\n' + _uc + '<|im_end|>\n' -}}
+  {#- 7c: Assistant messages -#}
+  {%- elif message.role == 'assistant' -%}
+    {#- Safely extract content as string — guard against absent key (BUG-002 fix) -#}
+    {%- if message.content is defined and message.content is string -%}
+      {%- set _ac = message.content -%}
+    {%- elif message.content is defined and message.content is iterable and message.content is not mapping -%}
+      {%- set _ac = render_content(message.content) -%}
+    {%- else -%}
+      {%- set _ac = '' -%}
+    {%- endif -%}
+    {#- Collect tool_calls if present -#}
+    {%- set _tc = message.tool_calls if message.tool_calls is defined and message.tool_calls else [] -%}
+    {#- Strip <tool_call> prefix from content when tool_calls also present
+        (some frameworks duplicate the data in both fields) -#}
+    {%- if _tc and '<tool_call>' in _ac -%}
+      {%- set _ac = _ac.split('<tool_call>')[0] | trim -%}
+    {%- endif -%}
+    {#- Determine if this is the last-in-history assistant turn.
+        When add_generation_prompt=False and this is the last message, think blocks
+        must be preserved (and non-thinking prefill applied if needed).
+        All other turns have their think blocks stripped. -#}
+    {%- set _is_last_hist = loop.last and not (add_generation_prompt | default(false)) -%}
+    {#- Think-block handling (BUG-001 fix + last-turn preservation):
+        - Tool-call turns   : never strip (think block is part of the tool-call format)
+        - Last-history turn : preserve; inject non-thinking prefill when absent
+        - Historical turns  : strip the think block -#}
+    {%- if not _tc -%}
+      {%- if _is_last_hist -%}
+        {%- if '<think>' not in _ac and not ns.enable_thinking -%}
+          {%- set _ac = '<think>\n\n</think>\n\n' + _ac -%}
+        {%- endif -%}
+      {%- else -%}
+        {%- if '</think>' in _ac -%}
+          {%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
+        {%- endif -%}
+      {%- endif -%}
+    {%- endif -%}
+    {#- Emit the assistant turn -#}
+    {{- '<|im_start|>assistant\n' -}}
+    {%- if _ac -%}
+      {{- _ac -}}
+      {%- if _tc -%}{{- '\n' -}}{%- endif -%}
+    {%- endif -%}
+    {#- Render tool calls in Hermes format (BUG-006 fix: arguments as-is or tojson).
+        Each value output via its own {{ }} block — never concatenated with plain strings
+        in Python, which would trigger Markup HTML-escaping (BUG-003/markup fix). -#}
+    {%- if _tc -%}
+      {%- for tc in _tc -%}
+        {{- '<tool_call>\n' -}}
+        {{- '{"name": ' -}}{{- tc.function.name | tojson -}}
+        {%- if tc.function.arguments is string -%}
+          {{- ', "arguments": ' + tc.function.arguments -}}
+        {%- else -%}
+          {{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
+        {%- endif -%}
+        {{- '}' -}}
+        {%- if not loop.last -%}
+          {{- '\n</tool_call>\n' -}}
+        {%- else -%}
+          {{- '\n</tool_call>' -}}
+        {%- endif -%}
+      {%- endfor -%}
+    {%- endif -%}
+    {{- '<|im_end|>\n' -}}
+  {#- 7d: Tool results — group consecutive tool messages into one user turn -#}
+  {%- elif message.role == 'tool' -%}
+    {%- set _prev_role = messages[loop.index0 - 1].role if loop.index0 > 0 else '' -%}
+    {%- set _next_role = messages[loop.index0 + 1].role if not loop.last else '' -%}
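+    {#- Open a user turn for the first response; newline-separate consecutive <tool_response> blocks -#}
+    {%- if _prev_role != 'tool' -%}
+      {{- '<|im_start|>user\n' -}}
+    {%- else -%}
+      {{- '\n' -}}
+    {%- endif -%}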
+    {{- '<tool_response>\n' -}}
+    {{- message.content | default('') -}}
+    {{- '\n</tool_response>' -}}
+    {%- if _next_role != 'tool' -%}
+      {{- '<|im_end|>\n' -}}
+    {%- endif -%}
+  {#- 7e: Unknown role -#}
+  {%- else -%}
+    {{- raise_exception('Unexpected message role: ' + message.role) -}}
+  {%- endif -%}
+{%- endfor -%}
+{#- ===== SECTION 8: GENERATION PROMPT =====
+     enable_thinking=True  → no prefill (model generates <think> itself)
+     enable_thinking=False → exact 19-char non-thinking prefill (BUG-005 fix)
+-#}
+{%- if add_generation_prompt -%}
+  {{- '<|im_start|>assistant\n' -}}
+  {%- if not ns.enable_thinking -%}
+    {{- '<think>\n\n</think>\n\n' -}}
+  {%- endif -%}
+{%- endif -%}

v1.0_writeup.md ADDED

	@@ -0,0 +1,646 @@

+# Qwen3.5 / Qwen3.6 Jinja2 Chat Template — Implementation Writeup
+**File:** `qwen3_5-template.jinja`
+**Validation:** `validate_template.py` (17 fixtures, 0 failures)
+**Bugs fixed:** BUG-001 through BUG-006
+---
+## Table of Contents
+1. [Why a New Template?](#1-why-a-new-template)
+2. [Research Basis](#2-research-basis)
+3. [Model Format Fundamentals](#3-model-format-fundamentals)
+4. [Implementation Premises](#4-implementation-premises)
+5. [enable_thinking Behavior](#5-enable_thinking-behavior)
+6. [Tool Call Rendering](#6-tool-call-rendering)
+7. [Bug Analysis and Fixes](#7-bug-analysis-and-fixes)
+8. [Template Architecture](#8-template-architecture)
+9. [Test Coverage](#9-test-coverage)
+10. [Tool Ecosystem Compatibility](#10-tool-ecosystem-compatibility)
+---
+## 1. Why a New Template?
+The official Qwen3.5/3.6 chat template (as shipped with the HuggingFace model
+checkpoints) contains at least six correctness bugs that cause silent failures in
+production agent loops. These bugs were independently reported across GitHub
+issues, HuggingFace discussions, Reddit threads, and llama.cpp/vLLM bug trackers
+between early 2025 and mid-2026.
+An analysis of approximately five widely used community replacement templates
+showed that each one fixed a different subset of the bugs while introducing new
+ones. None were derived systematically from the model's training format as
+documented in the official technical report.
+This template was written from scratch, grounded in:
+- **Qwen3 Technical Report** (arXiv:2505.09388) — authoritative description of
+  the training format, thinking mechanism, and tool-calling protocol.
+- **Mid-Think Paper** (arXiv:2601.07036) — phase structure of reasoning chains and
+  budget-stop format.
+- **Hermes tool-call format spec** (Nous Research / NousHermes) — the XML-based
+  tool-call format on which Qwen3 tool-calling is modelled.
+- Community bug reports and vLLM/llama.cpp/Ollama source code analysis.
+---
+## 2. Research Basis
+### 2.1 Qwen3 Technical Report (arXiv:2505.09388)
+Key facts extracted for template construction:
+- No BOS token. The model was trained without one; inserting one degrades output.
+- `<think>` and `</think>` are **regular BPE text tokens**, not special tokens.
+  Tokenizer ID 151644 = `<|im_start|>`, 151645 = `<|im_end|>`.
+- Non-thinking mode is implemented by prepending an **empty think block** to the
+  assistant generation: `<think>\n\n</think>\n\n`. The report states explicitly:
+  *"For non-thinking mode samples, we retain an empty thinking block in the
+  assistant's response. This design ensures internal format consistency."*
+- `/think` and `/no_think` are plain text suffixes in user messages, not special
+  tokens. The model was fine-tuned to follow the **last** such flag encountered in
+  a multi-turn conversation.
+### 2.2 Vocab and Tokenizer Notes
+```
+Token            ID       Note
+<|endoftext|>   151643   End-of-document / pad fallback
+<|im_start|>    151644   Begin-of-turn
+<|im_end|>      151645   End-of-turn, eos_token
+```
+Qwen3.5/3.6 both use a padded vocabulary of 248,320 entries; tokens above 151,646
+are padding with no semantics. The tokenizer class is `Qwen2Tokenizer` (BBPE,
+no `<unk>`).
+### 2.3 Tool-Call Format Origin
+Qwen3 tool-calling uses the **Hermes-2 XML format** (NousResearch):
+```
+<tool_call>
+{"name": "function_name", "arguments": {"key": "value"}}
+</tool_call>
+```
+This is identical to vLLM's `hermes` parser target and is the format recognised
+by Ollama's `parseTag()` heuristic (first text node following `.ToolCalls`).
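To make the format concrete, here is a minimal Python sketch of how a client might extract Hermes-style calls from raw model text. It assumes that each block holds exactly one JSON object. `parse_tool_calls` is a hypothetical helper written for illustration and is not vLLM's `hermes` parser or Ollama's implementation.

```python
import json
import re

# Hypothetical helper: a simplified sketch, not vLLM's or Ollama's parser.
# DOTALL lets the JSON payload span several lines between the tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Return one {"name", "arguments"} dict per <tool_call> block in `text`."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        obj = json.loads(payload)  # raises json.JSONDecodeError on malformed JSON
        calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
    return calls


sample = (
    '<tool_call>\n{"name": "get_weather", "arguments": {"location": "Berlin"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "get_time", "arguments": {"tz": "CET"}}\n</tool_call>'
)
for call in parse_tool_calls(sample):
    print(call["name"], call["arguments"])
```

Because the match is non-greedy, parallel calls are returned in order rather than merged into a single match.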
+---
+## 3. Model Format Fundamentals
+### 3.1 ChatML Base Structure
+Every conversation is encoded as a sequence of turns delimited by the
+`<|im_start|>` and `<|im_end|>` control tokens. No newline appears before `<|im_end|>`.
+```
+<|im_start|>system
+{system_content}<|im_end|>
+<|im_start|>user
+{user_content}<|im_end|>
+<|im_start|>assistant
+<think>
+{thinking}
+</think>
+
+{response}<|im_end|>
+```
+The blank line between `</think>` and the response is mandatory. The model was
+trained on this exact whitespace layout.
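Below is a small Python sketch of the ChatML layout above, assuming plain string content. `chatml_turn` and `assistant_with_thinking` are hypothetical helpers written for illustration. The template itself strips think blocks from historical turns (Section 3.3), so this exact assistant layout only survives in the cases listed there.

```python
def chatml_turn(role: str, content: str) -> str:
    """One ChatML turn; note there is no newline before <|im_end|>."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"


def assistant_with_thinking(thinking: str, response: str) -> str:
    """Think block, then the mandatory blank line, then the response."""
    return f"<think>\n{thinking}\n</think>\n\n{response}"


prompt = (
    chatml_turn("system", "You are terse.")
    + chatml_turn("user", "What is 2+2?")
    + chatml_turn("assistant", assistant_with_thinking("Add the numbers.", "4"))
)
print(prompt)
```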
+### 3.2 Non-Thinking Prefill (Character-Exact)
+The non-thinking generation prefix is exactly 19 characters:
+```
+<think>\n\n</think>\n\n
+```
+Decomposed: `<think>` (7) + `\n` (1) + `\n` (1) + `</think>` (8) + `\n` (1) +
+`\n` (1) = 19. Any deviation (extra space, missing newline) moves the model off
+its training distribution.
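The character count can be checked mechanically. This sketch builds the prefill from the decomposition above. For comparison, it also measures the `<think>\n</think>\n\n` variant described under BUG-005.

```python
# The prefill assembled from the decomposition in the text.
parts = ["<think>", "\n", "\n", "</think>", "\n", "\n"]
NON_THINKING_PREFILL = "".join(parts)

for part in parts:
    print(f"{part!r:12} {len(part)}")
print("total:", len(NON_THINKING_PREFILL))

# The variant described under BUG-005 is one newline short.
BUG_005_VARIANT = "<think>\n</think>\n\n"
print("BUG-005 variant:", len(BUG_005_VARIANT))
```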
+### 3.3 Think-Block Scope Rules
+| Turn type | Think-block treatment |
+|---|---|
+| Historical assistant turn (non-last, no tool_calls) | **Strip entirely** — `split('</think>')[-1].lstrip('\n')` |
+| Historical assistant turn (has tool_calls) | **Preserve** — think block is part of the tool-call format |
+| Last assistant turn in history (`add_generation_prompt=False`) | **Preserve verbatim** |
+| Last assistant turn, no existing think, `enable_thinking=False` | **Inject** `<think>\n\n</think>\n\n` prefix |
+| Generation prompt, `enable_thinking=True` | **No prefix** — model generates its own `<think>` |
+| Generation prompt, `enable_thinking=False` | **Inject** `<think>\n\n</think>\n\n` prefix |
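The table condenses into a single decision function. The Python below sketches that logic for one assistant turn, and it also mirrors the template's trimming of `<tool_call>` text duplicated into `content`. `assistant_body` and its boolean parameters are hypothetical names, and only string or missing content is handled.

```python
from typing import Optional

NON_THINKING_PREFILL = "<think>\n\n</think>\n\n"


def assistant_body(content: Optional[str], has_tool_calls: bool,
                   is_last_hist: bool, enable_thinking: bool) -> str:
    """Apply the think-block scope rules to one assistant turn (string content only)."""
    text = content if isinstance(content, str) else ""  # None / missing -> ""
    if has_tool_calls:
        # Drop any <tool_call> text duplicated into content; keep the think block.
        if "<tool_call>" in text:
            text = text.split("<tool_call>")[0].strip()
        return text
    if is_last_hist:
        # Last turn with add_generation_prompt=False: preserve, or inject the prefill.
        if "<think>" not in text and not enable_thinking:
            return NON_THINKING_PREFILL + text
        return text
    # Historical turn: remove the think block entirely, not just its contents.
    if "</think>" in text:
        return text.split("</think>")[-1].lstrip("\n")
    return text


history = "<think>\nCheck units.\n</think>\n\nIt is 4 km."
print(repr(assistant_body(history, False, False, True)))
```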
+---
+## 4. Implementation Premises
+### 4.1 Single Namespace Object
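+Conversation-level mutable state (the thinking flag and the image/video counters)
+lives in one `ns` namespace object; the system-content and tool-list accumulators
+use their own `ns_sys` and `ns_tb` namespaces. Namespaces avoid Jinja2's scoping
+trap (variables set inside `{% for %}` blocks are not visible outside without a
+namespace):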
+```jinja2
+{%- set ns = namespace(
+    enable_thinking=true,
+    image_count=0,
+    video_count=0
+) -%}
+```
+### 4.2 Pre-Scan Before Rendering
+The template performs a full pre-scan of all messages before emitting any output.
+This is necessary because `/no_think` or `/think` can appear in any user message,
+and the final flag determines the generation prompt behaviour. Resolving the flag
+in a dedicated pass keeps that logic separate from rendering and guarantees that
+every later section (the last-history-turn prefill and the generation prompt)
+reads the final value.
+```jinja2
+{%- for i in range(messages | length) -%}
+  {%- if messages[i].role == 'user' -%}
+    {%- set _u = messages[i].content if messages[i].content is string else '' -%}
+    {%- if _u.rstrip().endswith('/no_think') -%}
+      {%- set ns.enable_thinking = false -%}
+    {%- elif _u.rstrip().endswith('/think') -%}
+      {%- set ns.enable_thinking = true -%}
+    {%- endif -%}
+  {%- endif -%}
+{%- endfor -%}
+```
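Here is an equivalent of the pre-scan in plain Python. It follows the resolution order described in Section 5.1. `resolve_enable_thinking` is a hypothetical helper, and passing `None` stands in for the template variable being undefined. As in the template, user messages with list content are not scanned for flags.

```python
from typing import Optional


def resolve_enable_thinking(messages: list[dict],
                            enable_thinking: Optional[bool] = None) -> bool:
    """Default True, then the render-time flag, then the last /think or /no_think suffix."""
    # None stands in for "variable not passed" (Jinja2's `is defined` check).
    resolved = True if enable_thinking is None else bool(enable_thinking)
    for msg in messages:
        if msg.get("role") != "user":
            continue
        content = msg.get("content")
        # As in the template, only string content is scanned; list content is ignored.
        text = content.rstrip() if isinstance(content, str) else ""
        if text.endswith("/no_think"):
            resolved = False
        elif text.endswith("/think"):
            resolved = True
    return resolved


conversation = [
    {"role": "user", "content": "Summarise this. /no_think"},
    {"role": "assistant", "content": "Done."},
    {"role": "user", "content": "Now explain your reasoning. /think"},
]
print(resolve_enable_thinking(conversation))
print(resolve_enable_thinking(conversation[:2], enable_thinking=True))
```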
+### 4.3 Separate `{{ }}` Blocks for `tojson` Output
+Jinja2's `tojson` filter returns a `Markup` object (already HTML-safe). When a
+`Markup` value is concatenated with a plain string using `+`, the plain string is
+HTML-escaped first, so its quotes and angle brackets come out as entities
+(`&#34;`, `&lt;`, etc.). This is BUG-003.
+The fix is to never concatenate `tojson` output with plain strings inside a
+Jinja2 expression. Each fragment is emitted through its own `{{ }}` block:
+```jinja2
+{# WRONG — triggers HTML-escaping of the plain string #}
+{{- '{"name": ' + tc.function.name | tojson + '}' -}}
+{# CORRECT — separate blocks, no Python concatenation #}
+{{- '{"name": ' -}}{{- tc.function.name | tojson -}}{{- '}' -}}
+```
+### 4.4 System Message Collection Phase
+Multiple system messages are merged into a single `<|im_start|>system` turn
+with `\n\n` as separator (BUG-004 fix). This is done as a separate pre-pass
+(Section 4 in the template), so the main loop can unconditionally skip all
+`role == 'system'` messages.
+The user's system content always appears **before** the tools block in the
+system turn, matching the training format.
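The merge step can be sketched in Python as follows. The sketch assumes string content only, whereas the template also flattens list content via `render_content`. `merge_system` is a hypothetical name.

```python
def merge_system(messages: list[dict]) -> str:
    """Join non-empty system/developer contents with a blank line (string content only)."""
    parts = []
    for msg in messages:
        if msg.get("role") in ("system", "developer"):
            content = msg.get("content")
            text = content.strip() if isinstance(content, str) else ""
            if text:  # empty messages are skipped, so no stray separators
                parts.append(text)
    return "\n\n".join(parts)


messages = [
    {"role": "system", "content": "You are a careful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "developer", "content": "  Answer in English.\n"},
]
print(merge_system(messages))
```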
+### 4.5 Tool Normalisation
+Some frameworks pass tool definitions with a top-level `function` key
+(`{"type": "function", "function": {...}}`), while others pass the function
+schema directly (`{"name": ..., "parameters": ...}`). The template normalises
+all entries to the canonical form before serialisation:
+```jinja2
+{%- if tool.function is defined -%}
+  {%- set ns_tb.list = ns_tb.list + [tool] -%}
+{%- else -%}
+  {%- set ns_tb.list = ns_tb.list + [{"type": "function", "function": tool}] -%}
+{%- endif -%}
+```
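Here is the same normalisation as a short Python sketch. `normalise_tools` is a hypothetical helper. Checking for a `function` key stands in for the template's `tool.function is defined` test, since Jinja2 attribute access on a dict falls back to key lookup.

```python
def normalise_tools(tools: list[dict]) -> list[dict]:
    """Return every tool in the {"type": "function", "function": {...}} envelope."""
    out = []
    for tool in tools:
        if "function" in tool:
            out.append(tool)  # already canonical: keep as-is
        else:
            out.append({"type": "function", "function": tool})  # bare schema: wrap it
    return out


bare = {"name": "get_weather", "parameters": {"type": "object", "properties": {}}}
print(normalise_tools([bare, {"type": "function", "function": bare}]))
```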
+---
+## 5. `enable_thinking` Behavior
+### 5.1 Resolution Priority (Highest to Lowest)
+1. **`/no_think` or `/think` text suffix** in the last user message that contains
+   one. This is the highest priority because it represents the most recent
+   explicit user intent and mirrors the model's fine-tuning data.
+2. **`enable_thinking` template variable** passed at render time (e.g., via
+   `tokenizer.apply_chat_template(..., enable_thinking=False)`).
+3. **Default value** of `true` (thinking on by default, consistent with the model's
+   training distribution).
+### 5.2 Generation Prompt Behaviour
+When `add_generation_prompt=True`:
+```
+enable_thinking=True  →  <|im_start|>assistant\n
+                         (model generates <think> itself)
+enable_thinking=False →  <|im_start|>assistant\n<think>\n\n</think>\n\n
+                         (forces non-thinking mode by pre-filling empty block)
+```
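The two cases reduce to a function with a single branch. This sketch (the name `generation_prompt` is hypothetical) mirrors Section 8 of the template and takes the already-resolved thinking flag as input.

```python
def generation_prompt(enable_thinking: bool) -> str:
    """Assistant header, plus the empty think block when thinking is off."""
    prompt = "<|im_start|>assistant\n"
    if not enable_thinking:
        prompt += "<think>\n\n</think>\n\n"  # exact 19-character prefill
    return prompt


print(repr(generation_prompt(True)))
print(repr(generation_prompt(False)))
```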
+### 5.3 Last-History-Turn Behaviour (add_generation_prompt=False)
+When the conversation ends with an assistant message and no generation prompt
+is requested — typical when scoring a complete conversation or when the
+assistant message is being appended to the prompt for continuation:
+- **Think block present:** preserved verbatim regardless of `enable_thinking`.
+- **No think block, `enable_thinking=True`:** content left as-is (historical turns
+  are already stripped; the last one is the current generation context).
+- **No think block, `enable_thinking=False`:** inject `<think>\n\n</think>\n\n`
+  before the content.
+### 5.4 Historical Think-Block Stripping (BUG-001)
+The official template collapses think blocks in historical turns to
+`<think>\n\n</think>` instead of removing them. In a long agentic loop this
+produces an ever-growing sequence of empty think blocks that degrades prompt
+quality ("prompt poisoning").
+The correct operation is full removal:
+```python
+# Python equivalent
+content = content.split('</think>')[-1].lstrip('\n') if '</think>' in content else content
+```
+```jinja2
+{# Jinja2 equivalent #}
+{%- if '</think>' in _ac -%}
+  {%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
+{%- endif -%}
+```
+**Exception:** turns that also carry `tool_calls` keep their think block intact.
+The model is trained to produce thinking before tool invocations, and stripping
+the think block from a historical tool-call turn would misrepresent the prompt.
+---
+## 6. Tool Call Rendering
+### 6.1 System Turn Tool Block Format
+The exact text injected into the system message when tools are present matches
+the Qwen3 Hermes training format:
+```
+# Tools
+
+You may call one or more functions to assist with the user query.
+
+You are provided with function signatures within <tools></tools> XML tags:
+<tools>
+{"type": "function", "function": {"name": "...", ...}}
+</tools>
+
+For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
+<tool_call>
+{"name": <function-name>, "arguments": <args-json-object>}
+</tool_call>
+```
+All text — including the instruction sentences — is literal and must not be
+modified. The model was trained on this exact phrasing.
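Below is a Python sketch of the whole system turn, with the literal strings copied from Section 6 of the template. One assumption needs flagging. The template serialises each tool with `tojson`, and that filter's key ordering and escaping depend on the Jinja2 environment (stock Jinja2, for example, sorts keys and escapes HTML-sensitive characters). This sketch uses `json.dumps(..., ensure_ascii=False)` instead. `system_turn` is a hypothetical helper.

```python
import json

TOOLS_HEADER = (
    "# Tools\n\nYou may call one or more functions to assist with the user query.\n\n"
    "You are provided with function signatures within <tools></tools> XML tags:\n<tools>\n"
)
TOOLS_FOOTER = (
    "\n</tools>\n\nFor each function call, return a json object with function name and "
    "arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n"
    '{"name": <function-name>, "arguments": <args-json-object>}\n</tool_call>'
)


def system_turn(system_content: str, tools: list[dict]) -> str:
    """User system content first, then the tools block; no turn if both are empty."""
    if not system_content and not tools:
        return ""  # no default system prompt is injected
    out = "<|im_start|>system\n"
    if system_content:
        out += system_content
        if tools:
            out += "\n\n"
    if tools:
        out += TOOLS_HEADER
        out += "\n".join(json.dumps(t, ensure_ascii=False) for t in tools)  # stand-in for tojson
        out += TOOLS_FOOTER
    return out + "<|im_end|>\n"


tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {}}}]
print(system_turn("Be brief.", tools))
```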
+### 6.2 Assistant Tool-Call Block
+Each tool call is rendered as:
+```
+<tool_call>
+{"name": "function_name", "arguments": {JSON_OBJECT}}
+</tool_call>
+```
+Multiple parallel calls appear as consecutive blocks separated by `\n`:
+```
+<tool_call>
+{"name": "f1", "arguments": {...}}
+</tool_call>
+<tool_call>
+{"name": "f2", "arguments": {...}}
+</tool_call><|im_end|>
+```
+Note: the final `</tool_call>` is immediately followed by `<|im_end|>` with no
+intervening newline. This matches the training format.
+### 6.3 Arguments: String vs Object (BUG-006)
+Some frameworks (notably older OpenAI-compatible clients and some streaming
+implementations) serialise tool-call arguments as a JSON string
+(`"{\"location\": \"Berlin\"}"`) rather than as an object
+(`{"location": "Berlin"}`). The template handles both:
+```jinja2
+{%- if tc.function.arguments is string -%}
+  {{- ', "arguments": ' + tc.function.arguments -}}
+{%- else -%}
+  {{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
+{%- endif -%}
+```
+When arguments are already a string they are passed through as-is (the caller
+is responsible for valid JSON). When they are a dict/object, `tojson` serialises
+them correctly including Unicode escaping and quote escaping.
+This arrangement also prevents the `"""` failure (BUG-006): templates that splice
+argument values into hand-built JSON strings emit malformed output when a value
+contains quote sequences such as `"""`. Because `tojson` produces a properly
+escaped JSON string literal for dict arguments, that failure cannot occur here.
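Here is the assistant-side rendering, including the dual-path handling of `arguments`, as a Python sketch. `render_tool_calls` is hypothetical. String arguments pass through unvalidated, exactly as in the template, while dict arguments go through `json.dumps` as a stand-in for `tojson`. The example includes an argument value containing `"""` to show that it stays well-formed.

```python
import json


def render_tool_calls(tool_calls: list[dict]) -> str:
    """Hermes-style blocks joined by newlines; no trailing newline before <|im_end|>."""
    blocks = []
    for tc in tool_calls:
        fn = tc["function"]
        args = fn["arguments"]
        # String arguments pass through as-is (caller must supply valid JSON);
        # dict arguments are serialised, which escapes any embedded quotes.
        args_json = args if isinstance(args, str) else json.dumps(args, ensure_ascii=False)
        blocks.append(
            '<tool_call>\n{"name": ' + json.dumps(fn["name"], ensure_ascii=False)
            + ', "arguments": ' + args_json + '}\n</tool_call>'
        )
    return "\n".join(blocks)


calls = [
    {"function": {"name": "search", "arguments": {"query": 'say """hi"""'}}},
    {"function": {"name": "lookup", "arguments": '{"id": 7}'}},
]
print(render_tool_calls(calls) + "<|im_end|>")
```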
+### 6.4 Tool Results
+Tool results are wrapped in a user turn using `<tool_response>`:
+```
+<|im_start|>user
+<tool_response>
+{result_content}
+</tool_response><|im_end|>
+```
+Consecutive tool-response messages are merged into a single user turn: the
+template suppresses the `<|im_start|>user\n` header when the previous message is
+also a `tool` message, and omits `<|im_end|>` when the next message is also a
+`tool` message.
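Below is a Python sketch of the grouping rule. It makes one assumption beyond the text above: consecutive `<tool_response>` blocks are joined with a newline, which is the layout used by the upstream Qwen2.5/Qwen3 chat templates. Check this against the template you deploy. `render_tool_results` is a hypothetical helper that renders only `tool` messages.

```python
def render_tool_results(messages: list[dict]) -> str:
    """Render tool messages, merging each run of consecutive ones into one user turn."""
    out = []
    for i, msg in enumerate(messages):
        if msg["role"] != "tool":
            continue  # other roles are out of scope for this sketch
        prev_is_tool = i > 0 and messages[i - 1]["role"] == "tool"
        next_is_tool = i + 1 < len(messages) and messages[i + 1]["role"] == "tool"
        # Open the user turn once per run; assumed newline between grouped responses.
        out.append("\n" if prev_is_tool else "<|im_start|>user\n")
        out.append("<tool_response>\n" + str(msg.get("content") or "") + "\n</tool_response>")
        if not next_is_tool:
            out.append("<|im_end|>\n")  # close the turn after the last response in the run
    return "".join(out)


history = [
    {"role": "assistant", "content": ""},
    {"role": "tool", "content": "A"},
    {"role": "tool", "content": "B"},
    {"role": "user", "content": "Thanks"},
]
print(render_tool_results(history))
```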
+---
+## 7. Bug Analysis and Fixes
+### BUG-001 — Historical Think Blocks Leaked (CRITICAL)
+**Symptom:** In multi-turn conversations with `enable_thinking=True`, every
+historical assistant message retains a collapsed `<think>\n\n</think>` block.
+Over many turns the prompt accumulates dozens of empty think blocks, degrading
+model performance.
+**Root cause:** Official template strips think content but leaves the surrounding
+`<think>` tags.
+**Fix:** Strip the entire block by splitting on `</think>` and taking the tail:
+```jinja2
+{%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
+```
+**Tests:** T10, T13, T16
+---
+### BUG-002 — KeyError on content=None / Missing content Key (HIGH)
+**Symptom:** When an assistant message contains only `tool_calls` and no `content`
+(or `content=None`, which is the OpenAI convention for pure tool-call responses),
+the template throws `UndefinedError` or `KeyError`.
+**Root cause:** Official template accesses `message.content` directly.
+**Fix:** Guard the access:
+```jinja2
+{%- if message.content is defined and message.content is string -%}
+  {%- set _ac = message.content -%}
+{%- elif message.content is defined and message.content is iterable ... -%}
+  {%- set _ac = render_content(message.content) -%}
+{%- else -%}
+  {%- set _ac = '' -%}
+{%- endif -%}
+```
+**Tests:** T04, T11
+---
+### BUG-003 — Markup HTML-Escaping in Tool JSON (MEDIUM)
+**Symptom:** Tool definitions or tool-call arguments with characters like `<`, `>`,
+`&`, or `"` appear HTML-escaped in the rendered prompt (`&lt;`, `&gt;`, `&amp;`,
+`&#34;`). This causes the model to misread the tool schema.
+**Root cause:** `tojson` returns a Jinja2 `Markup` object. When `Markup` is
+concatenated with a plain Python string using `+` inside a Jinja2 expression,
+the plain string is auto-escaped and then concatenated with the already-safe
+`Markup` value.
+**Fix:** Never use `+` to join `tojson` output with plain strings. Emit each
+fragment through a separate `{{ }}` block:
+```jinja2
+{# Every fragment in its own block #}
+{{- '{"name": ' -}}{{- tc.function.name | tojson -}}
+```
+**Tests:** T03, T04, T12
+---
+### BUG-004 — Multiple System Messages Not Handled (MEDIUM)
+**Symptom:** Frameworks such as Open WebUI send more than one `role: system`
+message. The official template either crashes or emits multiple system turns,
+both of which confuse the model.
+**Root cause:** No merging logic for multiple system messages.
+**Fix:** Pre-scan all messages and concatenate system content with `\n\n`:
+```jinja2
+{%- if ns_sys.content == '' -%}
+  {%- set ns_sys.content = _c -%}
+{%- else -%}
+  {%- set ns_sys.content = ns_sys.content + '\n\n' + _c -%}
+{%- endif -%}
+```
+**Tests:** T02, T14
+---
+### BUG-005 — Wrong Non-Thinking Prefill Whitespace (LOW-MEDIUM)
+**Symptom:** Non-thinking mode produces a think block with incorrect whitespace,
+moving the model off its training distribution and causing output quality
+degradation or refusal to honour the non-thinking instruction.
+**Root cause:** The official template uses `<think>\n</think>\n\n` (missing the
+second newline inside the block), which does not match the format described in
+the technical report.
+**Fix:** Use the exact 19-character sequence:
+```
+<think>\n\n</think>\n\n
+```
+**Tests:** T08, T17
+---
+### BUG-006 — Triple-Quote Crash on Python String Arguments (MEDIUM)
+**Symptom:** Jinja2 raises a `TemplateSyntaxError` or produces garbled output when
+tool-call arguments contain triple-quote sequences (`"""` or `'''`) because the
+template previously embedded argument values using Python string literal
+concatenation.
+**Root cause:** Some community templates build the tool-call JSON via string
+interpolation (`'{"arguments": "' + args + '"}'`), which breaks for argument
+values containing quote characters.
+**Fix:** Use `tojson` for all non-string arguments (produces well-formed JSON) and
+pass string arguments through unchanged (caller provides valid JSON strings):
+```jinja2
+{%- if tc.function.arguments is string -%}
+  {{- ', "arguments": ' + tc.function.arguments -}}
+{%- else -%}
+  {{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
+{%- endif -%}
+```
+**Tests:** T12
+---
+## 8. Template Architecture
+The template is divided into eight clearly delimited sections, each with a
+comment header:
+```
+Section 1  render_content macro
+           Handles str / list (image/video/text) / None → plain text.
+           Increments ns.image_count / ns.video_count for vision tokens.
+Section 2  Namespace initialisation
+           Single ns object; enable_thinking defaults to true.
+Section 3  Pre-scan
+           Walk all user messages; last /no_think or /think wins.
+Section 4  Collect system content
+           Merge all system / developer messages with \n\n.
+Section 5  Build tools list
+           Normalise every tool to {"type":"function","function":{...}}.
+Section 6  Output system turn
+           Emit one <|im_start|>system turn (user content + tools block).
+Section 7  Main message loop
+           7a  system/developer  → skip (already emitted)
+           7b  user              → render with vision support
+           7c  assistant         → render with think-block logic + tool_calls
+           7d  tool              → group into user turns
+           7e  unknown role      → raise_exception
+Section 8  Generation prompt
+           enable_thinking=True  → bare <|im_start|>assistant\n
+           enable_thinking=False → add <think>\n\n</think>\n\n prefix
+```
+### Design Decisions
+**No default system prompt.** Unlike some community templates, this template does
+not inject a default system prompt when none is provided. The model performs well
+without one, and injecting one would cause conflicts for applications that rely on
+the system prompt being exactly what they set.
+**No BOS token.** The Qwen3 family was trained without a BOS token. Adding one
+would consume a context window slot unnecessarily and may harm performance.
+**No `<|endoftext|>` in conversation.** This token is reserved for
+end-of-document signalling in the pre-training phase, not for conversation
+boundaries.
+---
+## 9. Test Coverage
+The 17 test fixtures in `validate_template.py` cover:
+| ID | Scenario | Key assertion |
+|---|---|---|
+| T01 | Simple user/assistant, no system, no tools | Exact ChatML output |
+| T02 | System message | System turn before user turn |
+| T03 | Tools defined, `enable_thinking=True` | Tools block in system; no prefill |
+| T04 | Tool call, `content=None` | No crash; `<tool_call>` present |
+| T05 | Parallel tool calls | `</tool_call>\n<tool_call>` separator |
+| T06 | Tool result (role=tool) | `<\|im_start\|>user\n<tool_response>` |
+| T07 | `enable_thinking=True` generation prompt | No think prefix emitted |
+| T08 | `enable_thinking=False` generation prompt | Exact 19-char prefill |
+| T09 | `/no_think` flag in user message | Non-thinking prefill applied |
+| T10 | Historical think blocks | Fully stripped, not collapsed |
+| T11 | Missing `content` key on assistant | No KeyError / UndefinedError |
+| T12 | Special chars in arguments | Correctly JSON-escaped |
+| T13 | Historical tool-call turn with think | Think block preserved |
+| T14 | Multiple system messages | Merged with `\n\n`; single system turn |
+| T15 | Parallel tool responses | Both inside single user turn |
+| T16 | Last history turn with existing think | Preserved verbatim |
+| T17 | Last history turn, no think, `enable_thinking=False` | Prefill injected |
+Run the suite:
+```bash
+cd /workspace/project/qwen3_5-template
+python validate_template.py
+# Expected: 17 passed, 0 failed
+```
+---
+## 10. Tool Ecosystem Compatibility
+An analysis of 51 tool-calling frameworks and inference backends was conducted to
+verify that the template's output is consumable by the broadest possible set of
+tools. Key findings:
+### 10.1 OpenAI JSON Format Dominance
+31 of the 51 analysed tools use the **OpenAI-compatible JSON function-call API**
+(Group A). These tools pass tool definitions as a `tools` array and receive tool
+calls back as `message.tool_calls` objects. The template's input format is fully
+compatible with this convention.
+Notable Group A members: OpenHands, LangChain, LangGraph, LiteLLM, CrewAI,
+Pydantic AI, Open WebUI, LibreChat, LM Studio, LlamaIndex, AutoGen.
+### 10.2 Inference Server Compatibility
+| Backend | Compatibility note |
+|---|---|
+| **vLLM** | Uses the `hermes` tool parser for Qwen models, matching this template's `<tool_call>` format exactly. |
+| **llama.cpp** | Recognises `<tool_call>` via the `--jinja` flag + chat template loading. Note: `--jinja` disables GBNF grammar (Issue #12204). |
+| **Ollama** | Auto-detects the tool-call tag via `parseTag()` which reads the first text node after `.ToolCalls` in the Go template tree — `<tool_call>` is one of the three known tags. |
+| **LM Studio** | Passes tool definitions as the `tools` API field; receives tool calls in `message.tool_calls`. |
+| **TabbyAPI** | Full OpenAI-compatible API; correct chat template is the only requirement. |
+### 10.3 Non-Native Tool-Calling Frameworks
+Three framework groups (Cline/Roo Code XML, OpenCode `<parameter>`, Aider
+SEARCH/REPLACE) do not use the OpenAI tool-calling API at all. They inject their
+own tool descriptions into the system prompt and parse the model's text output
+directly. These frameworks do not interact with the chat template's tool-calling
+sections — they send no `tools` array and the template therefore emits no tool
+block.
+### 10.4 Arguments as JSON String
+Several frameworks (notably some streaming clients and older OpenAI SDK versions)
+serialise `tool_calls[].function.arguments` as a JSON string rather than a parsed
+object. The template's dual-path arguments handling (Section 6.3) accommodates
+both cases transparently.
+---
+*Generated as part of the `fix/qwen3-template-bugs` implementation.*