# Qwen3.5 / Qwen3.6 Jinja2 Chat Template: Implementation Writeup

**File:** `qwen3_5-template.jinja`

**Validation:** `validate_template.py` (17 fixtures, 0 failures)

**Bugs fixed:** BUG-001 through BUG-006

---

## Table of Contents

1. [Why a New Template?](#1-why-a-new-template)
2. [Research Basis](#2-research-basis)
3. [Model Format Fundamentals](#3-model-format-fundamentals)
4. [Implementation Premises](#4-implementation-premises)
5. [enable_thinking Behavior](#5-enable_thinking-behavior)
6. [Tool Call Rendering](#6-tool-call-rendering)
7. [Bug Analysis and Fixes](#7-bug-analysis-and-fixes)
8. [Template Architecture](#8-template-architecture)
9. [Test Coverage](#9-test-coverage)
10. [Tool Ecosystem Compatibility](#10-tool-ecosystem-compatibility)

---

## 1. Why a New Template?

The official Qwen3.5/3.6 chat template (as shipped with the HuggingFace model
checkpoints) contains at least six correctness bugs that cause silent failures in
production agent loops. These bugs were independently reported across GitHub
issues, HuggingFace discussions, Reddit threads, and llama.cpp/vLLM bug trackers
between early 2025 and mid-2026.

An analysis of approximately five widely used community replacement templates
showed that each one fixed a different subset of the bugs while introducing new
ones. None were derived systematically from the model's training format as
documented in the official technical report.

This template was written from scratch, grounded in:

- **Qwen3 Technical Report** (arXiv:2505.09388): authoritative description of
  the training format, thinking mechanism, and tool-calling protocol.
- **Mid-Think Paper** (arXiv:2601.07036): phase structure of reasoning chains and
  budget-stop format.
- **Hermes tool-call format spec** (Nous Research / NousHermes): the XML-based
  tool-call format on which Qwen3 tool-calling is modelled.
- Community bug reports and vLLM/llama.cpp/Ollama source code analysis.

---

## 2. Research Basis

### 2.1 Qwen3 Technical Report (arXiv:2505.09388)

Key facts extracted for template construction:

- No BOS token. The model was trained without one; inserting one degrades output.
- `<think>` and `</think>` are **regular BPE text tokens**, not special tokens.
  Tokenizer ID 151644 = `<|im_start|>`, 151645 = `<|im_end|>`.
- Non-thinking mode is implemented by prepending an **empty think block** to the
  assistant generation: `<think>\n\n</think>\n\n`. The report states explicitly:
  *"For non-thinking mode samples, we retain an empty thinking block in the
  assistant's response. This design ensures internal format consistency."*
- `/think` and `/no_think` are plain text suffixes in user messages, not special
  tokens. The model was fine-tuned to follow the **last** such flag encountered in
  a multi-turn conversation.

### 2.2 Vocab and Tokenizer Notes

```
Token            ID       Note
<|endoftext|>    151643   End-of-document / pad fallback
<|im_start|>     151644   Begin-of-turn
<|im_end|>       151645   End-of-turn, eos_token
```

Qwen3.5/3.6 both use a padded vocabulary of 248,320 entries; tokens above 151,646
are padding with no semantics. The tokenizer class is `Qwen2Tokenizer` (BBPE,
no `<unk>`).

### 2.3 Tool-Call Format Origin

Qwen3 tool-calling uses the **Hermes-2 XML format** (NousResearch):

```
<tool_call>
{"name": "function_name", "arguments": {"key": "value"}}
</tool_call>
```

This is identical to vLLM's `hermes` parser target and is the format recognised
by Ollama's `parseTag()` heuristic (first text node following `.ToolCalls`).

---

## 3. Model Format Fundamentals

### 3.1 ChatML Base Structure

Every conversation is encoded as a sequence of turns delimited by im-start/end
control tokens. No newline appears before `<|im_end|>`.

```
<|im_start|>system
{system_content}<|im_end|>
<|im_start|>user
{user_content}<|im_end|>
<|im_start|>assistant
<think>
{thinking}
</think>

{response}<|im_end|>
```

The blank line between `</think>` and the response is mandatory. The model was
trained on this exact whitespace layout.
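
The layout above can be sketched in plain Python for illustration. `render_turn` and `render_assistant` are hypothetical helper names, not part of the template, and the sketch assumes each `<|im_end|>` is followed by a single newline before the next turn.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_turn(role: str, content: str) -> str:
    """Render one ChatML turn; no newline is placed before <|im_end|>."""
    return f"{IM_START}{role}\n{content}{IM_END}\n"


def render_assistant(thinking: str, response: str) -> str:
    """Render an assistant turn with a think block followed by a blank line."""
    body = f"<think>\n{thinking}\n</think>\n\n{response}"
    return render_turn("assistant", body)


prompt = render_turn("user", "Hi") + render_assistant("Greet back.", "Hello!")
print(prompt)
```

The `\n\n` after `</think>` is what produces the mandatory blank line before the response.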

### 3.2 Non-Thinking Prefill (Character-Exact)

The non-thinking generation prefix is exactly 19 characters:

```
<think>\n\n</think>\n\n
```

Decomposed: `<think>` (7) + `\n` (1) + `\n` (1) + `</think>` (8) + `\n` (1) +
`\n` (1) = 19. Any deviation (extra space, missing newline) moves the model off
its training distribution.
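
The character count can be checked mechanically. This is a small illustrative check, with `NON_THINKING_PREFILL` as a hypothetical constant name.

```python
NON_THINKING_PREFILL = "<think>\n\n</think>\n\n"

# The six fragments from the decomposition above, in order.
parts = ["<think>", "\n", "\n", "</think>", "\n", "\n"]
lengths = [len(part) for part in parts]

assert "".join(parts) == NON_THINKING_PREFILL
print(lengths, sum(lengths))  # [7, 1, 1, 8, 1, 1] 19
```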

### 3.3 Think-Block Scope Rules

| Turn type | Think-block treatment |
|---|---|
| Historical assistant turn (non-last, no tool_calls) | **Strip entirely**: `split('</think>')[-1].lstrip('\n')` |
| Historical assistant turn (has tool_calls) | **Preserve**: think block is part of the tool-call format |
| Last assistant turn in history (`add_generation_prompt=False`) | **Preserve verbatim** |
| Last assistant turn, no existing think, `enable_thinking=False` | **Inject** `<think>\n\n</think>\n\n` prefix |
| Generation prompt, `enable_thinking=True` | **No prefix**: model generates its own `<think>` |
| Generation prompt, `enable_thinking=False` | **Inject** `<think>\n\n</think>\n\n` prefix |

---

## 4. Implementation Premises

### 4.1 Single Namespace Object

All mutable template state lives in one `ns` namespace object, avoiding
Jinja2's scoping trap (variables set inside `{% for %}` blocks are not visible
outside without a namespace):

```jinja2
{%- set ns = namespace(
    enable_thinking=true,
    image_count=0,
    video_count=0
) -%}
```

### 4.2 Pre-Scan Before Rendering

The template performs a full pre-scan of all messages before emitting any output.
This is necessary because `/no_think` or `/think` can appear in any user message,
and the final flag determines the generation prompt behaviour. A single-pass loop
that both renders and tracks flags would have to look ahead, which Jinja2 cannot
do.

```jinja2
{%- for i in range(messages | length) -%}
  {%- if messages[i].role == 'user' -%}
    {%- set _u = messages[i].content if messages[i].content is string else '' -%}
    {%- if _u.rstrip().endswith('/no_think') -%}
      {%- set ns.enable_thinking = false -%}
    {%- elif _u.rstrip().endswith('/think') -%}
      {%- set ns.enable_thinking = true -%}
    {%- endif -%}
  {%- endif -%}
{%- endfor -%}
```

### 4.3 Separate `{{ }}` Blocks for `tojson` Output

Jinja2's `tojson` filter returns a `Markup` object (already HTML-safe). When a
`Markup` value is concatenated with a plain string using `+`, the plain string
is HTML-escaped before the two are joined, so characters such as `"` in the
plain string come out as entities (`&quot;`, `&#34;`, etc.). This is BUG-003.

The fix is to never concatenate `tojson` output with plain strings inside a
Jinja2 expression. Each fragment is emitted through its own `{{ }}` block:

```jinja2
{# WRONG: triggers HTML-escaping of the plain string #}
{{- '{"name": ' + tc.function.name | tojson + '}' -}}

{# CORRECT: separate blocks, no Python concatenation #}
{{- '{"name": ' -}}{{- tc.function.name | tojson -}}{{- '}' -}}
```

### 4.4 System Message Collection Phase

Multiple system messages are merged into a single `<|im_start|>system` turn
with `\n\n` as separator (BUG-004 fix). This is done as a separate pre-pass
(Section 4 in the template), so the main loop can unconditionally skip all
`role == 'system'` messages.

The user's system content always appears **before** the tools block in the
system turn, matching the training format.
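
A Python sketch of the merge step, for illustration. `merge_system_messages` is a hypothetical name. The sketch assumes that `developer` messages are treated like `system` messages (as the architecture overview in Section 8 suggests) and that empty or non-string content is skipped, which the template may handle differently.

```python
def merge_system_messages(messages: list[dict]) -> str:
    """Join the text of all system/developer messages with a blank line."""
    parts = []
    for message in messages:
        if message.get("role") not in ("system", "developer"):
            continue
        content = message.get("content")
        # Assumption: only non-empty string content contributes to the merge.
        if isinstance(content, str) and content:
            parts.append(content)
    return "\n\n".join(parts)


conversation = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Hi"},
    {"role": "system", "content": "Answer in English."},
]
print(merge_system_messages(conversation))
```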

### 4.5 Tool Normalisation

Some frameworks pass tool definitions with a top-level `function` key
(`{"type": "function", "function": {...}}`), while others pass the function
schema directly (`{"name": ..., "parameters": ...}`). The template normalises
all entries to the canonical form before serialisation:

```jinja2
{%- if tool.function is defined -%}
  {%- set ns_tb.list = ns_tb.list + [tool] -%}
{%- else -%}
  {%- set ns_tb.list = ns_tb.list + [{"type": "function", "function": tool}] -%}
{%- endif -%}
```
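
The same normalisation expressed in Python, as a rough equivalent for readers who want to pre-process tools before rendering. `normalise_tools` is a hypothetical helper, not something the template exposes.

```python
def normalise_tools(tools: list[dict]) -> list[dict]:
    """Wrap bare function schemas so every entry has a top-level 'function' key."""
    normalised = []
    for tool in tools:
        if "function" in tool:
            normalised.append(tool)  # already in canonical form
        else:
            normalised.append({"type": "function", "function": tool})
    return normalised


bare = {"name": "get_weather", "parameters": {"type": "object", "properties": {}}}
wrapped = {"type": "function", "function": bare}
print(normalise_tools([bare, wrapped]))
```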

---

## 5. `enable_thinking` Behavior

### 5.1 Resolution Priority (Highest to Lowest)

1. **`/no_think` or `/think` text suffix** in the last user message that contains
   one. This is the highest priority because it represents the most recent
   explicit user intent and mirrors the model's fine-tuning data.
2. **`enable_thinking` template variable** passed at render time (e.g., via
   `tokenizer.apply_chat_template(..., enable_thinking=False)`).
3. **Default value** of `true` (thinking on by default, consistent with the model's
   training distribution).
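
The priority order above can be sketched in Python as follows. `resolve_enable_thinking` and its `template_value` parameter are hypothetical names; `None` stands for "variable not passed at render time".

```python
from typing import Optional


def resolve_enable_thinking(messages: list[dict],
                            template_value: Optional[bool] = None) -> bool:
    """Resolve thinking mode: suffix flag > template variable > default True."""
    # Priorities 2 and 3: start from the template variable, else the default.
    resolved = True if template_value is None else template_value
    # Priority 1: the last user message carrying a flag overrides everything.
    for message in messages:
        if message.get("role") != "user":
            continue
        content = message.get("content")
        text = content.rstrip() if isinstance(content, str) else ""
        if text.endswith("/no_think"):
            resolved = False
        elif text.endswith("/think"):
            resolved = True
    return resolved


history = [
    {"role": "user", "content": "Summarise this /no_think"},
    {"role": "assistant", "content": "Done."},
    {"role": "user", "content": "Now explain why /think"},
]
print(resolve_enable_thinking(history, template_value=False))  # True
```

Because the loop visits messages in order, a later flag overwrites an earlier one, and a user message without any flag leaves the current value untouched.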

### 5.2 Generation Prompt Behaviour

When `add_generation_prompt=True`:

```
enable_thinking=True   →  <|im_start|>assistant\n
                          (model generates <think> itself)
enable_thinking=False  →  <|im_start|>assistant\n<think>\n\n</think>\n\n
                          (forces non-thinking mode by pre-filling empty block)
```
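
Expressed as a Python function, the two cases look like this. `generation_prompt` is a hypothetical name used only for illustration.

```python
def generation_prompt(enable_thinking: bool) -> str:
    """Return the text appended when add_generation_prompt=True."""
    prompt = "<|im_start|>assistant\n"
    if not enable_thinking:
        # Pre-fill the empty think block to force non-thinking mode.
        prompt += "<think>\n\n</think>\n\n"
    return prompt


print(repr(generation_prompt(True)))
print(repr(generation_prompt(False)))
```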

### 5.3 Last-History-Turn Behaviour (add_generation_prompt=False)

When the conversation ends with an assistant message and no generation prompt
is requested (typical when scoring a complete conversation or when the
assistant message is being appended to the prompt for continuation):

- **Think block present:** preserved verbatim regardless of `enable_thinking`.
- **No think block, `enable_thinking=True`:** content left as-is (historical turns
  are already stripped; the last one is the current generation context).
- **No think block, `enable_thinking=False`:** inject `<think>\n\n</think>\n\n`
  before the content.

### 5.4 Historical Think-Block Stripping (BUG-001)

The official template collapses think blocks in historical turns to
`<think>\n\n</think>` instead of removing them. In a long agentic loop this
produces an ever-growing sequence of empty think blocks that degrades prompt
quality ("prompt poisoning").

The correct operation is full removal:

```python
# Python equivalent
content = content.split('</think>')[-1].lstrip('\n') if '</think>' in content else content
```

```jinja2
{# Jinja2 equivalent #}
{%- if '</think>' in _ac -%}
  {%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
{%- endif -%}
```

**Exception:** turns that also carry `tool_calls` keep their think block intact.
The model is trained to produce thinking before tool invocations, and stripping
the think block from a historical tool-call turn would misrepresent the prompt.
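
Combining the stripping rule with the tool-call exception gives a small function like the one below. It is a sketch for historical turns only; `strip_historical_think` and the `has_tool_calls` flag are hypothetical names.

```python
def strip_historical_think(content: str, has_tool_calls: bool = False) -> str:
    """Remove the think block from a historical assistant turn.

    Turns that carry tool calls are returned unchanged (the exception above).
    """
    if has_tool_calls or "</think>" not in content:
        return content
    return content.split("</think>")[-1].lstrip("\n")


turn = "<think>\nCheck the units first.\n</think>\n\nThe answer is 42."
print(strip_historical_think(turn))                       # The answer is 42.
print(strip_historical_think(turn, has_tool_calls=True) == turn)  # True
```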

---

## 6. Tool Call Rendering

### 6.1 System Turn Tool Block Format

The exact text injected into the system message when tools are present matches
the Qwen3 Hermes training format:

```
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "...", ...}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
```

All text, including the instruction sentences, is literal and must not be
modified. The model was trained on this exact phrasing.

### 6.2 Assistant Tool-Call Block

Each tool call is rendered as:

```
<tool_call>
{"name": "function_name", "arguments": {JSON_OBJECT}}
</tool_call>
```

Multiple parallel calls appear as consecutive blocks separated by `\n`:

```
<tool_call>
{"name": "f1", "arguments": {...}}
</tool_call>
<tool_call>
{"name": "f2", "arguments": {...}}
</tool_call><|im_end|>
```

Note: the final `</tool_call>` is immediately followed by `<|im_end|>` with no
intervening newline. This matches the training format.

### 6.3 Arguments: String vs Object (BUG-006)

Some frameworks (notably older OpenAI-compatible clients and some streaming
implementations) serialise tool-call arguments as a JSON string
(`"{\"location\": \"Berlin\"}"`) rather than as an object
(`{"location": "Berlin"}`). The template handles both:

```jinja2
{%- if tc.function.arguments is string -%}
  {{- ', "arguments": ' + tc.function.arguments -}}
{%- else -%}
  {{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
{%- endif -%}
```

When arguments are already a string they are passed through as-is (the caller
is responsible for valid JSON). When they are a dict/object, `tojson` serialises
them correctly, including Unicode escaping and quote escaping.

This arrangement also prevents the triple-quote failure (BUG-006). Building the
arguments JSON by string interpolation breaks when a value contains a quote
sequence such as `"""`, because the quotes reach the output unescaped. `tojson`
emits a properly escaped JSON string literal, so the failure cannot occur on
the object path.
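
The dual-path handling can be mirrored in Python, with `json.dumps` standing in for `tojson` (the two differ in some escaping details, so this is an approximation rather than the template's exact output). `render_tool_call` and `render_tool_calls` are hypothetical names.

```python
import json


def render_tool_call(tool_call: dict) -> str:
    """Render one <tool_call> block; arguments may be a JSON string or an object."""
    function = tool_call["function"]
    name_json = json.dumps(function["name"])
    arguments = function.get("arguments", {})
    if isinstance(arguments, str):
        arguments_json = arguments  # passed through; caller must supply valid JSON
    else:
        arguments_json = json.dumps(arguments)
    payload = f'{{"name": {name_json}, "arguments": {arguments_json}}}'
    return f"<tool_call>\n{payload}\n</tool_call>"


def render_tool_calls(tool_calls: list[dict]) -> str:
    """Join parallel calls with a single newline, as in Section 6.2."""
    return "\n".join(render_tool_call(call) for call in tool_calls)


as_object = {"function": {"name": "get_weather", "arguments": {"location": "Berlin"}}}
as_string = {"function": {"name": "get_weather", "arguments": '{"location": "Berlin"}'}}
tricky = {"function": {"name": "run", "arguments": {"code": '"""'}}}

print(render_tool_calls([as_object, as_string]))
print(render_tool_call(tricky))
```

Both input forms yield the same block, and the triple-quote value stays valid JSON because the serialiser escapes each quote.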

### 6.4 Tool Results

Tool results are wrapped in a user turn using `<tool_response>`:

```
<|im_start|>user
<tool_response>
{result_content}
</tool_response><|im_end|>
```

Consecutive tool-response messages are merged into a single user turn: the
template checks whether the previous message's role was also `tool` and
suppresses the `<|im_start|>user\n` header if so.
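
A Python sketch of the grouping logic, for illustration. `render_tool_results` is a hypothetical name. Two details are assumptions not stated above: consecutive `<tool_response>` blocks are separated by a single newline, and `<|im_end|>` plus a newline is emitted only after the last response of a run.

```python
def render_tool_results(messages: list[dict]) -> str:
    """Render tool messages, merging consecutive ones into one user turn."""
    out = []
    for index, message in enumerate(messages):
        if message.get("role") != "tool":
            continue
        previous_is_tool = index > 0 and messages[index - 1].get("role") == "tool"
        next_is_tool = (index + 1 < len(messages)
                        and messages[index + 1].get("role") == "tool")
        if not previous_is_tool:
            out.append("<|im_start|>user")  # header only for the first of a run
        out.append(f"\n<tool_response>\n{message['content']}\n</tool_response>")
        if not next_is_tool:
            out.append("<|im_end|>\n")  # assumed: close the turn after the run
    return "".join(out)


results = [
    {"role": "tool", "content": '{"temp": 21}'},
    {"role": "tool", "content": '{"temp": 18}'},
]
print(render_tool_results(results))
```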

---

## 7. Bug Analysis and Fixes

### BUG-001: Historical Think Blocks Leaked (CRITICAL)

**Symptom:** In multi-turn conversations with `enable_thinking=True`, every
historical assistant message retains a collapsed `<think>\n\n</think>` block.
Over many turns the prompt accumulates dozens of empty think blocks, degrading
model performance.

**Root cause:** Official template strips think content but leaves the surrounding
`<think>` tags.

**Fix:** Strip the entire block by splitting on `</think>` and taking the tail:

```jinja2
{%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
```

**Tests:** T10, T13, T16

---

### BUG-002: KeyError on content=None / Missing content Key (HIGH)

**Symptom:** When an assistant message contains only `tool_calls` and no `content`
(or `content=None`, which is the OpenAI convention for pure tool-call responses),
the template throws `UndefinedError` or `KeyError`.

**Root cause:** Official template accesses `message.content` directly.

**Fix:** Guard the access:

```jinja2
{%- if message.content is defined and message.content is string -%}
  {%- set _ac = message.content -%}
{%- elif message.content is defined and message.content is iterable ... -%}
  {%- set _ac = render_content(message.content) -%}
{%- else -%}
  {%- set _ac = '' -%}
{%- endif -%}
```

**Tests:** T04, T11

---

### BUG-003: Markup HTML-Escaping in Tool JSON (MEDIUM)

**Symptom:** Tool definitions or tool-call arguments with characters like `<`, `>`,
`&`, or `"` appear HTML-escaped in the rendered prompt (`&lt;`, `&gt;`, `&amp;`,
`&#34;`). This causes the model to misread the tool schema.

**Root cause:** `tojson` returns a Jinja2 `Markup` object. When `Markup` is
concatenated with a plain Python string using `+` inside a Jinja2 expression,
the plain string is auto-escaped and then concatenated with the already-safe
`Markup` value.

**Fix:** Never use `+` to join `tojson` output with plain strings. Emit each
fragment through a separate `{{ }}` block:

```jinja2
{# Every fragment in its own block #}
{{- '{"name": ' -}}{{- tc.function.name | tojson -}}
```

**Tests:** T03, T04, T12

---

### BUG-004: Multiple System Messages Not Handled (MEDIUM)

**Symptom:** Frameworks such as Open WebUI send more than one `role: system`
message. The official template either crashes or emits multiple system turns,
both of which confuse the model.

**Root cause:** No merging logic for multiple system messages.

**Fix:** Pre-scan all messages and concatenate system content with `\n\n`:

```jinja2
{%- if ns_sys.content == '' -%}
  {%- set ns_sys.content = _c -%}
{%- else -%}
  {%- set ns_sys.content = ns_sys.content + '\n\n' + _c -%}
{%- endif -%}
```

**Tests:** T02, T14

---

### BUG-005: Wrong Non-Thinking Prefill Whitespace (LOW-MEDIUM)

**Symptom:** Non-thinking mode produces a think block with incorrect whitespace,
moving the model off its training distribution and causing output quality
degradation or refusal to honour the non-thinking instruction.

**Root cause:** The official template uses `<think>\n</think>\n\n` (missing the
second newline inside the block), which does not match the format described in
the technical report.

**Fix:** Use the exact 19-character sequence:

```
<think>\n\n</think>\n\n
```

**Tests:** T08, T17

---

### BUG-006: Triple-Quote Crash on Python String Arguments (MEDIUM)

**Symptom:** Jinja2 raises a `TemplateSyntaxError` or produces garbled output when
tool-call arguments contain triple-quote sequences (`"""` or `'''`) because the
template previously embedded argument values using Python string literal
concatenation.

**Root cause:** Some community templates build the tool-call JSON via string
interpolation (`'{"arguments": "' + args + '"}'`), which breaks for argument
values containing quote characters.

**Fix:** Use `tojson` for all non-string arguments (produces well-formed JSON) and
pass string arguments through unchanged (caller provides valid JSON strings):

```jinja2
{%- if tc.function.arguments is string -%}
  {{- ', "arguments": ' + tc.function.arguments -}}
{%- else -%}
  {{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
{%- endif -%}
```

**Tests:** T12

---

## 8. Template Architecture

The template is divided into eight clearly delimited sections, each with a
comment header:

```
Section 1  render_content macro
           Handles str / list (image/video/text) / None → plain text.
           Increments ns.image_count / ns.video_count for vision tokens.
Section 2  Namespace initialisation
           Single ns object; enable_thinking defaults to true.
Section 3  Pre-scan
           Walk all user messages; last /no_think or /think wins.
Section 4  Collect system content
           Merge all system / developer messages with \n\n.
Section 5  Build tools list
           Normalise every tool to {"type":"function","function":{...}}.
Section 6  Output system turn
           Emit one <|im_start|>system turn (user content + tools block).
Section 7  Main message loop
           7a  system/developer → skip (already emitted)
           7b  user             → render with vision support
           7c  assistant        → render with think-block logic + tool_calls
           7d  tool             → group into user turns
           7e  unknown role     → raise_exception
Section 8  Generation prompt
           enable_thinking=True  → bare <|im_start|>assistant\n
           enable_thinking=False → add <think>\n\n</think>\n\n prefix
```

### Design Decisions

**No default system prompt.** Unlike some community templates, this template does
not inject a default system prompt when none is provided. The model performs well
without one, and injecting one would cause conflicts for applications that rely on
the system prompt being exactly what they set.

**No BOS token.** The Qwen3 family was trained without a BOS token. Adding one
would consume a context window slot unnecessarily and may harm performance.

**No `<|endoftext|>` in conversation.** This token is reserved for
end-of-document signalling in the pre-training phase, not for conversation
boundaries.

---

## 9. Test Coverage

The 17 test fixtures in `validate_template.py` cover:

| ID | Scenario | Key assertion |
|---|---|---|
| T01 | Simple user/assistant, no system, no tools | Exact ChatML output |
| T02 | System message | System turn before user turn |
| T03 | Tools defined, `enable_thinking=True` | Tools block in system; no prefill |
| T04 | Tool call, `content=None` | No crash; `<tool_call>` present |
| T05 | Parallel tool calls | `</tool_call>\n<tool_call>` separator |
| T06 | Tool result (role=tool) | `<\|im_start\|>user\n<tool_response>` |
| T07 | `enable_thinking=True` generation prompt | No think prefix emitted |
| T08 | `enable_thinking=False` generation prompt | Exact 19-char prefill |
| T09 | `/no_think` flag in user message | Non-thinking prefill applied |
| T10 | Historical think blocks | Fully stripped, not collapsed |
| T11 | Missing `content` key on assistant | No KeyError / UndefinedError |
| T12 | Special chars in arguments | Correctly JSON-escaped |
| T13 | Historical tool-call turn with think | Think block preserved |
| T14 | Multiple system messages | Merged with `\n\n`; single system turn |
| T15 | Parallel tool responses | Both inside single user turn |
| T16 | Last history turn with existing think | Preserved verbatim |
| T17 | Last history turn, no think, `enable_thinking=False` | Prefill injected |

Run the suite:

```bash
cd /workspace/project/qwen3_5-template
python validate_template.py
# Expected: 17 passed, 0 failed
```

---

## 10. Tool Ecosystem Compatibility

An analysis of 51 tool-calling frameworks and inference backends was conducted to
verify that the template's output is consumable by the broadest possible set of
tools. Key findings:

### 10.1 OpenAI JSON Format Dominance

31 of the 51 analysed tools use the **OpenAI-compatible JSON function-call API**
(Group A). These tools pass tool definitions as a `tools` array and receive tool
calls back as `message.tool_calls` objects. The template's input format is fully
compatible with this convention.

Notable Group A members: OpenHands, LangChain, LangGraph, LiteLLM, CrewAI,
Pydantic AI, Open WebUI, LibreChat, LM Studio, LlamaIndex, AutoGen.

### 10.2 Inference Server Compatibility

| Backend | Compatibility note |
|---|---|
| **vLLM** | Uses the `hermes` tool parser for Qwen models, matching this template's `<tool_call>` format exactly. |
| **llama.cpp** | Recognises `<tool_call>` via the `--jinja` flag + chat template loading. Note: `--jinja` disables GBNF grammar (Issue #12204). |
| **Ollama** | Auto-detects the tool-call tag via `parseTag()`, which reads the first text node after `.ToolCalls` in the Go template tree; `<tool_call>` is one of the three known tags. |
| **LM Studio** | Passes tool definitions as the `tools` API field; receives tool calls in `message.tool_calls`. |
| **TabbyAPI** | Full OpenAI-compatible API; correct chat template is the only requirement. |

### 10.3 Non-Native Tool-Calling Frameworks

Three framework groups (Cline/Roo Code XML, OpenCode `<parameter>`, Aider
SEARCH/REPLACE) do not use the OpenAI tool-calling API at all. They inject their
own tool descriptions into the system prompt and parse the model's text output
directly. These frameworks do not interact with the chat template's tool-calling
sections: they send no `tools` array, so the template emits no tool block.

### 10.4 Arguments as JSON String

Several frameworks (notably some streaming clients and older OpenAI SDK versions)
serialise `tool_calls[].function.arguments` as a JSON string rather than a parsed
object. The template's dual-path arguments handling (Section 6.3) accommodates
both cases transparently.

---

*Generated as part of the `fix/qwen3-template-bugs` implementation.*