Instructions to use StableQuant/Qwen-Templates-Rebuild-Project with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use StableQuant/Qwen-Templates-Rebuild-Project with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="StableQuant/Qwen-Templates-Rebuild-Project")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("StableQuant/Qwen-Templates-Rebuild-Project", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use StableQuant/Qwen-Templates-Rebuild-Project with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "StableQuant/Qwen-Templates-Rebuild-Project"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StableQuant/Qwen-Templates-Rebuild-Project",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/StableQuant/Qwen-Templates-Rebuild-Project

SGLang

How to use StableQuant/Qwen-Templates-Rebuild-Project with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "StableQuant/Qwen-Templates-Rebuild-Project" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StableQuant/Qwen-Templates-Rebuild-Project",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "StableQuant/Qwen-Templates-Rebuild-Project" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StableQuant/Qwen-Templates-Rebuild-Project",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use StableQuant/Qwen-Templates-Rebuild-Project with Docker Model Runner:
```
docker model run hf.co/StableQuant/Qwen-Templates-Rebuild-Project
```

Qwen-Templates-Rebuild-Project / v1.0_writeup.md

StableQuant

Upload 2 files

1f2e278 verified 7 days ago

preview code

raw

history blame

22.8 kB

	# Qwen3.5 / Qwen3.6 Jinja2 Chat Template — Implementation Writeup

	File: `qwen3_5-template.jinja`
	Validation: `validate_template.py` (17 fixtures, 0 failures)
	Bugs fixed: BUG-001 through BUG-006

	---

	## Table of Contents

	1. [Why a New Template?](#1-why-a-new-template)
	2. [Research Basis](#2-research-basis)
	3. [Model Format Fundamentals](#3-model-format-fundamentals)
	4. [Implementation Premises](#4-implementation-premises)
	5. [enable_thinking Behavior](#5-enable_thinking-behavior)
	6. [Tool Call Rendering](#6-tool-call-rendering)
	7. [Bug Analysis and Fixes](#7-bug-analysis-and-fixes)
	8. [Template Architecture](#8-template-architecture)
	9. [Test Coverage](#9-test-coverage)
	10. [Tool Ecosystem Compatibility](#10-tool-ecosystem-compatibility)

	---

	## 1. Why a New Template?

	The official Qwen3.5/3.6 chat template (as shipped with the HuggingFace model
	checkpoints) contains at least six correctness bugs that cause silent failures in
	production agent loops. These bugs were independently reported across GitHub
	issues, HuggingFace discussions, Reddit threads, and llama.cpp/vLLM bug trackers
	between early 2025 and mid-2026.

	An analysis of approximately five widely-used community replacement templates
	showed that each one fixed a different subset of the bugs while introducing new
	ones. None were derived systematically from the model's training format as
	documented in the official technical report.

	This template was written from scratch, grounded in:

	- Qwen3 Technical Report (arXiv:2505.09388) — authoritative description of
	the training format, thinking mechanism, and tool-calling protocol.
	- Mid-Think Paper (arXiv:2601.07036) — phase structure of reasoning chains and
	budget-stop format.
	- Hermes tool-call format spec (Nous Research / NousHermes) — the XML-based
	tool-call format on which Qwen3 tool-calling is modelled.
	- Community bug reports and vLLM/llama.cpp/Ollama source code analysis.

	---

	## 2. Research Basis

	### 2.1 Qwen3 Technical Report (arXiv:2505.09388)

	Key facts extracted for template construction:

	- No BOS token. The model was trained without one; inserting one degrades output.
	- `<think>` and `</think>` are regular BPE text tokens, not special tokens.
	Tokenizer ID 151644 = `<\|im_start\|>`, 151645 = `<\|im_end\|>`.
	- Non-thinking mode is implemented by prepending an empty think block to the
	assistant generation: `<think>\n\n</think>\n\n`. The report states explicitly:
	*"For non-thinking mode samples, we retain an empty thinking block in the
	assistant's response. This design ensures internal format consistency."*
	- `/think` and `/no_think` are plain text suffixes in user messages, not special
	tokens. The model was fine-tuned to follow the last such flag encountered in
	a multi-turn conversation.

	### 2.2 Vocab and Tokenizer Notes

	```
	Token ID Note
	<\|endoftext\|> 151643 End-of-document / pad fallback
	<\|im_start\|> 151644 Begin-of-turn
	<\|im_end\|> 151645 End-of-turn, eos_token
	```

	Qwen3.5/3.6 both use a padded vocabulary of 248,320 entries; tokens above 151,646
	are padding with no semantics. The tokenizer class is `Qwen2Tokenizer` (BBPE,
	no `<unk>`).

	### 2.3 Tool-Call Format Origin

	Qwen3 tool-calling uses the Hermes-2 XML format (NousResearch):

	```
	<tool_call>
	{"name": "function_name", "arguments": {"key": "value"}}
	</tool_call>
	```

	This is identical to vLLM's `hermes` parser target and is the format recognised
	by Ollama's `parseTag()` heuristic (first text node following `.ToolCalls`).

	---

	## 3. Model Format Fundamentals

	### 3.1 ChatML Base Structure

	Every conversation is encoded as a sequence of turns delimited by im-start/end
	control tokens. No newline appears before `<\|im_end\|>`.

	```
	<\|im_start\|>system
	{system_content}<\|im_end\|>
	<\|im_start\|>user
	{user_content}<\|im_end\|>
	<\|im_start\|>assistant
	<think>
	{thinking}
	</think>

	{response}<\|im_end\|>
	```

	The blank line between `</think>` and the response is mandatory. The model was
	trained on this exact whitespace layout.

	### 3.2 Non-Thinking Prefill (Character-Exact)

	The non-thinking generation prefix is exactly 19 characters:

	```
	<think>\n\n</think>\n\n
	```

	Decomposed: `<think>` (7) + `\n` (1) + `\n` (1) + `</think>` (8) + `\n` (1) +
	`\n` (1) = 19. Any deviation (extra space, missing newline) moves the model off
	its training distribution.

	### 3.3 Think-Block Scope Rules

	\| Turn type \| Think-block treatment \|
	\|---\|---\|
	\| Historical assistant turn (non-last, no tool_calls) \| Strip entirely — `split('</think>')[-1].lstrip('\n')` \|
	\| Historical assistant turn (has tool_calls) \| Preserve — think block is part of the tool-call format \|
	\| Last assistant turn in history (`add_generation_prompt=False`) \| Preserve verbatim \|
	\| Last assistant turn, no existing think, `enable_thinking=False` \| Inject `<think>\n\n</think>\n\n` prefix \|
	\| Generation prompt, `enable_thinking=True` \| No prefix — model generates its own `<think>` \|
	\| Generation prompt, `enable_thinking=False` \| Inject `<think>\n\n</think>\n\n` prefix \|

	---

	## 4. Implementation Premises

	### 4.1 Single Namespace Object

	All mutable template state lives in one `ns` namespace object, avoiding
	Jinja2's scoping trap (variables set inside `{% for %}` blocks are not visible
	outside without a namespace):

	```jinja2
	{%- set ns = namespace(
	enable_thinking=true,
	image_count=0,
	video_count=0
	) -%}
	```

	### 4.2 Pre-Scan Before Rendering

	The template performs a full pre-scan of all messages before emitting any output.
	This is necessary because `/no_think` or `/think` can appear in any user message,
	and the final flag determines the generation prompt behaviour. A single-pass loop
	that both renders and tracks flags would have to look ahead, which Jinja2 cannot
	do.

	```jinja2
	{%- for i in range(messages \| length) -%}
	{%- if messages[i].role == 'user' -%}
	{%- set _u = messages[i].content if messages[i].content is string else '' -%}
	{%- if _u.rstrip().endswith('/no_think') -%}
	{%- set ns.enable_thinking = false -%}
	{%- elif _u.rstrip().endswith('/think') -%}
	{%- set ns.enable_thinking = true -%}
	{%- endif -%}
	{%- endif -%}
	{%- endfor -%}
	```

	### 4.3 Separate `{{ }}` Blocks for `tojson` Output

	Jinja2's `tojson` filter returns a `Markup` object (already HTML-safe). When a
	`Markup` value is Python-concatenated with a plain string using `+`, Jinja2
	auto-escapes the plain string and produces double-encoded output (`"`,
	`"`, etc.). This is BUG-003.

	The fix is to never concatenate `tojson` output with plain strings inside a
	Jinja2 expression. Each fragment is emitted through its own `{{ }}` block:

	```jinja2
	{# WRONG — triggers HTML-escaping of the plain string #}
	{{- '{"name": ' + tc.function.name \| tojson + '}' -}}

	{# CORRECT — separate blocks, no Python concatenation #}
	{{- '{"name": ' -}}{{- tc.function.name \| tojson -}}{{- '}' -}}
	```

	### 4.4 System Message Collection Phase

	Multiple system messages are merged into a single `<\|im_start\|>system` turn
	with `\n\n` as separator (BUG-004 fix). This is done as a separate pre-pass
	(Section 4 in the template), so the main loop can unconditionally skip all
	`role == 'system'` messages.

	The user's system content always appears before the tools block in the
	system turn, matching the training format.

	### 4.5 Tool Normalisation

	Some frameworks pass tool definitions with a top-level `function` key
	(`{"type": "function", "function": {...}}`), while others pass the function
	schema directly (`{"name": ..., "parameters": ...}`). The template normalises
	all entries to the canonical form before serialisation:

	```jinja2
	{%- if tool.function is defined -%}
	{%- set ns_tb.list = ns_tb.list + [tool] -%}
	{%- else -%}
	{%- set ns_tb.list = ns_tb.list + [{"type": "function", "function": tool}] -%}
	{%- endif -%}
	```

	---

	## 5. `enable_thinking` Behavior

	### 5.1 Resolution Priority (Highest to Lowest)

	1. `/no_think` or `/think` text suffix in the last user message that contains
	one. This is the highest priority because it represents the most recent
	explicit user intent and mirrors the model's fine-tuning data.
	2. `enable_thinking` template variable passed at render time (e.g., via
	`tokenizer.apply_chat_template(..., enable_thinking=False)`).
	3. Default value of `true` (thinking on by default, consistent with the model's
	training distribution).

	### 5.2 Generation Prompt Behaviour

	When `add_generation_prompt=True`:

	```
	enable_thinking=True → <\|im_start\|>assistant\n
	(model generates <think> itself)

	enable_thinking=False → <\|im_start\|>assistant\n<think>\n\n</think>\n\n
	(forces non-thinking mode by pre-filling empty block)
	```

	### 5.3 Last-History-Turn Behaviour (add_generation_prompt=False)

	When the conversation ends with an assistant message and no generation prompt
	is requested — typical when scoring a complete conversation or when the
	assistant message is being appended to the prompt for continuation:

	- Think block present: preserved verbatim regardless of `enable_thinking`.
	- No think block, `enable_thinking=True`: content left as-is (historical turns
	are already stripped; the last one is the current generation context).
	- No think block, `enable_thinking=False`: inject `<think>\n\n</think>\n\n`
	before the content.

	### 5.4 Historical Think-Block Stripping (BUG-001)

	The official template collapses think blocks in historical turns to
	`<think>\n\n</think>` instead of removing them. In a long agentic loop this
	produces an ever-growing sequence of empty think blocks that degrades prompt
	quality ("prompt poisoning").

	The correct operation is full removal:

	```python
	# Python equivalent
	content = content.split('</think>')[-1].lstrip('\n') if '</think>' in content else content
	```

	```jinja2
	{# Jinja2 equivalent #}
	{%- if '</think>' in _ac -%}
	{%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
	{%- endif -%}
	```

	Exception: turns that also carry `tool_calls` keep their think block intact.
	The model is trained to produce thinking before tool invocations, and stripping
	the think block from a historical tool-call turn would misrepresent the prompt.

	---

	## 6. Tool Call Rendering

	### 6.1 System Turn Tool Block Format

	The exact text injected into the system message when tools are present matches
	the Qwen3 Hermes training format:

	```
	# Tools

	You may call one or more functions to assist with the user query.

	You are provided with function signatures within <tools></tools> XML tags:
	<tools>
	{"type": "function", "function": {"name": "...", ...}}
	</tools>

	For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
	<tool_call>
	{"name": <function-name>, "arguments": <args-json-object>}
	</tool_call>
	```

	All text — including the instruction sentences — is literal and must not be
	modified. The model was trained on this exact phrasing.

	### 6.2 Assistant Tool-Call Block

	Each tool call is rendered as:

	```
	<tool_call>
	{"name": "function_name", "arguments": {JSON_OBJECT}}
	</tool_call>
	```

	Multiple parallel calls appear as consecutive blocks separated by `\n`:

	```
	<tool_call>
	{"name": "f1", "arguments": {...}}
	</tool_call>
	<tool_call>
	{"name": "f2", "arguments": {...}}
	</tool_call><\|im_end\|>
	```

	Note: the final `</tool_call>` is immediately followed by `<\|im_end\|>` with no
	intervening newline. This matches the training format.

	### 6.3 Arguments: String vs Object (BUG-006)

	Some frameworks (notably older OpenAI-compatible clients and some streaming
	implementations) serialise tool-call arguments as a JSON string
	(`"{\"location\": \"Berlin\"}"`) rather than as an object
	(`{"location": "Berlin"}`). The template handles both:

	```jinja2
	{%- if tc.function.arguments is string -%}
	{{- ', "arguments": ' + tc.function.arguments -}}
	{%- else -%}
	{{- ', "arguments": ' -}}{{- tc.function.arguments \| tojson -}}
	{%- endif -%}
	```

	When arguments are already a string they are passed through as-is (the caller
	is responsible for valid JSON). When they are a dict/object, `tojson` serialises
	them correctly including Unicode escaping and quote escaping.

	This arrangement also prevents the `"""` crash (BUG-006): Python triple-quoted
	strings inside Jinja2 template strings would crash the Jinja2 parser if the
	arguments dict happened to contain a value like `"""`. By using `tojson`
	(which produces a proper JSON string literal) the crash cannot occur.

	### 6.4 Tool Results

	Tool results are wrapped in a user turn using `<tool_response>`:

	```
	<\|im_start\|>user
	<tool_response>
	{result_content}
	</tool_response><\|im_end\|>
	```

	Consecutive tool-response messages are merged into a single user turn — the
	template checks whether the previous message's role was also `tool` and
	suppresses the `<\|im_start\|>user\n` header if so.

	---

	## 7. Bug Analysis and Fixes

	### BUG-001 — Historical Think Blocks Leaked (CRITICAL)

	Symptom: In multi-turn conversations with `enable_thinking=True`, every
	historical assistant message retains a collapsed `<think>\n\n</think>` block.
	Over many turns the prompt accumulates dozens of empty think blocks, degrading
	model performance.

	Root cause: Official template strips think content but leaves the surrounding
	`<think>` tags.

	Fix: Strip the entire block by splitting on `</think>` and taking the tail:

	```jinja2
	{%- set _ac = _ac.split('</think>')[-1].lstrip('\n') -%}
	```

	Tests: T10, T13, T16

	---

	### BUG-002 — KeyError on content=None / Missing content Key (HIGH)

	Symptom: When an assistant message contains only `tool_calls` and no `content`
	(or `content=None`, which is the OpenAI convention for pure tool-call responses),
	the template throws `UndefinedError` or `KeyError`.

	Root cause: Official template accesses `message.content` directly.

	Fix: Guard the access:

	```jinja2
	{%- if message.content is defined and message.content is string -%}
	{%- set _ac = message.content -%}
	{%- elif message.content is defined and message.content is iterable ... -%}
	{%- set _ac = render_content(message.content) -%}
	{%- else -%}
	{%- set _ac = '' -%}
	{%- endif -%}
	```

	Tests: T04, T11

	---

	### BUG-003 — Markup HTML-Escaping in Tool JSON (MEDIUM)

	Symptom: Tool definitions or tool-call arguments with characters like `<`, `>`,
	`&`, or `"` appear HTML-escaped in the rendered prompt (`<`, `>`, `&`,
	`"`). This causes the model to misread the tool schema.

	Root cause: `tojson` returns a Jinja2 `Markup` object. When `Markup` is
	concatenated with a plain Python string using `+` inside a Jinja2 expression,
	the plain string is auto-escaped and then concatenated with the already-safe
	`Markup` value.

	Fix: Never use `+` to join `tojson` output with plain strings. Emit each
	fragment through a separate `{{ }}` block:

	```jinja2
	{# Every fragment in its own block #}
	{{- '{"name": ' -}}{{- tc.function.name \| tojson -}}
	```

	Tests: T03, T04, T12

	---

	### BUG-004 — Multiple System Messages Not Handled (MEDIUM)

	Symptom: Frameworks such as Open WebUI send more than one `role: system`
	message. The official template either crashes or emits multiple system turns,
	both of which confuse the model.

	Root cause: No merging logic for multiple system messages.

	Fix: Pre-scan all messages and concatenate system content with `\n\n`:

	```jinja2
	{%- if ns_sys.content == '' -%}
	{%- set ns_sys.content = _c -%}
	{%- else -%}
	{%- set ns_sys.content = ns_sys.content + '\n\n' + _c -%}
	{%- endif -%}
	```

	Tests: T02, T14

	---

	### BUG-005 — Wrong Non-Thinking Prefill Whitespace (LOW-MEDIUM)

	Symptom: Non-thinking mode produces a think block with incorrect whitespace,
	moving the model off its training distribution and causing output quality
	degradation or refusal to honour the non-thinking instruction.

	Root cause: The official template uses `<think>\n</think>\n\n` (missing the
	second newline inside the block), which does not match the format described in
	the technical report.

	Fix: Use the exact 19-character sequence:

	```
	<think>\n\n</think>\n\n
	```

	Tests: T08, T17

	---

	### BUG-006 — Triple-Quote Crash on Python String Arguments (MEDIUM)

	Symptom: Jinja2 raises a `TemplateSyntaxError` or produces garbled output when
	tool-call arguments contain triple-quote sequences (`"""` or `'''`) because the
	template previously embedded argument values using Python string literal
	concatenation.

	Root cause: Some community templates build the tool-call JSON via string
	interpolation (`'{"arguments": "' + args + '"}'`), which breaks for argument
	values containing quote characters.

	Fix: Use `tojson` for all non-string arguments (produces well-formed JSON) and
	pass string arguments through unchanged (caller provides valid JSON strings):

	```jinja2
	{%- if tc.function.arguments is string -%}
	{{- ', "arguments": ' + tc.function.arguments -}}
	{%- else -%}
	{{- ', "arguments": ' -}}{{- tc.function.arguments \| tojson -}}
	{%- endif -%}
	```

	Tests: T12

	---

	## 8. Template Architecture

	The template is divided into eight clearly delimited sections, each with a
	comment header:

	```
	Section 1 render_content macro
	Handles str / list (image/video/text) / None → plain text.
	Increments ns.image_count / ns.video_count for vision tokens.

	Section 2 Namespace initialisation
	Single ns object; enable_thinking defaults to true.

	Section 3 Pre-scan
	Walk all user messages; last /no_think or /think wins.

	Section 4 Collect system content
	Merge all system / developer messages with \n\n.

	Section 5 Build tools list
	Normalise every tool to {"type":"function","function":{...}}.

	Section 6 Output system turn
	Emit one <\|im_start\|>system turn (user content + tools block).

	Section 7 Main message loop
	7a system/developer → skip (already emitted)
	7b user → render with vision support
	7c assistant → render with think-block logic + tool_calls
	7d tool → group into user turns
	7e unknown role → raise_exception

	Section 8 Generation prompt
	enable_thinking=True → bare <\|im_start\|>assistant\n
	enable_thinking=False → add <think>\n\n</think>\n\n prefix
	```

	### Design Decisions

	No default system prompt. Unlike some community templates, this template does
	not inject a default system prompt when none is provided. The model performs well
	without one, and injecting one would cause conflicts for applications that rely on
	the system prompt being exactly what they set.

	No BOS token. The Qwen3 family was trained without a BOS token. Adding one
	would consume a context window slot unnecessarily and may harm performance.

	No `<\|endoftext\|>` in conversation. This token is reserved for
	end-of-document signalling in the pre-training phase, not for conversation
	boundaries.

	---

	## 9. Test Coverage

	The 17 test fixtures in `validate_template.py` cover:

	\| ID \| Scenario \| Key assertion \|
	\|---\|---\|---\|
	\| T01 \| Simple user/assistant, no system, no tools \| Exact ChatML output \|
	\| T02 \| System message \| System turn before user turn \|
	\| T03 \| Tools defined, `enable_thinking=True` \| Tools block in system; no prefill \|
	\| T04 \| Tool call, `content=None` \| No crash; `<tool_call>` present \|
	\| T05 \| Parallel tool calls \| `</tool_call>\n<tool_call>` separator \|
	\| T06 \| Tool result (role=tool) \| `<\|im_start\|>user\n<tool_response>` \|
	\| T07 \| `enable_thinking=True` generation prompt \| No think prefix emitted \|
	\| T08 \| `enable_thinking=False` generation prompt \| Exact 19-char prefill \|
	\| T09 \| `/no_think` flag in user message \| Non-thinking prefill applied \|
	\| T10 \| Historical think blocks \| Fully stripped, not collapsed \|
	\| T11 \| Missing `content` key on assistant \| No KeyError / UndefinedError \|
	\| T12 \| Special chars in arguments \| Correctly JSON-escaped \|
	\| T13 \| Historical tool-call turn with think \| Think block preserved \|
	\| T14 \| Multiple system messages \| Merged with `\n\n`; single system turn \|
	\| T15 \| Parallel tool responses \| Both inside single user turn \|
	\| T16 \| Last history turn with existing think \| Preserved verbatim \|
	\| T17 \| Last history turn, no think, `enable_thinking=False` \| Prefill injected \|

	Run the suite:

	```bash
	cd /workspace/project/qwen3_5-template
	python validate_template.py
	# Expected: 17 passed, 0 failed
	```

	---

	## 10. Tool Ecosystem Compatibility

	An analysis of 51 tool-calling frameworks and inference backends was conducted to
	verify that the template's output is consumable by the broadest possible set of
	tools. Key findings:

	### 10.1 OpenAI JSON Format Dominance

	31 of the 51 analysed tools use the OpenAI-compatible JSON function-call API
	(Group A). These tools pass tool definitions as a `tools` array and receive tool
	calls back as `message.tool_calls` objects. The template's input format is fully
	compatible with this convention.

	Notable Group A members: OpenHands, LangChain, LangGraph, LiteLLM, CrewAI,
	Pydantic AI, Open WebUI, LibreChat, LM Studio, LlamaIndex, AutoGen, LiteLLM.

	### 10.2 Inference Server Compatibility

	\| Backend \| Compatibility note \|
	\|---\|---\|
	\| vLLM \| Uses the `hermes` tool parser for Qwen models, matching this template's `<tool_call>` format exactly. \|
	\| llama.cpp \| Recognises `<tool_call>` via the `--jinja` flag + chat template loading. Note: `--jinja` disables GBNF grammar (Issue #12204). \|
	\| Ollama \| Auto-detects the tool-call tag via `parseTag()` which reads the first text node after `.ToolCalls` in the Go template tree — `<tool_call>` is one of the three known tags. \|
	\| LM Studio \| Passes tool definitions as the `tools` API field; receives tool calls in `message.tool_calls`. \|
	\| TabbyAPI \| Full OpenAI-compatible API; correct chat template is the only requirement. \|

	### 10.3 Non-Native Tool-Calling Frameworks

	Three framework groups (Cline/Roo Code XML, OpenCode `<parameter>`, Aider
	SEARCH/REPLACE) do not use the OpenAI tool-calling API at all. They inject their
	own tool descriptions into the system prompt and parse the model's text output
	directly. These frameworks do not interact with the chat template's tool-calling
	sections — they send no `tools` array and the template therefore emits no tool
	block.

	### 10.4 Arguments as JSON String

	Several frameworks (notably some streaming clients and older OpenAI SDK versions)
	serialise `tool_calls[].function.arguments` as a JSON string rather than a parsed
	object. The template's dual-path arguments handling (Section 6.3) accommodates
	both cases transparently.

	---

	Generated as part of the `fix/qwen3-template-bugs` implementation.