openhands committed 3 days ago

Commit

1189510

1 Parent(s): 019190c

Update template: replace qwen3-5_6-template_v1.1.jinja with v1.1.2 version

Files changed (1)

qwen3-5_6-template_v1.1.jinja → qwen3_5-6-template_v1.1.2.jinja (renamed) +230 -69

@@ -1,12 +1,72 @@
-{#- ===== SECTION 1: MACRO render_content =====
      Handles string, list (image/video/text items), or None/undefined.
      count_vision=true: increments ns.image_count / ns.video_count.
 -#}
-{%- macro render_content(content, count_vision=false) -%}
-  {%- if content is string -%}
     {{- content -}}
-  {%- elif content is iterable and content is not mapping -%}
     {%- for item in content -%}
       {%- if item.type == 'image' or 'image' in item or 'image_url' in item -%}
         {%- if count_vision -%}{%- set ns.image_count = ns.image_count + 1 -%}{%- endif -%}
         {%- if add_vision_id is defined and add_vision_id -%}
@@ -21,22 +81,70 @@
         {{- '<|vision_start|><|video_pad|><|vision_end|>' -}}
       {%- elif item.type == 'text' or 'text' in item -%}
         {{- item.text -}}
       {%- endif -%}
     {%- endfor -%}
   {%- endif -%}
 {%- endmacro -%}
 {#- ===== SECTION 2: NAMESPACE INITIALISATION =====
      Single ns object for all mutable state.
-     enable_thinking  default=true  (BUG-003 fix)
-     preserve_thinking default=true: when false, suppresses think-block output in
-                       generation prompt and overrides enable_thinking to false.
-                       Passed via --chat-template-kwargs {"preserve_thinking":false}.
 -#}
 {%- set ns = namespace(
     enable_thinking=true,
     image_count=0,
-    video_count=0
 ) -%}
 {#- Resolve enable_thinking kwarg -#}
@@ -48,13 +156,18 @@
   {%- endif -%}
 {%- endif -%}
-{#- Resolve preserve_thinking kwarg.
     preserve_thinking=false  => force non-thinking mode (same as enable_thinking=false).
     preserve_thinking=true   => default, no override (thinking controlled by enable_thinking).
     When not defined         => default, no override.
 -#}
-{%- if preserve_thinking is defined and not preserve_thinking -%}
-  {%- set ns.enable_thinking = false -%}
 {%- endif -%}
 {#- ===== SECTION 3: PRE-SCAN =====
@@ -82,14 +195,26 @@
   {%- endif -%}
 {%- endfor -%}
-{#- ===== SECTION 4: COLLECT SYSTEM CONTENT =====
-     Merge all system/developer messages with \n\n separator (BUG-004 fix).
      <|think_off|> / <|think_on|> markers are stripped from output.
 -#}
 {%- set ns_sys = namespace(content='') -%}
 {%- for msg in messages -%}
   {%- if msg.role == 'system' or msg.role == 'developer' -%}
-    {%- set _c = render_content(msg.content | default('')) | trim -%}
     {%- set _c = _c | replace('<|think_off|>', '') | replace('<|think_on|>', '') | trim -%}
     {%- if _c -%}
       {%- if ns_sys.content == '' -%}
@@ -101,7 +226,7 @@
   {%- endif -%}
 {%- endfor -%}
-{#- ===== SECTION 5: BUILD TOOLS LIST =====
      Normalise each tool to {"type":"function","function":{...}} format.
      Serialisation happens later at output time (avoids Markup + str escaping bugs).
 -#}
@@ -117,7 +242,7 @@
   {%- endfor -%}
 {%- endif -%}
-{#- ===== SECTION 6: OUTPUT SYSTEM TURN =====
      Each fragment output via its own {{ }} block so tojson Markup objects are
      never Python-concatenated with plain strings (would trigger HTML-escaping).
      User system content appears BEFORE the tools block (correct ordering).
@@ -140,26 +265,32 @@
   {{- '<|im_end|>\n' -}}
 {%- endif -%}
-{#- ===== SECTION 7: MAIN MESSAGE LOOP ===== -#}
 {%- for message in messages -%}
-  {#- 7a: System / Developer — already rendered above, skip -#}
   {%- if message.role == 'system' or message.role == 'developer' -%}
-  {#- 7b: User messages -#}
   {%- elif message.role == 'user' -%}
-    {%- set _uc = render_content(message.content | default(''), true) -%}
     {{- '<|im_start|>user\n' + _uc + '<|im_end|>\n' -}}
-  {#- 7c: Assistant messages -#}
   {%- elif message.role == 'assistant' -%}
-    {#- Safely extract content as string — guard against absent key (BUG-002 fix).
         Also support message.reasoning_content as an explicit think-block source
         (used by some frameworks that store thinking separately from content). -#}
     {%- if message.content is defined and message.content is string -%}
       {%- set _ac = message.content -%}
-    {%- elif message.content is defined and message.content is iterable and message.content is not mapping -%}
-      {%- set _ac = render_content(message.content) -%}
     {%- else -%}
       {%- set _ac = '' -%}
     {%- endif -%}
@@ -174,7 +305,8 @@
     {%- endif -%}
     {#- Collect tool_calls if present -#}
-    {%- set _tc = message.tool_calls if message.tool_calls is defined and message.tool_calls else [] -%}
     {#- Strip <tool_call> prefix from content when tool_calls also present
         (some frameworks duplicate the data in both fields) -#}
@@ -182,38 +314,50 @@
       {%- set _ac = _ac.split('<tool_call>')[0] | trim -%}
     {%- endif -%}
-    {#- Determine if this is the last-in-history assistant turn.
-        When add_generation_prompt=False and this is the last message, think blocks
-        must be preserved (and non-thinking prefill applied if needed).
-        All other turns have their think blocks stripped. -#}
-    {%- set _is_last_hist = loop.last and not (add_generation_prompt | default(false)) -%}
-    {#- Think-block handling (BUG-001 fix + last-turn preservation):
-        - Tool-call turns   : never strip (think block is part of the tool-call format)
-        - Last-history turn : preserve; inject non-thinking prefill when absent
-        - Historical turns  : strip using fuzzy end-tag matching to handle
-                              </think>, </thinking>, </ think>, </think > variants -#}
-    {%- if not _tc -%}
-      {%- if _is_last_hist -%}
-        {%- if '<think>' not in _ac and not ns.enable_thinking -%}
-          {%- set _ac = '<think>\n\n</think>\n\n' + _ac -%}
-        {%- endif -%}
-      {%- else -%}
-        {#- Fuzzy end-tag detection for historical turn stripping -#}
-        {%- set _think_end = '' -%}
-        {%- if '</think>' in _ac -%}
-          {%- set _think_end = '</think>' -%}
-        {%- elif '</thinking>' in _ac -%}
-          {%- set _think_end = '</thinking>' -%}
-        {%- elif '</ think>' in _ac -%}
-          {%- set _think_end = '</ think>' -%}
-        {%- elif '</think >' in _ac -%}
-          {%- set _think_end = '</think >' -%}
-        {%- endif -%}
-        {%- if _think_end -%}
-          {%- set _ac = _ac.split(_think_end)[-1].lstrip('\n') -%}
-        {%- endif -%}
       {%- endif -%}
     {%- endif -%}
     {#- Emit the assistant turn -#}
@@ -223,9 +367,9 @@
       {%- if _tc -%}{{- '\n' -}}{%- endif -%}
     {%- endif -%}
-    {#- Render tool calls in Hermes format (BUG-006 fix: arguments as-is or tojson).
         Each value output via its own {{ }} block — never concatenated with plain strings
-        in Python, which would trigger Markup HTML-escaping (BUG-003/markup fix). -#}
     {%- if _tc -%}
       {%- for tc in _tc -%}
         {{- '<tool_call>\n' -}}
@@ -245,15 +389,31 @@
     {%- endif -%}
     {{- '<|im_end|>\n' -}}
-  {#- 7d: Tool results — group consecutive tool messages into one user turn -#}
   {%- elif message.role == 'tool' -%}
     {%- set _prev_role = messages[loop.index0 - 1].role if loop.index0 > 0 else '' -%}
     {%- set _next_role = messages[loop.index0 + 1].role if not loop.last else '' -%}
     {%- if _prev_role != 'tool' -%}
       {{- '<|im_start|>user\n' -}}
     {%- endif -%}
     {{- '<tool_response>\n' -}}
-    {{- message.content | default('') -}}
     {%- if _next_role == 'tool' -%}
       {{- '\n</tool_response>\n' -}}
     {%- else -%}
@@ -261,24 +421,25 @@
       {{- '<|im_end|>\n' -}}
     {%- endif -%}
-  {#- 7e: Unknown role -#}
   {%- else -%}
     {{- raise_exception('Unexpected message role: ' + message.role) -}}
   {%- endif -%}
 {%- endfor -%}
-{#- ===== SECTION 8: GENERATION PROMPT =====
      enable_thinking=True  → open <think>\n prefill so llama.cpp reasoning-budget
                              and other inference engines can hook into the think-stream.
                              The model continues generating inside the open block.
-     enable_thinking=False → exact non-thinking prefill: <think>\n\n</think>\n\n
-                             (19-char closed block, BUG-005 fix)
      NOTE: The <think>\n opener is EPHEMERAL — it lives only in the generation
-     prompt, never in chat history. Historical think-block stripping (BUG-001)
-     is handled in Section 7c and is entirely unaffected by this change.
-     No context poisoning risk.
 -#}
 {%- if add_generation_prompt -%}
   {{- '<|im_start|>assistant\n' -}}
@@ -287,4 +448,4 @@
   {%- else -%}
     {{- '<think>\n\n</think>\n\n' -}}
   {%- endif -%}
-{%- endif -%}

+{#- ============================================================================
+    Qwen Chat Template v0.8
+    Based on v0.7 with additional compatibility fixes for llama.cpp
+    FIXES APPLIED IN v0.8:
+    1. Type guard in detect_tool_error macro (prevents errors on non-string input)
+    2. Type check for tool_calls (ensures list type before iteration)
+    3. Replaced emoji in tool error warnings with text-only (tokenization safety)
+    FIXES APPLIED IN v0.7:
+    1. Tool call error handling with consecutive_failures tracking
+    2. System message media validation (raises exception for images/videos)
+    3. Empty messages validation (raises exception if no messages)
+    4. Unknown content type handling (raises exception for unexpected types)
+    5. Think-block display logic (preserve_thinking controls ALL assistant messages, not just generation prompt)
+    Sections:
+    - 1A: MACRO render_content (with media validation for system content)
+    - 1B: MACRO detect_tool_error (error detection for tool responses)
+    - 2: NAMESPACE INITIALISATION (with error tracking)
+    - 3: PRE-SCAN (thinking mode detection)
+    - 4: VALIDATE MESSAGES (empty messages check)
+    - 5: COLLECT SYSTEM CONTENT (with media validation for system messages)
+    - 6: BUILD TOOLS LIST
+    - 7: OUTPUT SYSTEM TURN
+    - 8: MAIN MESSAGE LOOP (with error detection in tool responses)
+    - 9: GENERATION PROMPT (fixed preserve_thinking logic)
+============================================================================ -#}
+{#- ===== HELPER: raise_exception macro =====
+     Jinja2 doesn't have a built-in raise_exception.
+     This macro outputs an error marker in the rendered output.
+     Callers should check output for "ERROR:" pattern to detect validation failures.
+-#}
+{%- macro raise_exception(message) -%}
+  {{- '\n[ERROR: ' ~ message ~ ']' -}}
+{%- endmacro -%}
+{#- ===== SECTION 1A: MACRO render_content =====
      Handles string, list (image/video/text items), or None/undefined.
      count_vision=true: increments ns.image_count / ns.video_count.
+     is_system_content=false: Set true when rendering system/developer content
+                            to enable media validation (raises exception).
+     count_vision=true: increments vision counters.
 -#}
+{%- macro render_content(content, count_vision=false, is_system_content=false) -%}
+  {#- VALIDATION: System messages cannot contain images or videos (from v18) -#}
+  {#- FIX: also exclude strings and handle None - llama.cpp treats strings as non-iterable in for loops -#}
+  {%- if is_system_content and content is iterable and content is not mapping and content is not string and content is not none -%}
+    {%- for item in content -%}
+      {%- if item.type == 'image' or 'image' in item or 'image_url' in item -%}
+        {{- raise_exception('System message cannot contain images.') -}}
+      {%- endif -%}
+      {%- if item.type == 'video' or 'video' in item -%}
+        {{- raise_exception('System message cannot contain videos.') -}}
+      {%- endif -%}
+    {%- endfor -%}
+  {%- endif -%}
+  {#- Main content rendering -#}
+  {#- Handle None/undefined content -#}
+  {%- if content is none or content is defined == false -%}
+    {{- '' -}}
+  {%- elif content is string -%}
     {{- content -}}
+  {#- FIX: also exclude strings - llama.cpp treats strings as non-iterable in for loops -#}
+  {%- elif content is iterable and content is not mapping and content is not string -%}
     {%- for item in content -%}
+      {#- Handle different item types -#}
       {%- if item.type == 'image' or 'image' in item or 'image_url' in item -%}
         {%- if count_vision -%}{%- set ns.image_count = ns.image_count + 1 -%}{%- endif -%}
         {%- if add_vision_id is defined and add_vision_id -%}
         {{- '<|vision_start|><|video_pad|><|vision_end|>' -}}
       {%- elif item.type == 'text' or 'text' in item -%}
         {{- item.text -}}
+      {#- ERROR: Unknown content type - raise explicit exception (from v18) -#}
+      {%- else -%}
+        {{- raise_exception('Unexpected content type in message content.') -}}
       {%- endif -%}
     {%- endfor -%}
+  {#- ERROR: Unknown content type - raise explicit exception (from v18) -#}
+  {%- elif content is not none and content is defined -%}
+    {{- raise_exception('Unexpected content type.') -}}
+  {%- endif -%}
+{%- endmacro -%}
+{#- ===== SECTION 1B: MACRO detect_tool_error (NEW in v0.7) =====
+     Detects if a tool response contains error indicators.
+     Uses heuristics from v18:
+     - Checks for error keywords (error, exception, traceback, failed to)
+     - Ignores responses with '$ ' (shell output prefix) or 'took ' (timing info)
+     - Ignores responses > 500 chars (likely valid output, not error)
+     Returns: ns.last_tool_failed (true/false)
+     Side effect: Updates ns.consecutive_failures counter
+-#}
+{%- macro detect_tool_error(content) -%}
+  {#- Type guard: ensure content is string (llama.cpp compatibility) -#}
+  {%- set content = content if content is string else '' -%}
+  {%- set content_lower = content | lower -%}
+  {%- set content_length = content | length -%}
+  {#- Error detection heuristics: short response + no shell prefix + has error keywords -#}
+  {%- if content_length < 500
+      and '$ ' not in content
+      and 'took ' not in content_lower
+      and ('"error":' in content_lower or 'error:' in content_lower
+           or 'exception:' in content_lower or 'traceback' in content_lower
+           or 'command not found' in content_lower or 'invalid syntax' in content_lower
+           or 'failed to' in content_lower or 'permission denied' in content_lower) -%}
+    {#- Error detected - update failure tracking -#}
+    {%- set ns.last_tool_failed = true -%}
+    {%- set ns.consecutive_failures = ns.consecutive_failures + 1 -%}
+  {%- else -%}
+    {#- No error - reset failure tracking -#}
+    {%- set ns.last_tool_failed = false -%}
+    {%- set ns.consecutive_failures = 0 -%}
   {%- endif -%}
 {%- endmacro -%}
 {#- ===== SECTION 2: NAMESPACE INITIALISATION =====
      Single ns object for all mutable state.
+     enable_thinking:  default=true (controls think-block in generation prompt)
+     preserve_thinking: default=true (controls think-block display in conversation history)
+     image_count:      Vision counter for images
+     video_count:      Vision counter for videos
+     NEW in v0.7:
+     - consecutive_failures: Tracks consecutive tool call failures (from v18)
+     - last_tool_failed: Boolean flag for current tool response (from v18)
 -#}
 {%- set ns = namespace(
     enable_thinking=true,
+    preserve_thinking=true,
     image_count=0,
+    video_count=0,
+    consecutive_failures=0,
+    last_tool_failed=false
 ) -%}
 {#- Resolve enable_thinking kwarg -#}
   {%- endif -%}
 {%- endif -%}
+{#- Resolve preserve_thinking kwarg (FIXED in v0.7: now also affects conversation history, not just generation prompt).
     preserve_thinking=false  => force non-thinking mode (same as enable_thinking=false).
     preserve_thinking=true   => default, no override (thinking controlled by enable_thinking).
     When not defined         => default, no override.
 -#}
+{%- if preserve_thinking is defined -%}
+  {%- if not preserve_thinking -%}
+    {%- set ns.enable_thinking = false -%}
+    {%- set ns.preserve_thinking = false -%}
+  {%- else -%}
+    {%- set ns.preserve_thinking = true -%}
+  {%- endif -%}
 {%- endif -%}
 {#- ===== SECTION 3: PRE-SCAN =====
   {%- endif -%}
 {%- endfor -%}
+{#- ===== SECTION 4: VALIDATE MESSAGES (NEW in v0.7) =====
+     Validate that messages is provided and not empty.
+     From v18: raises exception if no messages provided.
+-#}
+{%- if not messages -%}
+  {{- raise_exception('No messages provided.') -}}
+{%- endif -%}
+{#- ===== SECTION 5: COLLECT SYSTEM CONTENT =====
+     Merge all system/developer messages with \n\n separator.
      <|think_off|> / <|think_on|> markers are stripped from output.
+     FIXED in v0.7: Pass is_system_content=true to render_content to trigger
+     media validation (raises exception if system contains images/videos).
 -#}
 {%- set ns_sys = namespace(content='') -%}
 {%- for msg in messages -%}
   {%- if msg.role == 'system' or msg.role == 'developer' -%}
+    {#- Pass is_system_content=true for media validation -#}
+    {%- set _c = render_content(msg.content | default(''), false, true) | trim -%}
     {%- set _c = _c | replace('<|think_off|>', '') | replace('<|think_on|>', '') | trim -%}
     {%- if _c -%}
       {%- if ns_sys.content == '' -%}
   {%- endif -%}
 {%- endfor -%}
+{#- ===== SECTION 6: BUILD TOOLS LIST =====
      Normalise each tool to {"type":"function","function":{...}} format.
      Serialisation happens later at output time (avoids Markup + str escaping bugs).
 -#}
   {%- endfor -%}
 {%- endif -%}
+{#- ===== SECTION 7: OUTPUT SYSTEM TURN =====
      Each fragment output via its own {{ }} block so tojson Markup objects are
      never Python-concatenated with plain strings (would trigger HTML-escaping).
      User system content appears BEFORE the tools block (correct ordering).
   {{- '<|im_end|>\n' -}}
 {%- endif -%}
+{#- ===== SECTION 8: MAIN MESSAGE LOOP =====
+     FIXED in v0.7:
+     - Tool responses now have error detection via detect_tool_error macro
+     - Warning messages injected for failed tool calls
+     - consecutive_failures tracking for escalating warnings
+-#}
 {%- for message in messages -%}
+  {#- 8a: System / Developer — already rendered above, skip -#}
   {%- if message.role == 'system' or message.role == 'developer' -%}
+  {#- 8b: User messages -#}
   {%- elif message.role == 'user' -%}
+    {%- set _uc = render_content(message.content | default(''), true, false) -%}
     {{- '<|im_start|>user\n' + _uc + '<|im_end|>\n' -}}
+  {#- 8c: Assistant messages -#}
   {%- elif message.role == 'assistant' -%}
+    {#- Safely extract content as string — guard against absent key.
         Also support message.reasoning_content as an explicit think-block source
         (used by some frameworks that store thinking separately from content). -#}
     {%- if message.content is defined and message.content is string -%}
       {%- set _ac = message.content -%}
+    {#- FIX: also exclude strings - llama.cpp treats strings as non-iterable in for loops -#}
+    {%- elif message.content is defined and message.content is iterable and message.content is not mapping and message.content is not string -%}
+      {%- set _ac = render_content(message.content, false, false) -%}
     {%- else -%}
       {%- set _ac = '' -%}
     {%- endif -%}
     {%- endif -%}
     {#- Collect tool_calls if present -#}
+    {#- Type check: ensure tool_calls is a list, not string (llama.cpp compatibility) -#}
+    {%- set _tc = message.tool_calls if message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls is not string else [] -%}
     {#- Strip <tool_call> prefix from content when tool_calls also present
         (some frameworks duplicate the data in both fields) -#}
       {%- set _ac = _ac.split('<tool_call>')[0] | trim -%}
     {%- endif -%}
+    {#- FIXED in v0.7: Think-block handling with preserve_thinking support
+       New logic (from v18): preserve_thinking controls think-block display on ALL
+       assistant messages, not just generation prompt:
+       - Tool-call turns   : never strip (think block is part of the tool-call format)
+       - preserve_thinking : if true, show think blocks on ALL messages
+       - Last-history turn : if preserve_thinking false, apply last-turn handling
+       - Historical turns  : if preserve_thinking false, strip think blocks
+       The old behavior (strip unless add_generation_prompt) is now controlled
+       by preserve_thinking parameter.
+    -#}
+    {%- set _show_think = false -%}
+    {%- if _tc -%}
+      {#- Tool calls: always show think block -#}
+      {%- set _show_think = true -%}
+    {%- elif ns.preserve_thinking -%}
+      {#- preserve_thinking=true: show think blocks on all messages -#}
+      {%- set _show_think = true -%}
+    {%- elif loop.last -%}
+      {#- Last message without preserve_thinking: show if thinking enabled -#}
+      {%- set _show_think = ns.enable_thinking -%}
+    {%- endif -%}
+    {#- Apply think-block stripping based on _show_think flag -#}
+    {%- if not _show_think -%}
+      {#- Fuzzy end-tag detection for stripping -#}
+      {%- set _think_end = '' -%}
+      {%- if '</think>' in _ac -%}
+        {%- set _think_end = '</think>' -%}
+      {%- elif '</thinking>' in _ac -%}
+        {%- set _think_end = '</thinking>' -%}
+      {%- elif '</ think>' in _ac -%}
+        {%- set _think_end = '</ think>' -%}
+      {%- elif '</think >' in _ac -%}
+        {%- set _think_end = '</think >' -%}
+      {%- endif -%}
+      {%- if _think_end -%}
+        {%- set _ac = _ac.split(_think_end)[-1].lstrip('\n') -%}
       {%- endif -%}
+    {%- elif not _tc and loop.last and '<think>' not in _ac and not ns.enable_thinking -%}
+      {#- Last turn, non-thinking: inject empty think block if missing -#}
+      {%- set _ac = '<think>\n\n</think>\n\n' + _ac -%}
     {%- endif -%}
     {#- Emit the assistant turn -#}
       {%- if _tc -%}{{- '\n' -}}{%- endif -%}
     {%- endif -%}
+    {#- Render tool calls in Hermes format.
         Each value output via its own {{ }} block — never concatenated with plain strings
+        in Python, which would trigger Markup HTML-escaping. -#}
     {%- if _tc -%}
       {%- for tc in _tc -%}
         {{- '<tool_call>\n' -}}
     {%- endif -%}
     {{- '<|im_end|>\n' -}}
+  {#- 8d: Tool results — with error detection (NEW in v0.7) -#}
   {%- elif message.role == 'tool' -%}
     {%- set _prev_role = messages[loop.index0 - 1].role if loop.index0 > 0 else '' -%}
     {%- set _next_role = messages[loop.index0 + 1].role if not loop.last else '' -%}
+    {#- NEW in v0.7: Detect errors in tool response -#}
+    {%- set _tool_content = message.content | default('') -%}
+    {{- detect_tool_error(_tool_content) -}}
     {%- if _prev_role != 'tool' -%}
       {{- '<|im_start|>user\n' -}}
     {%- endif -%}
     {{- '<tool_response>\n' -}}
+    {{- _tool_content -}}
+    {#- NEW in v0.7: Inject warning if tool error detected -#}
+    {#- v0.8: Replaced emoji with text-only for tokenization safety -#}
+    {%- if ns.last_tool_failed -%}
+      {%- if ns.consecutive_failures >= 2 -%}
+        {{- '\n\n[SYSTEM WARNING: ' ~ ns.consecutive_failures ~ ' consecutive tool errors detected. Your previous approach is incorrect.]' -}}
+      {%- else -%}
+        {{- '\n\n[SYSTEM WARNING: The previous tool call returned an error. Diagnose the failure and retry with corrected arguments.]' -}}
+      {%- endif -%}
+    {%- endif -%}
     {%- if _next_role == 'tool' -%}
       {{- '\n</tool_response>\n' -}}
     {%- else -%}
       {{- '<|im_end|>\n' -}}
     {%- endif -%}
+  {#- 8e: Unknown role - explicit error (from v18) -#}
   {%- else -%}
     {{- raise_exception('Unexpected message role: ' + message.role) -}}
   {%- endif -%}
 {%- endfor -%}
+{#- ===== SECTION 9: GENERATION PROMPT =====
+     FIXED in v0.7: preserve_thinking now affects conversation history (Section 8),
+     so generation prompt logic is simplified.
      enable_thinking=True  → open <think>\n prefill so llama.cpp reasoning-budget
                              and other inference engines can hook into the think-stream.
                              The model continues generating inside the open block.
+     enable_thinking=False → exact non-thinking prefill: <think>\n\n</think>\n\n
      NOTE: The <think>\n opener is EPHEMERAL — it lives only in the generation
+     prompt, never in chat history. Historical think-block stripping is handled
+     in Section 8 based on preserve_thinking setting.
 -#}
 {%- if add_generation_prompt -%}
   {{- '<|im_start|>assistant\n' -}}
   {%- else -%}
     {{- '<think>\n\n</think>\n\n' -}}
   {%- endif -%}
+{%- endif -%}
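
The tool-error heuristic that this commit adds in Section 1B and applies in Section 8d can be easier to review outside of Jinja. The following is a minimal Python sketch of the same checks, not part of the commit. The function names `looks_like_tool_error` and `annotate_tool_responses` are hypothetical; the keyword list, the 500-character limit, and the warning strings are copied from the template. One simplification is assumed: a non-string tool response is treated as an empty string here, whereas the template only applies that guard inside the detection macro.

```python
# Keywords the template searches for (case-insensitively) in a tool response.
ERROR_MARKERS = (
    '"error":', 'error:', 'exception:', 'traceback',
    'command not found', 'invalid syntax', 'failed to', 'permission denied',
)


def looks_like_tool_error(content):
    """Mirror the detect_tool_error conditions: a short response with no
    shell prefix or timing info that contains at least one error keyword."""
    if not isinstance(content, str):  # type guard, as in the macro
        content = ''
    lowered = content.lower()
    return (
        len(content) < 500
        and '$ ' not in content
        and 'took ' not in lowered
        and any(marker in lowered for marker in ERROR_MARKERS)
    )


def annotate_tool_responses(responses):
    """Append the warning text from Section 8d to each failing response,
    escalating once two or more failures occur in a row."""
    consecutive_failures = 0
    annotated = []
    for content in responses:
        text = content if isinstance(content, str) else ''  # assumed simplification
        if looks_like_tool_error(content):
            consecutive_failures += 1
            if consecutive_failures >= 2:
                text += (
                    '\n\n[SYSTEM WARNING: ' + str(consecutive_failures)
                    + ' consecutive tool errors detected.'
                    + ' Your previous approach is incorrect.]'
                )
            else:
                text += (
                    '\n\n[SYSTEM WARNING: The previous tool call returned an error.'
                    ' Diagnose the failure and retry with corrected arguments.]'
                )
        else:
            consecutive_failures = 0  # any non-error response resets the streak
        annotated.append(text)
    return annotated
```

Because the checks are plain substring matches, a short and otherwise successful response that happens to contain a phrase such as "failed to" would also be flagged, which may be worth keeping in mind when reviewing the template change.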