Improvement (50% token reduction via compact tool rendering)
Hello, I'm using a modified version of this template. The more compact tool format is faster and gives the LLM a less noisy context. I think it's a powerful improvement over your original template: https://gist.github.com/webel/3c2cef9671119d71fc902d0c301db4eb
Very cool! Thank you. I have incorporated it in v10. Please try it.
I tested v10, and it's looping and overthinking. I don't understand why. I've published my custom template (a fusion of the Unsloth template and a compact tool) if you'd like to check it out.
https://huggingface.co/Elsephire/Qwen3.6-template-jinja/blob/main/qwen3.6-unsloth-and-compact-tools.jinja
Still in the process of validating, but a quick AI check gave me this:
v9 vs v10 Comparison: Key Differences
Tool Rendering (The "Compaction" Change)
| Version | Tool Format | Token Usage (8 tools) |
|---|---|---|
| v9 (lines 68-69) | `{{- tool \| tojson }}` — full JSON schema dump | baseline |
| v10 (lines 68-107) | Compact one-liners, e.g. `remember(text: string, room?: general \| prefs)`, plus an optional schema dump only for `array`/`object` types | ≈50% of v9 (per the reduction claimed above) |
v10 "Compaction" Changes
- Lines 68-107: Replaced `{{- tool | tojson }}` with manual property iteration, rendering typed one-liners
- Lines 100-107: Schema dump only for `array`/`object` types (not always)
- Line 93: Added a `has_tools` flag to the `ns_flags` namespace
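For intuition, the compaction logic can be re-implemented as a small Python sketch (the function name, tool-dict shape, and enum handling here are assumptions for illustration, not the template's actual code):

```python
# Hypothetical re-implementation of the v10 "compact one-liner" tool rendering.
def compact_signature(tool: dict) -> str:
    fn = tool["function"]
    params = fn.get("parameters", {})
    props = params.get("properties", {})
    required = set(params.get("required", []))
    parts = []
    for name, pdef in props.items():
        ptype = pdef.get("type", "any")
        # Enums render as "value1 | value2", mirroring "room?: general | prefs".
        if "enum" in pdef:
            ptype = " | ".join(map(str, pdef["enum"]))
        marker = "" if name in required else "?"  # "?" marks optional params
        parts.append(f"{name}{marker}: {ptype}")
    return f'{fn["name"]}({", ".join(parts)})'

tool = {
    "function": {
        "name": "remember",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "room": {"type": "string", "enum": ["general", "prefs"]},
            },
            "required": ["text"],
        },
    }
}
print(compact_signature(tool))  # → remember(text: string, room?: general | prefs)
```

One line per tool instead of a full JSON schema dump is where the bulk of the token savings comes from.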
Potential Issues in v10 Templates
Issue 1: Forced Thinking on Post-Query Tool Rounds (High Severity)
Location: chat_template-v10.jinja:157 (same as v9 line 157)
```jinja
{%- if loop.index0 > ns.last_query_index %}
{{- '🤖' + message.role + '\n</think>\n\n' + content }}
{%- else %}
{{- '🤖' + message.role + '\n' + content }}
{%- endif %}
```
Problem: The two-pass algorithm (lines 93-106) finds the index of the last user query. Any assistant message after that index (i.e., intermediate tool-call responses in multi-step agentic flows) gets a forced thinking injection. This means every tool-use round triggers an unnecessary `</think>` block.
Effect: Overthinking during agentic loops, wasted tokens, increased latency per tool round.
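The two-pass idea described above can be sketched in Python (the message shape and variable names are assumptions, not the template's actual code):

```python
# Pass 1: find the index of the last user query.
# Pass 2: every assistant message after it is treated as part of the
# "current round" and gets forced-thinking rendering.
messages = [
    {"role": "user", "content": "find my notes"},
    {"role": "assistant", "content": "", "tool_calls": [{"name": "search"}]},
    {"role": "tool", "content": "3 results"},
    {"role": "assistant", "content": "Here are your notes."},
]
last_query_index = max(
    (i for i, m in enumerate(messages) if m["role"] == "user"), default=-1
)
forced = [i for i, m in enumerate(messages)
          if m["role"] == "assistant" and i > last_query_index]
print(last_query_index, forced)  # → 0 [1, 3]
```

Both the intermediate tool-calling turn and the final answer fall after the last user query, so every tool round inherits the forced-thinking branch.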
Issue 2: Empty Thinking Block When Thinking Disabled (Medium Severity)
Location: chat_template-v10.jinja:214-215
```jinja
{%- if ns_flags.enable_thinking is false %}
{{- '</think>\n\n' }}
{%- else %}
{{- '<think>\n' }}
{%- endif %}
```
Problem: When `enable_thinking` is explicitly false, the template outputs `<think>\n\n</think>\n\n`, an empty thinking block. This:
- Wastes two special tokens at every generation start
- May still condition the model to open a reasoning block, even for non-reasoning tasks
v9 behavior: Identical (lines 214-218). This is a pre-existing issue carried forward.
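One possible mitigation, as a sketch only (it assumes nothing upstream has already opened a `<think>` block and that the model tolerates a bare assistant header), is to emit no think tokens at all when thinking is disabled:

```jinja
{%- if ns_flags.enable_thinking is false %}
{#- emit nothing: skip the empty <think></think> pair entirely -#}
{%- else %}
{{- '<think>\n' }}
{%- endif %}
```

Whether this is safe depends on how the model was trained; some reasoning models expect the closed empty pair as a "no thinking" signal, in which case the current behavior is intentional.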
Issue 3: Schema Dump Still Present for Complex Types (Low Severity)
Location: chat_template-v10.jinja:100-107
```jinja
{%- if fn.parameters is defined and fn.parameters is mapping %}
{%- for pname in props %}
{%- set pdef = props[pname] %}
{%- if pdef.type is defined and (pdef.type == 'array' or pdef.type == 'object') %}
{{- '\n - ' ~ pname ~ ' schema: ' ~ pdef | tojson }}
{%- endif %}
{%- endfor %}
{%- endif %}
```
Problem: For tools with `array` or `object` type parameters, the full JSON sub-schema is still dumped via `tojson`. This partially undermines the token-savings goal: if all of a tool's parameters are complex types, you get nearly the same token usage as v9.
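To see why the savings erode, consider what the fallback emits for a single array-typed parameter (a Python sketch mirroring the `tojson` line above; the `pname` and `pdef` values are made up for illustration):

```python
import json

# What the v10 fallback at lines 100-107 still puts in the prompt for a
# complex parameter: the entire sub-schema, verbatim.
pname = "filters"
pdef = {"type": "array", "items": {"type": "string"}, "description": "tag filters"}
line = "\n - " + pname + " schema: " + json.dumps(pdef)
print(line)
```

Each such line carries the full nested schema, so a tool whose parameters are all arrays or objects compacts almost nothing.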
Issue 4: Redundant has_tools Function-Call Reminder (Low Severity)
Location: chat_template-v10.jinja:216-217
The ns_flags.has_tools flag is set at line 65 but only used at line 216 to inject an additional reminder about function call format. This is redundant since the instruction block (lines 109-114) already contains the same reminder. It adds ~80 tokens of duplication.
Summary Table
| Issue | Severity | v9 Has It? | v10 Has It? |
|---|---|---|---|
| Forced thinking on post-query assistants | High | Yes (line 157) | Yes (line 157) — unchanged from v9 |
| Empty thinking block when disabled | Medium | Yes (lines 214-218) | Yes (lines 214-215) — unchanged from v9 |
| Schema dump for complex types | Low | No (always full JSON) | Yes (partial, only array/object) — new in v10 |
| Redundant function call reminder | Low | No | Yes (line 216-217) — new in v10 |
Key Finding: The forced-thinking issue and empty-thinking-block issue are pre-existing from v9, not introduced by v10's compaction changes. The v10 "compaction" itself is functionally correct and does not introduce new logic bugs — it only changes how tools are rendered in the prompt.
If you want to fix the forced-thinking issue, you would need to remove the `loop.index0 > ns.last_query_index` conditional (lines 157-161) and always use the standard assistant message rendering without the forced `</think>` injection.
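Concretely, that fix would collapse the conditional quoted under Issue 1 into its standard branch only (a sketch against the v10 excerpt, markers kept as-is; whether the model then opens its own `<think>` block per turn depends on its training):

```jinja
{#- render every assistant turn uniformly; no forced '</think>' injection -#}
{{- '🤖' + message.role + '\n' + content }}
```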