Improvement (50% token reduction via the tool)

by Elsephire - opened about 17 hours ago

Hello, i’m using a modified version of this template. This more compact tool is faster and provides a less noisy context for the LLM. I think it’s a powerful improvement over your original template: https://gist.github.com/webel/3c2cef9671119d71fc902d0c301db4eb

Elsephire changed discussion title from improvement (50% token reduction with tool) to Improvement (50% token reduction via the tool) about 17 hours ago

froggeric

Owner about 14 hours ago

Very cool! Thank you. I have incorporate in v10. Please try.

Elsephire

about 9 hours ago

I tested v10, and it's looping and overthinking. I don't understand why. I've published my custom template (a fusion of the Unsloth template and a compact tool) if you'd like to check it out.
https://huggingface.co/Elsephire/Qwen3.6-template-jinja/blob/main/qwen3.6-unsloth-and-compact-tools.jinja

astride-thee-squid

about 1 hour ago

Still in the process of validating, but a quick AI check gave me this:

v9 vs v10 Comparison: Key Differences

Tool Rendering (The "Compaction" Change)

Version	Tool Format	Token Usage (8 tools)
v9 (line 68-69)	`{{- tool	tojson }}` — full JSON schema dump
v10 (lines 68-107)	Compact one-liners: `remember(text: string, room?: general	prefs)` + optional schema dump only for array/object types

v10 "Compaction" Changes

Lines 68-107: Replaced {{- tool | tojson }} with manual property iteration, rendering typed one-liners
Lines 100-107: Schema dump only for array|object types (not always)
Line 93: Added has_tools flag to ns_flags namespace

Potential Issues in v10 Templates

Issue 1: Forced Thinking on Post-Query Tool Rounds (High Severity)

Location: chat_template-v10.jinja:157 (same as v9 line 157)

{%- if loop.index0 > ns.last_query_index %}
    {{- '🤖' + message.role + '\n</think>\n\n' + content }}
{%- else %}
    {{- '🤖' + message.role + '\n' + content }}
{%- endif %}

Problem: The two-pass algorithm (lines 93-106) finds the last user query index. Any assistant message after that index (i.e., intermediate tool-call responses in multi-step agentic flows) gets forced thinking injection. This means every tool-use round triggers unnecessary </think> blocks.

Effect: Overthinking during agentic loops, wasted tokens, increased latency per tool round.

Issue 2: Empty Thinking Block When Thinking Disabled (Medium Severity)

Location: chat_template-v10.jinja:214-215

{%- if ns_flags.enable_thinking is false %}
    {{- '</think>\n\n' }}
{%- else %}
    {{- '<think>\n' }}
{%- endif %}

Problem: When enable_thinking is explicitly false, the template outputs <think>\n\n</think>\n\n — an empty thinking block. This:

Wastes 2 special tokens per generation start
May conditionally encourage the model to always reason, even for non-reasoning tasks

v9 behavior: Identical (line 214-218). This is a pre-existing issue carried forward.

Issue 3: Schema Dump Still Present for Complex Types (Low Severity)

Location: chat_template-v10.jinja:100-107

{%- if fn.parameters is defined and fn.parameters is mapping %}
    {%- for pname in props %}
        {%- set pdef = props[pname] %}
        {%- if pdef.type is defined and (pdef.type == 'array' or pdef.type == 'object') %}
            {{- '\n  - ' ~ pname ~ ' schema: ' ~ pdef | tojson }}
        {%- endif %}
    {%- endfor %}
{%- endif %}

Problem: For tools with array or object type parameters, the full JSON schema is still dumped via tojson. This partially undermines the token savings goal. If all tool parameters are complex types, you get nearly the same token usage as v9.

Issue 4: `has_tools` Flag Not Used for Generation Prompt (Low Severity)

Location: chat_template-v10.jinja:216-217

The ns_flags.has_tools flag is set at line 65 but only used at line 216 to inject an additional reminder about function call format. This is redundant since the instruction block (lines 109-114) already contains the same reminder. It adds ~80 tokens of duplication.

Summary Table

Issue	Severity	v9 Has It?	v10 Has It?
Forced thinking on post-query assistants	High	Yes (line 157)	Yes (line 157) — unchanged from v9
Empty thinking block when disabled	Medium	Yes (line 214-218)	Yes (line 214-215) — unchanged from v9
Schema dump for complex types	N/A	No (always full JSON)	Yes (partial, only array/object) — new in v10
Redundant function call reminder	Low	No	Yes (line 216-217) — new in v10

Key Finding: The forced-thinking issue and empty-thinking-block issue are pre-existing from v9, not introduced by v10's compaction changes. The v10 "compaction" itself is functionally correct and does not introduce new logic bugs — it only changes how tools are rendered in the prompt.

If you want to fix the forced-thinking issue, you would need to remove the loop.index0 > ns.last_query_index conditional (lines 157-161) and always use standard assistant message rendering without forced </think> injection.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment