Think traces broken
It's not a chat template issue. GLM 4.7, like DeepSeek, does not output the initial <think> token.
You need to use https://github.com/theroyallab/tabbyAPI/pull/295 for proper think traces. I put my Docker script in the model card.
And if you need a normal rewrap, I built a tool for that because it was annoying me to hell: https://github.com/mratsim/llm-reasoning-proxy
Even in SGLang you have weirdly named reasoning parsers like minimax-append-think: https://github.com/sgl-project/sglang/issues/15508
But GLM 4.6 doesn't have this problem?
No, GLM 4.6 always does reasoning_content
If anything, it's probably intended to save on tokens, on the assumption that the LLM serving framework will have a parser to handle it.
This happens at the very least for:
- GLM-4.7
- MiniMax M2 and M2.1 https://github.com/sgl-project/sglang/issues/15508
- QwQ: https://huggingface.co/Qwen/QwQ-32B/discussions/4
- DeepSeek R1:
Though it seems like later on DeepSeek backtracked: https://huggingface.co/deepseek-ai/DeepSeek-R1/commit/8a58a132790c9935686eb97f042afa8013451c9f
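For the models listed above, whatever sits between the model and the client has to assume the output may start inside the thinking block. Here is a minimal Python sketch of that kind of parser. It is not the code from tabbyAPI, SGLang, or llm-reasoning-proxy; the function name `split_reasoning` and the choice to treat output without a closing tag as a plain answer are my own assumptions, and it only covers complete (non-streamed) responses.

```python
def split_reasoning(
    text: str,
    open_tag: str = "<think>",
    close_tag: str = "</think>",
) -> tuple[str, str]:
    """Return (reasoning, answer), tolerating a missing opening tag.

    Hypothetical helper for illustration only.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag at all: assume the model did not reason
        # and hand everything back as the answer.
        return "", text.strip()
    # Drop the opening tag if the model did emit it; if it did not,
    # the whole head is the reasoning.
    reasoning = head.split(open_tag)[-1].strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    # Opening tag missing, as described for GLM-4.7 in this thread.
    print(split_reasoning("Let me check.</think>The answer is 4."))
    # Opening tag present.
    print(split_reasoning("<think>\nLet me check.\n</think>\n\nThe answer is 4."))
```

The first element can then be returned as reasoning_content and the second as content.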
Now in GLM-4.7 I think using just </think> is intentional, judging from this code:
https://huggingface.co/zai-org/GLM-4.7/blob/main/chat_template.jinja
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
{%- set reasoning_content = m.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
{{ '<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '</think>' }}
{%- endif -%}
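If clear_thinking is defined and set to false (or the assistant message comes after the last user message), and reasoning_content is non-empty, you get proper <think>...</think> wrapping; otherwise it just emits </think>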
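To make the template branch above easier to poke at, here is a rough Python transcription of the assistant-message part. It is a sketch, not the template itself: `render_think_block` is a made-up name, `clear_thinking=None` stands in for "not defined" in Jinja, `after_last_user` stands in for `loop.index0 > ns.last_user_index`, and `visible_text` is skipped by assuming content is already a plain string.

```python
from typing import Optional


def render_think_block(
    content: str,
    reasoning_content: Optional[str] = None,
    clear_thinking: Optional[bool] = None,  # None approximates "undefined"
    after_last_user: bool = False,
) -> str:
    """Approximate what the template emits right after <|assistant|>."""
    reasoning = ""
    if isinstance(reasoning_content, str):
        reasoning = reasoning_content
    elif "</think>" in content:
        # Same split logic as the template: text before </think> is
        # reasoning, with an optional leading <think> dropped.
        reasoning = (
            content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        )
        content = content.split("</think>")[-1].lstrip("\n")

    keep = (clear_thinking is not None and not clear_thinking) or after_last_user
    if keep and reasoning:
        return "<think>" + reasoning.strip() + "</think>"
    # Reasoning is dropped and only the closing tag is emitted.
    return "</think>"


if __name__ == "__main__":
    print(render_think_block("hi", "some reasoning", clear_thinking=False))
    print(render_think_block("hi", "some reasoning"))
```

With clear_thinking left undefined or set to true, earlier assistant turns collapse to a bare </think>, which matches the behavior discussed in this thread.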
