v13 to v15 are completely broken in vLLM for qwen3_coder parser

#16
by gudsjwixvfasptjhtr - opened

There is a breaking change in formatting from v12 to v13 that completely breaks the qwen3_coder parser in vLLM.

qwen3_coder expects:
<tool_call><function=tool_name... (XML style)
but the v13 and up produce:
<tool_call>{"name": "tool_name",... (JSON/XML style mix)
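A quick way to check which of the two styles a given output uses is sketched below. This is only an illustration based on the two prefixes quoted above; the function name `tool_call_style` and the regular expressions are hypothetical and are not taken from vLLM's qwen3_coder parser.

```python
import re

# Hypothetical patterns derived only from the two prefixes quoted in this post.
XML_STYLE = re.compile(r"<tool_call>\s*<function=([^>\s]+)")
JSON_STYLE = re.compile(r"<tool_call>\s*\{")


def tool_call_style(text: str) -> str:
    """Classify tool-call output as 'xml', 'json', or 'none'."""
    if XML_STYLE.search(text):
        return "xml"   # <tool_call><function=tool_name... (what qwen3_coder expects)
    if JSON_STYLE.search(text):
        return "json"  # <tool_call>{"name": ... (what v13 and up produce)
    return "none"


print(tool_call_style("<tool_call><function=tool_name>"))
print(tool_call_style('<tool_call>{"name": "tool_name"}'))
```

Running it on a sample of generated text shows at a glance whether the XML style or the JSON mix is being produced.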

I found only the "hermes" tool parser kinda works for a while but ultimately fails.

Please can you revert to qwen3_coder style which is expected by the model and supported by all inference engines?
Also, having tests would help maintain template stability and track progress.
Thank you!

Thanks for the report. I introduced the JSON format change in v13 to fix MCP parser crashes. But since it breaks vLLM, I will try to revert to the XML-only format while keeping the other fixes working...

I have a test suite, but of course it is impossible to predict everything.

Please try chat_template-v16-test1.jinja

  • I reverted back to the native qwen3_coder format
  • I removed the failure reset from v12 on assistant message, to maintain compatibility with my newer error handling

I ran it through my test suite, with added tests based on what you reported, but I have not been able to test it manually, as my computer is busy with a long coding session.

Wow that was fast.

I gave it a run with file patching and some MCP usage.
Seems working fine!

Thank you!

Line -> {%- if content | length < 500 and ('error' in content_lower or 'fail' in content_lower or 'invalid' in content_lower or 'exceeds' in content_lower or 'retry' in content_lower or 'exception' in content_lower) %}
In versions v15 and v16-test1, is this a new feature or a problem? :)
It detects an "error", but the "error_message" in the tool output here is legitimate content, not a failure. As a result, the model keeps retrying the tool call too many times.
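The condition quoted above can be reproduced outside the template to show the false positive. This is a minimal Python sketch, assuming `content_lower` is simply the lowercased tool response (the quoted line does not show how it is defined); the name `looks_like_error` is hypothetical.

```python
# Keywords copied from the quoted template condition.
ERROR_KEYWORDS = ("error", "fail", "invalid", "exceeds", "retry", "exception")


def looks_like_error(content: str) -> bool:
    """Mirror the quoted check: short content containing any keyword."""
    content_lower = content.lower()  # assumption: content_lower = content | lower
    return len(content) < 500 and any(k in content_lower for k in ERROR_KEYWORDS)


# A successful grep result that merely contains the word "error".
grep_output = '661: "error_message": "",'
print(looks_like_error(grep_output))
```

Because the check is a plain substring match, any short tool output that mentions one of the keywords, such as a matched source line containing `error_message`, is treated as a failed call.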

Log:

$ grep -n "error_message.*""" E:/llama/internal/api/orchestrator.go (timeout 5s)

661: "error_message": "",

Took 0.1s

The previous tool call returned an error. I must diagnose the failure and retry with completely corrected arguments:

$ grep -n "error_message|error_message" E:/llama/internal/api/orchestrator.go (timeout 5s)

661: "error_message": "",

Took 0.1s

⚠️ 2 consecutive tool errors. My previous approach is wrong. I will now call the tool with a fundamentally different, corrected approach:

$ grep -n "error_message" E:/llama/internal/api/orchestrator.go (timeout 5s)

661: "error_message": "",

Took 0.1s

⚠️ 3 consecutive tool errors. My previous approach is wrong. I will now call the tool with a fundamentally different, corrected approach:

$ grep -n "error_message" E:/llama/internal/api/orchestrator.go (timeout 5s)

661: "error_message": "",

Took 0.1s

⚠️ 4 consecutive tool errors. My previous approach is wrong. I will now call the tool with a fundamentally different, corrected approach:

$ cat E:/llama/internal/api/orchestrator.go | grep -n "error_message" (timeout 5s)

661: "error_message": "",

Took 0.1s

⚠️ 5 consecutive tool errors. My previous approach is wrong. I will now call the tool with a fundamentally different, corrected approach:

$ find E:/llama/internal/api -name "*.go" -exec grep -l "error_message" {} ; (timeout 5s)

E:/llama/internal/api/orchestrator.go

Took 0.4s

Good, the error_message is only set in one place. Now let me build and test the final fix.

$ cd E:/llama && GOPROXY=https://goproxy.cn,direct go build -o bin/master-server.exe ./cmd/master-server 2>&1
echo "Build: $?" (timeout 15s)

Build: 0

Took 1.6s

Build successful. Now let me restart the master server and test that error_message is cleared when starting a deployment.

I have made more fixes based on user reports after v16-test1, and they are now in v16. I have tested it extensively, both manually and through an extended test suite, and so far it works very well.

froggeric changed discussion status to closed
