Discussion about vLLM tool call parser
First of all, thank you for sharing this high-quality model! It runs extremely smoothly on my hardware.
However, I've noticed that, unlike the original models, which work fine with the qwen3_coder parser in vLLM, this model behaves abnormally under the same parser. When used with OpenCode, it frequently fails to recognize tool calls correctly, causing the session to terminate abruptly.
I've experimented with several other parsers with the following results:
- `qwen3_xml`: Fails to parse, causing OpenCode to throw an error directly.
- `hermes`: The most functional option so far. It successfully recognizes tool calls, but it frequently runs into JSON escaping issues (e.g., unescaped control characters) when executing file-writing tools.
- `llama3_json`: Fails to recognize tool calls properly.
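The escaping failures with `hermes` are easy to reproduce in isolation. A minimal sketch of the failure mode (the payload below is a hypothetical example of tool-call arguments, not actual model output): a raw, unescaped newline inside a JSON string is invalid JSON and makes a strict parse fail, which is presumably what happens when the model emits file contents without escaping them.

```python
import json

# Hypothetical tool-call arguments containing a raw (unescaped) newline,
# as a model might emit when writing multi-line file contents:
payload = '{"content": "line one\nline two"}'

try:
    json.loads(payload)  # strict parsing rejects control characters in strings
except json.JSONDecodeError as e:
    print("strict parse failed:", e.msg)

# json.loads(strict=False) tolerates control characters inside strings,
# which is one way a client could paper over this class of failure:
args = json.loads(payload, strict=False)
print(args["content"])
```

Whether a lenient parse like this is safe for your use case is a separate question; it only addresses unescaped control characters, not other malformed output.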
I'd like to ask the community: given the changes from the distillation process, how is everyone currently configuring this model to successfully integrate with various agent systems?
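For reference, this is roughly how I'm serving the model with the `hermes` parser (the model path is a placeholder; `--enable-auto-tool-choice` and `--tool-call-parser` are the standard vLLM flags for tool calling):

```shell
# Sketch of the vLLM launch command; adjust the model path to your setup
vllm serve /path/to/your/model \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```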
Some environment info:
vLLM: 0.17.2rc1.dev201+g0d50fa1db
OpenCode: 1.3.0, with no further configuration beyond the basic sampling params below:
```json
"build": {
  "temperature": 0.7,
  "topP": 0.8,
  "topK": 20,
  "presencePenalty": 1.5,
  "frequencyPenalty": 0
}
```
I use hermes too, but it's definitely not perfect in my OpenCode setup either.