Discussion about vLLM tool call parser
First of all, thank you for sharing this high-quality model! It runs extremely smoothly on my hardware.
However, I've noticed that, unlike the original models, which work fine with the qwen3_coder parser in vLLM, this model behaves abnormally under the same parser. When used with OpenCode, it frequently fails to recognize tool calls correctly, causing the session to terminate abruptly.
I've experimented with several other parsers with the following results:
- `qwen3_xml`: Fails to parse, causing OpenCode to throw an error directly.
- `hermes`: The most functional option so far. It successfully recognizes tool calls, but it frequently runs into JSON escaping issues (e.g., unescaped control characters) when executing file-writing tools.
- `llama3_json`: Fails to recognize tool calls properly.
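The escaping failures with `hermes` are easy to reproduce in isolation. A minimal sketch of the failure mode (the payload below is a hypothetical example of tool-call arguments, not actual model output): a raw, unescaped newline inside a JSON string is invalid JSON and makes a strict parse fail, which is presumably what happens when the model emits file contents without escaping them.

```python
import json

# Hypothetical tool-call arguments containing a raw (unescaped) newline,
# as a model might emit when writing multi-line file contents:
payload = '{"content": "line one\nline two"}'

try:
    json.loads(payload)  # strict parsing rejects control characters in strings
except json.JSONDecodeError as e:
    print("strict parse failed:", e.msg)

# json.loads(strict=False) tolerates control characters inside strings,
# which is one way a client could paper over this class of failure:
args = json.loads(payload, strict=False)
print(args["content"])
```

Whether a lenient parse like this is safe for your use case is a separate question; it only addresses unescaped control characters, not other malformed output.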
I'd like to ask the community: given the changes from the distillation process, how is everyone currently configuring this model to successfully integrate with various agent systems?
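For reference, this is roughly how I'm serving the model with the `hermes` parser (the model path is a placeholder; `--enable-auto-tool-choice` and `--tool-call-parser` are the standard vLLM flags for tool calling):

```shell
# Sketch of the vLLM launch command; adjust the model path to your setup
vllm serve /path/to/your/model \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```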
Some environment info:
vLLM: 0.17.2rc1.dev201+g0d50fa1db
OpenCode: 1.3.0, with no further configuration beyond the basic sampling params below:
```json
"build": {
  "temperature": 0.7,
  "topP": 0.8,
  "topK": 20,
  "presencePenalty": 1.5,
  "frequencyPenalty": 0
}
```
I use hermes too, but it's definitely not perfect in my OpenCode setup either.