Bundled chat_template.jinja is chat-only — strips tools silently

#1
by Raullen - opened
MLX Community org

Heads up: the chat_template.jinja shipped in this repo (and across the V4-Flash quant variants) only renders system/user/assistant messages. There's no branch for the tool role, no iteration over the tools array, and no <tool_call> markers, so when an OpenAI-compatible client passes tools=[...], the array is silently dropped by apply_chat_template and the model never knows tools were available.
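The failure mode is easy to reproduce without loading the model. A minimal sketch (the template string below is a stand-in for the shipped one, not its actual contents): a chat-only template renders fine but silently ignores any `tools` variable passed to it, which is exactly what `apply_chat_template` does under the hood.

```python
from jinja2 import Template

# Hypothetical minimal chat-only template mimicking the shipped
# chat_template.jinja: it only loops over `messages`, so a `tools`
# variable passed at render time is silently ignored.
chat_only = Template(
    "{% for m in messages %}<|{{ m['role'] }}|>{{ m['content'] }}{% endfor %}"
)

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
tools = [{"type": "function", "function": {"name": "get_weather"}}]

rendered = chat_only.render(messages=messages, tools=tools)
print("get_weather" in rendered)  # False: the tool list never reaches the prompt
```

This matches what we see when inspecting the real rendered prompt: the conversation is there, the tools are not.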

We picked this up while shipping day-0 V4 support in rapid-mlx (Apple Silicon MLX backend, PR #168). Plain chat works perfectly on both 2-bit DQ and 8-bit on a Mac Studio M3 Ultra (56/31 tok/s decode respectively, 7/8 stress scenarios pass), but our 30-scenario tool-calling eval scored 0/30 — every scenario logs tool_detected: False. Same outcome with Hermes and OpenClaude agent profiles.

Not a quant issue (identical 0/30 on 2-bit and 8-bit) and not a parser issue — the model literally never sees the tools list. Verified by inspecting the rendered prompt.
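For contrast, here is an illustrative sketch (not the actual template from the upstream PR) of the three things a tool-aware template needs: a loop over `tools`, a branch for the `tool` role, and markers such as `<tool_call>` that a parser can detect.

```python
from jinja2 import Template

# Illustrative only -- the real upstream template differs. It shows the
# three missing pieces: tools iteration, a tool-role branch, and
# <tool_call> markers in the instructions.
tool_aware = Template(
    "{% if tools %}<|system|>Available tools:\n"
    "{% for t in tools %}{{ t['function']['name'] }}\n{% endfor %}"
    "Emit calls as <tool_call>...</tool_call>.\n{% endif %}"
    "{% for m in messages %}"
    "{% if m['role'] == 'tool' %}<|tool_response|>{{ m['content'] }}"
    "{% else %}<|{{ m['role'] }}|>{{ m['content'] }}{% endif %}"
    "{% endfor %}"
)

messages = [{"role": "user", "content": "Weather in Paris?"}]
tools = [{"type": "function", "function": {"name": "get_weather"}}]
rendered = tool_aware.render(messages=messages, tools=tools)
print("get_weather" in rendered)  # True: the model now sees the tool
```

With a template along these lines, the same client request that scored 0/30 would at least put the tool schema in front of the model.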

There's an active PR #16 upstream on deepseek-ai/DeepSeek-V4-Flash (by @Rocketknight1, HF staff) adding a tool-supporting template, with an alternative follow-up proposed by @qgallouedec. Would it be possible to pull whichever variant lands into the V4-Flash quant repos so users get tool calling out of the box?

Happy to test + report numbers once an updated template lands.

Thanks for the great quant work — the model itself runs beautifully on Apple Silicon.

MLX Community org
edited 7 days ago

I have tested the version shared by @Rocketknight1 in that PR, and it passes the same tool-calling tests as the custom encoding code.
