Bundled chat_template.jinja is chat-only — strips tools silently
Heads up: the chat_template.jinja shipped in this repo (and across the V4-Flash quant variants) only renders system/user/assistant messages. There's no branch for the tool role, no iteration over the tools array, and no <tool_call> markers. So when an OpenAI-compatible client passes tools=[...], apply_chat_template silently drops the array and the model never knows tools were available.
We picked this up while shipping day-0 V4 support in rapid-mlx (Apple Silicon MLX backend, PR #168). Plain chat works perfectly on both 2-bit DQ and 8-bit on a Mac Studio M3 Ultra (56/31 tok/s decode respectively, 7/8 stress scenarios pass), but our 30-scenario tool-calling eval scored 0/30: every scenario logs tool_detected: False. Same outcome with the Hermes and OpenClaude agent profiles.
Not a quant issue (identical 0/30 on 2-bit and 8-bit) and not a parser issue: the model never sees the tools list at all. Verified by inspecting the rendered prompt.
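For anyone who wants to reproduce the inspection: a minimal sketch of the failure mode, using a simplified stand-in template string (not the shipped one) that mirrors the chat-only structure described above. apply_chat_template just feeds `tools` to Jinja as a template variable, so a template that never references it drops it without any warning:

```python
from jinja2 import Template

# Simplified stand-in for the shipped chat_template.jinja: it only loops
# over messages; there is no reference to `tools` anywhere.
CHAT_ONLY_TEMPLATE = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>{{ m['content'] }}"
    "{% endfor %}"
)

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
tools = [{"type": "function", "function": {"name": "get_weather"}}]

# `tools` is passed in, but the template never touches it.
rendered = Template(CHAT_ONLY_TEMPLATE).render(messages=messages, tools=tools)

assert "get_weather" not in rendered  # the tool list silently vanished
print(rendered)
```

Swap in the real chat_template.jinja and you see the same thing: the rendered prompt contains the conversation but zero trace of the tools array.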
There's an active PR #16 upstream on deepseek-ai/DeepSeek-V4-Flash (by @Rocketknight1, HF staff) adding a tool-supporting template, along with a follow-up alternative proposed by @qgallouedec. Would it be possible to pull whichever variant lands into the V4-Flash quant repos so users get tool calling out of the box?
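For reference, this is roughly the shape a tool-aware template needs (a hedged sketch only, not the actual template from PR #16; the system-prompt wording and marker names here are illustrative): iterate the tools array into the prompt, and handle both tool-role messages and assistant tool_calls:

```python
from jinja2 import Template

# Illustrative tool-aware template: tools rendered up front, tool calls and
# tool responses wrapped in markers. Not the PR #16 template.
TOOL_AWARE_TEMPLATE = (
    "{% if tools %}<|system|>Available tools:\n"
    "{% for t in tools %}{{ t | tojson }}\n{% endfor %}"
    "{% endif %}"
    "{% for m in messages %}"
    "{% if m.role == 'tool' %}"
    "<tool_response>{{ m.content }}</tool_response>"
    "{% elif m.tool_calls %}"
    "{% for c in m.tool_calls %}"
    "<tool_call>{{ c.function | tojson }}</tool_call>"
    "{% endfor %}"
    "{% else %}"
    "<|{{ m.role }}|>{{ m.content }}"
    "{% endif %}"
    "{% endfor %}"
)

tools = [{"type": "function",
          "function": {"name": "get_weather",
                       "parameters": {"type": "object",
                                      "properties": {"city": {"type": "string"}}}}}]
messages = [
    {"role": "user", "content": "Weather in Paris?"},
    {"role": "assistant", "tool_calls": [
        {"function": {"name": "get_weather",
                      "arguments": "{\"city\": \"Paris\"}"}}]},
    {"role": "tool", "content": "18C, sunny"},
]

rendered = Template(TOOL_AWARE_TEMPLATE).render(messages=messages, tools=tools)
assert "get_weather" in rendered and "<tool_call>" in rendered
print(rendered)
```

With a template like this, the same client request that currently scores 0/30 actually surfaces the tool schemas to the model, which is the whole fix being asked for here.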
Happy to test + report numbers once an updated template lands.
Thanks for the great quant work — the model itself runs beautifully on Apple Silicon.
Update: I have tested the version @Rocketknight1 shared in that PR, and it passes the same tool-calling tests as our custom encoding code.