Add tool calling template for HF format
Using this template, one can serve the model in vLLM in the HF format with tool calling enabled. For this to work, first save the Jinja template from here to its own file (for example, by loading this JSON in Python and dumping the content of the "chat_template" key to a new file), then serve the model with the command:
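The extraction step can be sketched in plain Python. The filenames below are placeholders, not the actual names from this repo; substitute whichever JSON file holds the "chat_template" key:

```python
import json

def extract_chat_template(src_path: str, dst_path: str) -> None:
    """Copy the "chat_template" value out of a JSON file into a standalone
    Jinja file suitable for vLLM's --chat-template flag."""
    with open(src_path) as f:
        template = json.load(f)["chat_template"]
    with open(dst_path, "w") as f:
        f.write(template)

# Example (filenames are assumptions; use the actual file from this repo):
# extract_chat_template("chat_template.json", "mistral_tool_template.jinja")
```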
vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --chat-template <path-to-jinja-template> --tool-call-parser mistral --enable-auto-tool-choice
When calling the server, set the sampling parameter skip_special_tokens to False (see https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#id5) so that vLLM's mistral tool parser can correctly parse the tool calls.
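A minimal sketch of what the request body looks like, assuming vLLM's OpenAI-compatible chat completions endpoint, which accepts skip_special_tokens as an extra parameter in the request body:

```python
def build_chat_request(messages, tools,
                       model="mistralai/Mistral-Small-3.1-24B-Instruct-2503"):
    """Build a chat-completions payload for vLLM's OpenAI-compatible server.
    skip_special_tokens=False keeps the [TOOL_CALLS] special token in the
    output so the mistral tool parser can find it."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "skip_special_tokens": False,  # vLLM extra parameter, not OpenAI-standard
    }

# Sending it (the URL is an assumption for a locally served model):
# import json, urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# resp = json.load(urllib.request.urlopen(req))
```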
I was only able to test this with the unsloth BnB-quantized version of the model, as my GPU is too small, but I presume it should work here as well.
I tried setting skip_special_tokens to False but got the following error from vLLM: skip_special_tokens=False is not supported for Mistral tokenizers.
If you use the mistral tokenizer, tool calling should work out of the box, as suggested in the example command in the model card:
vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
This chat template plus the suggested setting only applies when the model is loaded in the Hugging Face format with the default tokenizer. I also tried loading the mistral tokenizer with the Hugging Face model, but ran into some issues there (I don't recall precisely what, though).
Worked for me using Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic. Thank you!
Thanks for this! Do you know whether the instruct model has been tuned on this format, with the available tools in the last user message?
Also, you might consider updating this part, since you now also support the roles tool, tool_results, and tool_calls (the latter also within assistant messages):
{{- raise_exception(\"Only user and assistant roles are supported, with the exception of an initial optional system message, tool/tool_results and tool_calls!\") }}
It was interesting to see how tool calls are effectively excluded from the following check. [TOOL_RESULTS][/TOOL_RESULTS] is called out pretty clearly in the template, so that makes sense. I was going to suggest this tweak to the exception text:
{{- raise_exception(\"Excluding any tool_calls and tool_results, after the optional system message, conversation roles must alternate user/assistant/user/assistant/...\") }}
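The alternation rule being discussed can be mirrored in plain Python, which is handy for validating message lists before they ever hit the template. This is a sketch of the check as I read it, not the template's exact logic:

```python
def check_alternation(messages):
    """Raise if conversation roles don't alternate user/assistant once the
    optional leading system message and tool-related messages are ignored
    (a reading of the template's check, not its exact logic)."""
    core = [
        m for m in messages
        # The template effectively skips tool results and assistant
        # messages carrying tool_calls when checking alternation.
        if m["role"] in ("user", "assistant") and not m.get("tool_calls")
    ]
    for i, m in enumerate(core):
        expected = "user" if i % 2 == 0 else "assistant"
        if m["role"] != expected:
            raise ValueError(
                "Excluding any tool_calls and tool_results, after the "
                "optional system message, conversation roles must alternate "
                "user/assistant/user/assistant/..."
            )
```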
Good thread. The tool calling template for Mistral-Small-3.1-24B-Instruct-2503 in HF format is worth getting right because the chat template logic for tool use in this model family has some non-obvious behavior around the [TOOL_CALLS] and [TOOL_RESULTS] special tokens that doesn't map cleanly onto the standard HF tool_call/tool role convention.
The main thing to watch: Mistral's native format expects tool results to be injected back into the conversation under a specific tool role with the call_id reference, but when you're writing the Jinja2 template for tokenizer_config.json, you need to handle the case where tool_calls is a list on the assistant message and then separately render the tool response messages. The tricky part is that mistral-common handles some of this validation internally, and the HF tokenizer template needs to replicate that behavior faithfully, particularly around how parallel tool calls are serialized and whether the call IDs are preserved correctly across the round trip. A mismatch here will silently produce malformed prompts that the model will technically process, but with degraded instruction-following on the tool response.
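To make the parallel-call serialization concrete, here is a sketch of flattening an OpenAI-style tool_calls list into a single [TOOL_CALLS] span. The flattened {"name", "arguments", "id"} payload shape is my reading of the native format and should be double-checked against mistral-common's own encoder before relying on it:

```python
import json

def render_tool_calls(assistant_msg):
    """Serialize an assistant message's tool_calls into one [TOOL_CALLS]
    span. Input keys follow the OpenAI-style convention; the flattened
    output shape is an assumption to verify against mistral-common."""
    calls = []
    for call in assistant_msg["tool_calls"]:
        fn = call["function"]
        args = fn["arguments"]
        calls.append({
            "name": fn["name"],
            # Arguments may arrive as a JSON string (OpenAI style) or a dict.
            "arguments": json.loads(args) if isinstance(args, str) else args,
            "id": call["id"],  # must survive the round trip to [TOOL_RESULTS]
        })
    # Parallel calls share a single [TOOL_CALLS] token with a JSON list payload.
    return "[TOOL_CALLS] " + json.dumps(calls)
```

Testing this against a message with two or more parallel calls is exactly the scenario where template bugs tend to surface.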
One thing worth flagging for anyone building agentic pipelines on top of this: if you're running multi-agent workflows where one model instance is calling tools that are themselves backed by other agents, the identity of the tool responder matters for trust reasoning. We've been working on this problem at AgentGraph: when tool call results come back in a multi-hop chain, there's currently no standardized way in the HF chat template format to attest who or what produced that tool result. The template just sees a string. That's fine for single-agent setups but becomes a real issue at scale, especially given the recent movement toward autonomous agent-to-agent transactions. Worth keeping in mind as the template spec evolves.
Good timing on this discussion. The tool calling template for Mistral-Small-3.1-24B-Instruct-2503 in HF format is worth getting right because Mistral's function calling schema has some quirks compared to what the transformers chat_template ecosystem expects. Specifically, the model uses a [TOOL_CALLS] token and a particular JSON wrapping for parallel tool calls that doesn't map cleanly to the OpenAI-style tool_calls array that most HF pipelines assume. If you're writing the Jinja2 template, you'll want to handle the tool_results role carefully: Mistral expects tool responses under a tool role with a tool_call_id field, and the template needs to serialize that correctly or the model will hallucinate continuations rather than grounding on the result.
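For the tool-response side, here is a sketch of wrapping a tool-role message in [TOOL_RESULTS]...[/TOOL_RESULTS]. The {"content", "call_id"} payload keys are my reading of the native format, again worth verifying against mistral-common before use:

```python
import json

def render_tool_result(tool_msg):
    """Wrap a tool-role message in [TOOL_RESULTS]...[/TOOL_RESULTS].
    The payload shape here is an assumption about the native format,
    to be checked against mistral-common's encoder."""
    payload = {
        "content": tool_msg["content"],
        "call_id": tool_msg["tool_call_id"],  # must match the issuing call's id
    }
    return "[TOOL_RESULTS] " + json.dumps(payload) + "[/TOOL_RESULTS]"
```

If the call_id here doesn't match the id emitted in the corresponding [TOOL_CALLS] span, the model loses the link between call and result, which is one way the "hallucinated continuation" failure mode shows up.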
One thing worth flagging for anyone building multi-agent pipelines on top of this: once you have reliable tool call parsing working in the template, the next problem that surfaces is trust. Specifically, when this model is acting as an orchestrator invoking tools or sub-agents, there's no native mechanism in the chat template to carry identity or provenance metadata for those calls. We've been working on this at AgentGraph, where we attach cryptographically signed identity claims to agent-to-agent calls so downstream agents can verify who initiated a tool invocation. With the recent uptick in autonomous agent purchasing and API-monetization patterns (there are a few projects experimenting with per-request agent billing right now), having that identity layer baked into the message format rather than bolted on externally becomes pretty important.
For the immediate template PR: I'd suggest looking at how mistralai/Mistral-7B-Instruct-v0.3's tokenizer_config handles this as a baseline, since that template has gone through several community iterations. The main delta for 3.1 is the vision token handling and the updated system prompt position; the tool calling logic itself is largely compatible, but test against parallel tool call scenarios specifically, since that's where most template bugs hide.