Ministral 3 Reasoning Heretic
GGUF quantizations of coder3101/Ministral-3-8B-Reasoning-2512-heretic for use with llama.cpp and compatible tools.
This is a fine-tuned version of Mistral's Ministral-3-8B-Reasoning-2512 vision-language model. It supports:

- Reasoning (via `[THINK]` tokens)
- Vision/image understanding (via the mmproj file below)
- Function calling (via `[TOOL_CALLS]` and `[TOOL_RESULTS]` tokens)

Available quantizations:

| Quantization | Size | Description |
|---|---|---|
| BF16 | 16 GB | Full precision (bfloat16) |
| Q8_0 | 8.5 GB | 8-bit quantization |
| Q5_K_M | 5.7 GB | 5-bit K-quant (medium) |
| Q4_K_M | 4.9 GB | 4-bit K-quant (medium) - Recommended |
For vision/image understanding, you need to download the mmproj (multimodal projector) file:
Ministral-3-8B-Reasoning-2512-heretic-mmproj-bf16.gguf (827 MB)

The model includes a custom chat template with reasoning support. The format uses:

- `[SYSTEM_PROMPT]...[/SYSTEM_PROMPT]` - System message
- `[INST]...[/INST]` - User messages
- `[THINK]...[/THINK]` - Model's reasoning/thinking process
- `[IMG]` - Image placeholder for vision inputs
- `[TOOL_CALLS]` and `[TOOL_RESULTS]` - For function calling

Example conversation:
```text
[SYSTEM_PROMPT]You are a helpful assistant.[/SYSTEM_PROMPT][INST]What is 2+2?[/INST][THINK]The user is asking for a simple arithmetic calculation. 2+2=4.[/THINK]The answer is 4.
```
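To make the token layout concrete, here is a minimal sketch of how a conversation string is composed from the template tokens above. The `format_prompt` helper is illustrative only; in practice llama.cpp applies the model's embedded chat template automatically.

```python
def format_prompt(system, user, thinking=None, answer=None):
    """Compose a prompt string from the template tokens described above.

    Illustrative sketch only; llama.cpp normally applies the embedded
    chat template for you.
    """
    parts = [
        f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]",
        f"[INST]{user}[/INST]",
    ]
    if thinking is not None:
        # The model's reasoning is wrapped in [THINK]...[/THINK].
        parts.append(f"[THINK]{thinking}[/THINK]")
    if answer is not None:
        # The final answer follows the closing [/THINK] tag.
        parts.append(answer)
    return "".join(parts)

prompt = format_prompt(
    "You are a helpful assistant.",
    "What is 2+2?",
    thinking="The user is asking for a simple arithmetic calculation. 2+2=4.",
    answer="The answer is 4.",
)
print(prompt)
```

Running this reproduces the example conversation shown above.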
Text generation with llama-cli:

```shell
llama-cli -m Ministral-3-8B-Reasoning-2512-heretic-Q4_K_M.gguf \
  -p "[INST]What is the capital of France?[/INST]" \
  -n 256
```
Vision/image understanding with llama-mtmd-cli:

```shell
llama-mtmd-cli \
  -m Ministral-3-8B-Reasoning-2512-heretic-Q4_K_M.gguf \
  --mmproj Ministral-3-8B-Reasoning-2512-heretic-mmproj-bf16.gguf \
  -p "Describe this image in detail." \
  --image /path/to/image.jpg
```
Running an OpenAI-compatible server with llama-server:

```shell
llama-server \
  -m Ministral-3-8B-Reasoning-2512-heretic-Q4_K_M.gguf \
  --mmproj Ministral-3-8B-Reasoning-2512-heretic-mmproj-bf16.gguf \
  --port 8080
```
Then query the API:
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ministral", "messages": [{"role": "user", "content": "What is 2+2?"}]}'
```
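The same request can be made from Python with only the standard library. This is a sketch assuming llama-server is running on port 8080 as shown above; the `build_payload` and `chat` helper names are illustrative, not part of any API.

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "ministral") -> dict:
    # Request body in the shape expected by the OpenAI-compatible
    # /v1/chat/completions endpoint.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The assistant's reply is the first choice's message content.
    return body["choices"][0]["message"]["content"]

# Example (requires the server to be running):
# print(chat("What is 2+2?"))
```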
This GGUF is based on coder3101/Ministral-3-8B-Reasoning-2512-heretic and is released under the Apache 2.0 license.
Base model: mistralai/Ministral-3-8B-Base-2512