Ministral-3-3B-Reasoning-2512-heretic-GGUF

GGUF quantizations of coder3101/Ministral-3-3B-Reasoning-2512-heretic for use with llama.cpp and compatible tools.

Model Description

This is a fine-tuned version of Mistral's Ministral-3-3B-Reasoning-2512 vision-language model. It supports:

  • Text generation with reasoning capabilities (uses [THINK] tokens)
  • Vision/image understanding (requires the mmproj file)
  • Tool/Function calling

Available Quantizations

Quantization   Size     Description
BF16           6.4 GB   Full precision (bfloat16)
Q8_0           3.4 GB   8-bit quantization
Q5_K_M         2.3 GB   5-bit K-quant (medium)
Q4_K_M         2.0 GB   4-bit K-quant (medium) - Recommended

Vision Support

For vision/image understanding, you need to download the mmproj (multimodal projector) file:

  • Ministral-3-3B-Reasoning-2512-heretic-mmproj-bf16.gguf (811 MB)

Chat Template

The model includes a custom chat template with reasoning support. The format uses:

  • [SYSTEM_PROMPT]...[/SYSTEM_PROMPT] - System message
  • [INST]...[/INST] - User messages
  • [THINK]...[/THINK] - Model's reasoning/thinking process
  • [IMG] - Image placeholder for vision inputs
  • [TOOL_CALLS] and [TOOL_RESULTS] - For function calling

Example conversation:

[SYSTEM_PROMPT]You are a helpful assistant.[/SYSTEM_PROMPT][INST]What is 2+2?[/INST][THINK]The user is asking for a simple arithmetic calculation. 2+2=4.[/THINK]The answer is 4.
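The example above can be assembled programmatically. Here is a minimal sketch of a prompt builder for the format described (the function name and the turn structure are illustrative, not part of the model card); note that llama.cpp applies the model's embedded chat template automatically, so hand-building the string like this is only needed for raw-prompt (`-p`) usage:

```python
def format_prompt(system: str, turns: list) -> str:
    """Build a raw prompt in the [SYSTEM_PROMPT]/[INST] format.

    turns: list of (user_message, assistant_reply_or_None) pairs;
    the final pair should have None so the model generates the reply.
    """
    out = f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]"
    for user, assistant in turns:
        out += f"[INST]{user}[/INST]"
        if assistant is not None:
            # Completed assistant turns (including any [THINK] block)
            # are appended verbatim after the closing [/INST].
            out += assistant
    return out

prompt = format_prompt("You are a helpful assistant.",
                       [("What is 2+2?", None)])
print(prompt)
```

The `[THINK]...[/THINK]` reasoning block is produced by the model itself at generation time, so it is not something you include when prompting.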

Usage

Text-only (CLI)

llama-cli -m Ministral-3-3B-Reasoning-2512-heretic-Q4_K_M.gguf \
  -p "[INST]What is the capital of France?[/INST]" \
  -n 256

With Vision Support

llama-mtmd-cli \
  -m Ministral-3-3B-Reasoning-2512-heretic-Q4_K_M.gguf \
  --mmproj Ministral-3-3B-Reasoning-2512-heretic-mmproj-bf16.gguf \
  -p "Describe this image in detail." \
  --image /path/to/image.jpg

With llama-server (OpenAI-compatible API)

llama-server \
  -m Ministral-3-3B-Reasoning-2512-heretic-Q4_K_M.gguf \
  --mmproj Ministral-3-3B-Reasoning-2512-heretic-mmproj-bf16.gguf \
  --port 8080

Then query the API:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ministral", "messages": [{"role": "user", "content": "What is 2+2?"}]}'

Original Model

This GGUF is based on coder3101/Ministral-3-3B-Reasoning-2512-heretic.

License

Apache 2.0

Model Details

  • Format: GGUF
  • Model size: 3B params
  • Architecture: mistral3