This is a Ministral-3-8B-Instruct-2512 fine-tune, produced through P-E-W's Heretic (v1.2.0) abliteration engine with Magnitude-Preserving Orthogonal Ablation enabled.

Note: Results from previous attempts: Click Here

Heretication Results

Score Metric	Value	Parameter	Value
Refusals	8/100	direction_index	per layer
KL Divergence	0.0509	attn.o_proj.max_weight	1.97
Initial Refusals	91/100	attn.o_proj.max_weight_position	17.48
		attn.o_proj.min_weight	1.90
		attn.o_proj.min_weight_distance	10.79
		mlp.down_proj.max_weight	0.19
		mlp.down_proj.max_weight_position	8.56
		mlp.down_proj.min_weight	0.04
		mlp.down_proj.min_weight_distance	15.62

Appendix

 » [Trial 407] Refusals:  8/100, KL divergence: 0.0509
   [Trial 318] Refusals: 11/100, KL divergence: 0.0314
   [Trial 253] Refusals: 14/100, KL divergence: 0.0278
   [Trial 216] Refusals: 15/100, KL divergence: 0.0276
   [Trial 401] Refusals: 19/100, KL divergence: 0.0255
   [Trial 405] Refusals: 21/100, KL divergence: 0.0240
   [Trial 149] Refusals: 31/100, KL divergence: 0.0232
   [Trial 249] Refusals: 33/100, KL divergence: 0.0221
   [Trial 244] Refusals: 38/100, KL divergence: 0.0214
   [Trial 230] Refusals: 44/100, KL divergence: 0.0207
   [Trial 153] Refusals: 46/100, KL divergence: 0.0198
   [Trial 347] Refusals: 52/100, KL divergence: 0.0175
   [Trial 154] Refusals: 62/100, KL divergence: 0.0160
   [Trial 138] Refusals: 64/100, KL divergence: 0.0154
   [Trial 392] Refusals: 65/100, KL divergence: 0.0134
   [Trial 480] Refusals: 66/100, KL divergence: 0.0120
   [Trial  29] Refusals: 73/100, KL divergence: 0.0113
   [Trial 240] Refusals: 74/100, KL divergence: 0.0109
   [Trial 612] Refusals: 75/100, KL divergence: 0.0102
   [Trial 255] Refusals: 77/100, KL divergence: 0.0073
   [Trial 378] Refusals: 79/100, KL divergence: 0.0059
   [Trial 605] Refusals: 81/100, KL divergence: 0.0046
   [Trial   1] Refusals: 82/100, KL divergence: 0.0042
   [Trial 443] Refusals: 83/100, KL divergence: 0.0040
   [Trial 486] Refusals: 84/100, KL divergence: 0.0038
   [Trial 450] Refusals: 85/100, KL divergence: 0.0026
   [Trial 343] Refusals: 86/100, KL divergence: 0.0022
   [Trial  14] Refusals: 87/100, KL divergence: 0.0009
   [Trial 336] Refusals: 88/100, KL divergence: 0.0008
   [Trial 274] Refusals: 89/100, KL divergence: 0.0005
   [Trial 418] Refusals: 90/100, KL divergence: 0.0004
   [Trial 688] Refusals: 91/100, KL divergence: 0.0000

Ministral 3 8B Instruct 2512 BF16

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.

The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized.

We provide a no-loss FP8 version here, you can find other formats and quantizations in the Ministral 3 - Additional Checkpoints collection.

Learn more in our blog post and paper.

Key Features

Ministral 3 8B consists of two main architectural components:

8.4B Language Model
0.4B Vision Encoder

The Ministral 3 8B Instruct model offers the following capabilities:

Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
System Prompt: Maintains strong adherence and support for system prompts.
Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
Large Context Window: Supports a 256k context window.

Use Cases

Perfect for balanced performance in local or embedded systems, combining versatility with efficiency.

Chat interfaces in constrained environments
Local daily-driver AI assistant
Image/document description and understanding
Translation and content generation
Specialized agentic use cases
Fine-tuning and specialization
And more...

Bringing advanced AI capabilities to resource-constrained environments.

Ministral 3 Family

Model Name	Type	Precision	Link
Ministral 3 3B Base 2512	Base pre-trained	BF16	Hugging Face
Ministral 3 3B Instruct 2512	Instruct post-trained	BF16	Hugging Face
Ministral 3 3B Reasoning 2512	Reasoning capable	BF16	Hugging Face
Ministral 3 8B Base 2512	Base pre-trained	BF16	Hugging Face
Ministral 3 8B Instruct 2512	Instruct post-trained	BF16	Hugging Face
Ministral 3 8B Reasoning 2512	Reasoning capable	BF16	Hugging Face
Ministral 3 14B Base 2512	Base pre-trained	BF16	Hugging Face
Ministral 3 14B Instruct 2512	Instruct post-trained	BF16	Hugging Face
Ministral 3 14B Reasoning 2512	Reasoning capable	BF16	Hugging Face

Other formats available here.

Benchmark Results

We compare Ministral 3 to similar sized models.

Reasoning

Model	AIME25	AIME24	GPQA Diamond	LiveCodeBench
Ministral 3 14B	0.850	0.898	0.712	0.646
Qwen3-14B (Thinking)	0.737	0.837	0.663	0.593

Ministral 3 8B	0.787	0.860	0.668	0.616
Qwen3-VL-8B-Thinking	0.798	0.860	0.671	0.580

Ministral 3 3B	0.721	0.775	0.534	0.548
Qwen3-VL-4B-Thinking	0.697	0.729	0.601	0.513

Instruct

Model	Arena Hard	WildBench	MATH Maj@1	MM MTBench
Ministral 3 14B	0.551	68.5	0.904	8.49
Qwen3 14B (Non-Thinking)	0.427	65.1	0.870	NOT MULTIMODAL
Gemma3-12B-Instruct	0.436	63.2	0.854	6.70

Ministral 3 8B	0.509	66.8	0.876	8.08
Qwen3-VL-8B-Instruct	0.528	66.3	0.946	8.00

Ministral 3 3B	0.305	56.8	0.830	7.83
Qwen3-VL-4B-Instruct	0.438	56.8	0.900	8.01
Qwen3-VL-2B-Instruct	0.163	42.2	0.786	6.36
Gemma3-4B-Instruct	0.318	49.1	0.759	5.23

Base

Model	Multilingual MMLU	MATH CoT 2-Shot	AGIEval 5-shot	MMLU Redux 5-shot	MMLU 5-shot	TriviaQA 5-shot
Ministral 3 14B	0.742	0.676	0.648	0.820	0.794	0.749
Qwen3 14B Base	0.754	0.620	0.661	0.837	0.804	0.703
Gemma 3 12B Base	0.690	0.487	0.587	0.766	0.745	0.788

Ministral 3 8B	0.706	0.626	0.591	0.793	0.761	0.681
Qwen 3 8B Base	0.700	0.576	0.596	0.794	0.760	0.639

Ministral 3 3B	0.652	0.601	0.511	0.735	0.707	0.592
Qwen 3 4B Base	0.677	0.405	0.570	0.759	0.713	0.530
Gemma 3 4B Base	0.516	0.294	0.430	0.626	0.589	0.640

License

This model is licensed under the Apache 2.0 License.

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.