This is a Ministral-3-8B-Instruct-2512 fine-tune, produced through P-E-W's Heretic (v1.2.0) abliteration engine with Magnitude-Preserving Orthogonal Ablation enabled.

Note: Results from previous attempts: Click Here


Heretication Results

Score Metric Value Parameter Value
Refusals 8/100 direction_index per layer
KL Divergence 0.0509 attn.o_proj.max_weight 1.97
Initial Refusals 91/100 attn.o_proj.max_weight_position 17.48
attn.o_proj.min_weight 1.90
attn.o_proj.min_weight_distance 10.79
mlp.down_proj.max_weight 0.19
mlp.down_proj.max_weight_position 8.56
mlp.down_proj.min_weight 0.04
mlp.down_proj.min_weight_distance 15.62

Appendix

PaCMAP projection
 » [Trial 407] Refusals:  8/100, KL divergence: 0.0509
   [Trial 318] Refusals: 11/100, KL divergence: 0.0314
   [Trial 253] Refusals: 14/100, KL divergence: 0.0278
   [Trial 216] Refusals: 15/100, KL divergence: 0.0276
   [Trial 401] Refusals: 19/100, KL divergence: 0.0255
   [Trial 405] Refusals: 21/100, KL divergence: 0.0240
   [Trial 149] Refusals: 31/100, KL divergence: 0.0232
   [Trial 249] Refusals: 33/100, KL divergence: 0.0221
   [Trial 244] Refusals: 38/100, KL divergence: 0.0214
   [Trial 230] Refusals: 44/100, KL divergence: 0.0207
   [Trial 153] Refusals: 46/100, KL divergence: 0.0198
   [Trial 347] Refusals: 52/100, KL divergence: 0.0175
   [Trial 154] Refusals: 62/100, KL divergence: 0.0160
   [Trial 138] Refusals: 64/100, KL divergence: 0.0154
   [Trial 392] Refusals: 65/100, KL divergence: 0.0134
   [Trial 480] Refusals: 66/100, KL divergence: 0.0120
   [Trial  29] Refusals: 73/100, KL divergence: 0.0113
   [Trial 240] Refusals: 74/100, KL divergence: 0.0109
   [Trial 612] Refusals: 75/100, KL divergence: 0.0102
   [Trial 255] Refusals: 77/100, KL divergence: 0.0073
   [Trial 378] Refusals: 79/100, KL divergence: 0.0059
   [Trial 605] Refusals: 81/100, KL divergence: 0.0046
   [Trial   1] Refusals: 82/100, KL divergence: 0.0042
   [Trial 443] Refusals: 83/100, KL divergence: 0.0040
   [Trial 486] Refusals: 84/100, KL divergence: 0.0038
   [Trial 450] Refusals: 85/100, KL divergence: 0.0026
   [Trial 343] Refusals: 86/100, KL divergence: 0.0022
   [Trial  14] Refusals: 87/100, KL divergence: 0.0009
   [Trial 336] Refusals: 88/100, KL divergence: 0.0008
   [Trial 274] Refusals: 89/100, KL divergence: 0.0005
   [Trial 418] Refusals: 90/100, KL divergence: 0.0004
   [Trial 688] Refusals: 91/100, KL divergence: 0.0000

Ministral 3 8B Instruct 2512 BF16

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.

The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized.

We provide a no-loss FP8 version here, you can find other formats and quantizations in the Ministral 3 - Additional Checkpoints collection.

Learn more in our blog post and paper.

Key Features

Ministral 3 8B consists of two main architectural components:

  • 8.4B Language Model
  • 0.4B Vision Encoder

The Ministral 3 8B Instruct model offers the following capabilities:

  • Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
  • Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
  • System Prompt: Maintains strong adherence and support for system prompts.
  • Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
  • Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
  • Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
  • Large Context Window: Supports a 256k context window.

Use Cases

Perfect for balanced performance in local or embedded systems, combining versatility with efficiency.

  • Chat interfaces in constrained environments
  • Local daily-driver AI assistant
  • Image/document description and understanding
  • Translation and content generation
  • Specialized agentic use cases
  • Fine-tuning and specialization
  • And more...

Bringing advanced AI capabilities to resource-constrained environments.

Ministral 3 Family

Model Name Type Precision Link
Ministral 3 3B Base 2512 Base pre-trained BF16 Hugging Face
Ministral 3 3B Instruct 2512 Instruct post-trained BF16 Hugging Face
Ministral 3 3B Reasoning 2512 Reasoning capable BF16 Hugging Face
Ministral 3 8B Base 2512 Base pre-trained BF16 Hugging Face
Ministral 3 8B Instruct 2512 Instruct post-trained BF16 Hugging Face
Ministral 3 8B Reasoning 2512 Reasoning capable BF16 Hugging Face
Ministral 3 14B Base 2512 Base pre-trained BF16 Hugging Face
Ministral 3 14B Instruct 2512 Instruct post-trained BF16 Hugging Face
Ministral 3 14B Reasoning 2512 Reasoning capable BF16 Hugging Face

Other formats available here.

Benchmark Results

We compare Ministral 3 to similar sized models.

Reasoning

Model AIME25 AIME24 GPQA Diamond LiveCodeBench
Ministral 3 14B 0.850 0.898 0.712 0.646
Qwen3-14B (Thinking) 0.737 0.837 0.663 0.593
Ministral 3 8B 0.787 0.860 0.668 0.616
Qwen3-VL-8B-Thinking 0.798 0.860 0.671 0.580
Ministral 3 3B 0.721 0.775 0.534 0.548
Qwen3-VL-4B-Thinking 0.697 0.729 0.601 0.513

Instruct

Model Arena Hard WildBench MATH Maj@1 MM MTBench
Ministral 3 14B 0.551 68.5 0.904 8.49
Qwen3 14B (Non-Thinking) 0.427 65.1 0.870 NOT MULTIMODAL
Gemma3-12B-Instruct 0.436 63.2 0.854 6.70
Ministral 3 8B 0.509 66.8 0.876 8.08
Qwen3-VL-8B-Instruct 0.528 66.3 0.946 8.00
Ministral 3 3B 0.305 56.8 0.830 7.83
Qwen3-VL-4B-Instruct 0.438 56.8 0.900 8.01
Qwen3-VL-2B-Instruct 0.163 42.2 0.786 6.36
Gemma3-4B-Instruct 0.318 49.1 0.759 5.23

Base

Model Multilingual MMLU MATH CoT 2-Shot AGIEval 5-shot MMLU Redux 5-shot MMLU 5-shot TriviaQA 5-shot
Ministral 3 14B 0.742 0.676 0.648 0.820 0.794 0.749
Qwen3 14B Base 0.754 0.620 0.661 0.837 0.804 0.703
Gemma 3 12B Base 0.690 0.487 0.587 0.766 0.745 0.788
Ministral 3 8B 0.706 0.626 0.591 0.793 0.761 0.681
Qwen 3 8B Base 0.700 0.576 0.596 0.794 0.760 0.639
Ministral 3 3B 0.652 0.601 0.511 0.735 0.707 0.592
Qwen 3 4B Base 0.677 0.405 0.570 0.759 0.713 0.530
Gemma 3 4B Base 0.516 0.294 0.430 0.626 0.589 0.640

License

This model is licensed under the Apache 2.0 License.

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.

Downloads last month
21
Safetensors
Model size
9B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for MuXodious/Ministral-3-8B-Instruct-2512-PaperWitch-heresy

Finetuned
(4)
this model
Quantizations
2 models

Collections including MuXodious/Ministral-3-8B-Instruct-2512-PaperWitch-heresy

Paper for MuXodious/Ministral-3-8B-Instruct-2512-PaperWitch-heresy