---
base_model:
- ibm-granite/granite-4.1-3b
license: apache-2.0
library_name: transformers
tags:
- language
- unsloth
- granite-4.1
---
> [!NOTE]
> Includes Unsloth **chat template fixes**!
> For `llama.cpp`, use `--jinja`.

Unsloth Dynamic 2.0 achieves superior accuracy and outperforms other leading quantization methods.
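Since the card lists `transformers` as the library, here is a minimal usage sketch. The model ID comes from the `base_model` field above; the dtype, device, and generation settings are illustrative assumptions rather than recommended values.

```python
# Minimal sketch: load the base checkpoint named in this card's metadata and
# run one chat turn through its chat template.
# Dtype/device settings are assumptions; adjust for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.1-3b"  # from the `base_model` field above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what a KV cache does in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For GGUF quants served through `llama.cpp`, pass `--jinja` so the bundled chat template is applied, as noted above.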
| Benchmarks | Metric | 3B Dense | 8B Dense | 30B Dense |
|---|---|---|---|---|
| **General Tasks** | | | | |
| MMLU | 5-shot | 67.02 | 73.84 | 80.16 |
| MMLU-Pro | 5-shot, CoT | 49.83 | 55.99 | 64.09 |
| BBH | 3-shot, CoT | 75.83 | 80.51 | 83.74 |
| AGI EVAL | 0-shot, CoT | 65.16 | 72.43 | 77.80 |
| GPQA | 0-shot, CoT | 31.70 | 41.96 | 45.76 |
| SimpleQA | | 3.68 | 4.82 | 6.81 |
| **Alignment Tasks** | | | | |
| AlpacaEval 2.0 | | 38.57 | 50.08 | 56.16 |
| IFEval Avg | | 82.30 | 87.06 | 89.65 |
| ArenaHard | | 37.80 | 68.98 | 71.02 |
| MTBench Avg | | 7.57 | 8.61 | 8.61 |
| **Math Tasks** | | | | |
| GSM8K | 8-shot | 86.88 | 92.49 | 94.16 |
| GSM Symbolic | 8-shot | 81.32 | 83.70 | 75.70 |
| Minerva Math | 0-shot, CoT | 67.94 | 80.10 | 81.32 |
| DeepMind Math | 0-shot, CoT | 64.64 | 80.07 | 81.93 |
| **Code Tasks** | | | | |
| HumanEval | pass@1 | 81.71 | 85.37 | 88.41 |
| HumanEval+ | pass@1 | 76.83 | 79.88 | 85.37 |
| MBPP | pass@1 | 71.16 | 87.30 | 85.45 |
| MBPP+ | pass@1 | 62.17 | 73.81 | 73.54 |
| CRUXEval-O | pass@1 | 40.75 | 47.63 | 55.75 |
| BigCodeBench | pass@1 | 32.19 | 35.00 | 38.77 |
| MULTIPLE | pass@1 | 52.54 | 60.26 | 62.31 |
| Eval+ Avg | pass@1 | 67.05 | 80.21 | 82.66 |
| **Tool Calling Tasks** | | | | |
| BFCL v3 | | 60.80 | 68.27 | 73.68 |
| **Multilingual Tasks** | | | | |
| MMMLU | 5-shot | 57.61 | 64.84 | 73.71 |
| INCLUDE | 5-shot | 52.05 | 58.89 | 67.26 |
| MGSM | 8-shot | 70.00 | 82.32 | 71.12 |
| **Safety** | | | | |
| SALAD-Bench | | 93.95 | 95.80 | 96.41 |
| AttaQ | | 81.88 | 81.19 | 85.76 |
| Tulu3 Safety Eval Avg | | 66.84 | 75.57 | 78.19 |
Languages covered by the multilingual benchmarks:

| Benchmark | # Languages | Languages |
|---|---|---|
| MMMLU | 11 | ar, de, en, es, fr, ja, ko, pt, zh, bn, hi |
| INCLUDE | 14 | hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh |
| MGSM | 5 | en, es, fr, ja, zh |
Model architecture:

| Model | 3B Dense | 8B Dense | 30B Dense |
|---|---|---|---|
| Embedding size | 2560 | 4096 | 4096 |
| Number of layers | 40 | 40 | 64 |
| Attention head size | 64 | 128 | 128 |
| Number of attention heads | 40 | 32 | 32 |
| Number of KV heads | 8 | 8 | 8 |
| MLP / shared-expert hidden size | 8192 | 12800 | 32768 |
| MLP activation | SwiGLU | SwiGLU | SwiGLU |
| Sequence length | 131072 | 131072 | 131072 |
| Position embedding | RoPE | RoPE | RoPE |
| # Parameters | 3B | 8B | 30B |
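These figures can be cross-checked against the checkpoint's own config. A minimal sketch follows; the attribute names are the common Transformers conventions and are assumptions for this particular model class, so any field the config does not expose falls back to "n/a".

```python
# Minimal sketch: inspect the checkpoint config and print the fields that
# correspond to the architecture table above. Attribute names are assumed
# to follow standard Transformers naming; missing ones print "n/a".
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-4.1-3b")

for field in (
    "hidden_size",               # embedding size
    "num_hidden_layers",         # number of layers
    "num_attention_heads",
    "num_key_value_heads",       # KV heads (GQA)
    "intermediate_size",         # MLP hidden size
    "max_position_embeddings",   # sequence length
):
    print(field, getattr(config, field, "n/a"))
```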