Frontier efficiency

[Figure: Average benchmark score (mean of IFEval, GSM8K, HumanEval+, BFCL, MuSR, MMLU-Redux; y-axis, ticks from 40 to 80) plotted against model size in GB (x-axis, log scale, ticks from 0.25 GB to 16 GB). Models shown: Bonsai 1.7B, Bonsai 4B, Bonsai 8B, Qwen3 0.6B, Qwen3 1.7B, Ministral3 3B, Qwen3 4B, Qwen3 8B.]