Figure: Frontier efficiency. Average benchmark score (IFEval, GSM8K, HumanEval+, BFCL, MuSR, MMLU-Redux) versus model size in GB (log scale) for Bonsai 1.7B, Bonsai 4B, Bonsai 8B, Qwen3 0.6B, Qwen3 1.7B, Ministral3 3B, Qwen3 4B, and Qwen3 8B.