Qwen3-30B-A3B-Instruct-2507-REAP

This model is a compressed version of Qwen/Qwen3-30B-A3B-Instruct-2507, obtained by reducing the number of experts in each MoE layer from 128 to 96 with the REAP baseline method described in https://bknyaz.github.io/blog/2026/moe/. The compressed model has 23B parameters (44 GB) versus the original's 31B (57 GB), cutting storage and GPU memory requirements by roughly 25%, while retaining >=90% of the original model's performance across a variety of benchmarks (see the Results section below). Additional efficiency optimizations (e.g., quantization) can be applied just as with the original model.
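The expert-reduction step can be illustrated with a toy sketch (this is a hypothetical scoring scheme for illustration, not the actual REAP implementation): score each layer's experts by how often the router selects them, weighted by the magnitude of their outputs, then keep the top 96 of 128.

```python
import numpy as np

def prune_experts(router_logits, expert_out_norms, keep=96):
    """Toy expert-pruning sketch (hypothetical saliency, not REAP itself):
    score each expert by its mean routing probability times the mean norm
    of its output, then keep the top-`keep` experts.
    Shapes: router_logits (tokens, experts), expert_out_norms (experts,)."""
    probs = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)        # softmax over experts
    saliency = probs.mean(axis=0) * expert_out_norms  # one score per expert
    kept = np.sort(np.argsort(saliency)[-keep:])      # indices of survivors
    return kept

rng = np.random.default_rng(0)
kept = prune_experts(rng.normal(size=(1024, 128)),
                     rng.uniform(0.5, 1.5, size=128))
print(len(kept))  # 96 experts survive in this layer
```

In the real model, the rows of each MoE layer's gating matrix and the weights of the pruned experts are then physically removed, which is where the ~25% size reduction comes from.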

See additional details at Qwen3-30B-A3B-Instruct-2507-REAM.

Results

| Model | Winogrande | ARC-C | ARC-E | BoolQ | HellaSwag | MMLU | OpenBookQA | RTE | AVG |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct-2507 | 73.2 | 60.7 | 85.1 | 88.7 | 61.2 | 80.1 | 32.4 | 76.5 | 69.7 |
| Qwen3-30B-A3B-Instruct-2507-REAP | 71.7 | 49.3 | 77.4 | 88.1 | 56.5 | 69.3 | 29.6 | 78.3 | 65.0 |

| Model | IFEval | AIME25 | GSM8K | GPQA-D | HumanEval | LiveCodeBench | AVG |
|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct-2507 | 90.4 | 56.7 | 89.3 | 47.0 | 93.3 | 48.6 | 70.9 |
| Qwen3-30B-A3B-Instruct-2507-REAP | 88.0 | 56.7 | 87.9 | 37.9 | 81.7 | 33.0 | 64.2 |

License

Please refer to the license of the original model, Qwen/Qwen3-30B-A3B-Instruct-2507.

