Qwen3-30B-A3B-Instruct-2507-REAP

This model is a compressed version of Qwen/Qwen3-30B-A3B-Instruct-2507, obtained by reducing the number of experts in each MoE layer from 128 to 96 with the REAP baseline method described in https://bknyaz.github.io/blog/2026/moe/. The compressed model has 23B parameters (44 GB) versus the original's 31B (57 GB), cutting storage and GPU memory requirements by roughly 25%, while retaining >=90% of the original model's performance across a variety of benchmarks (see the Results section below). Additional efficiency optimizations (e.g., quantization) can be applied just as with the original model.
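The expert-reduction step can be illustrated with a toy sketch (this is a hypothetical scoring scheme for illustration, not the actual REAP implementation): score each layer's experts by how often the router selects them, weighted by the magnitude of their outputs, then keep the top 96 of 128.

```python
import numpy as np

def prune_experts(router_logits, expert_out_norms, keep=96):
    """Toy expert-pruning sketch (hypothetical saliency, not REAP itself):
    score each expert by its mean routing probability times the mean norm
    of its output, then keep the top-`keep` experts.
    Shapes: router_logits (tokens, experts), expert_out_norms (experts,)."""
    probs = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)        # softmax over experts
    saliency = probs.mean(axis=0) * expert_out_norms  # one score per expert
    kept = np.sort(np.argsort(saliency)[-keep:])      # indices of survivors
    return kept

rng = np.random.default_rng(0)
kept = prune_experts(rng.normal(size=(1024, 128)),
                     rng.uniform(0.5, 1.5, size=128))
print(len(kept))  # 96 experts survive in this layer
```

In the real model, the rows of each MoE layer's gating matrix and the weights of the pruned experts are then physically removed, which is where the ~25% size reduction comes from.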

See additional details at Qwen3-30B-A3B-Instruct-2507-REAM.

Results

| Model | Winogrande | ARC-C | ARC-E | BoolQ | HellaSwag | MMLU | OpenBookQA | RTE | AVG |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct-2507 | 73.2 | 60.7 | 85.1 | 88.7 | 61.2 | 80.1 | 32.4 | 76.5 | 69.7 |
| Qwen3-30B-A3B-Instruct-2507-REAP | 71.7 | 49.3 | 77.4 | 88.1 | 56.5 | 69.3 | 29.6 | 78.3 | 65.0 |

| Model | IFEval | AIME25 | GSM8K | GPQA-D | HumanEval | LiveCodeBench | AVG |
|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct-2507 | 90.4 | 56.7 | 89.3 | 47.0 | 93.3 | 48.6 | 70.9 |
| Qwen3-30B-A3B-Instruct-2507-REAP | 88.0 | 56.7 | 87.9 | 37.9 | 81.7 | 33.0 | 64.2 |

License

Please refer to the license of the original model, Qwen/Qwen3-30B-A3B-Instruct-2507.

