Paper: REAP the Experts: Why Pruning Prevails for One-Shot MoE compression (arXiv 2510.13999)
GGUF quantized versions of 0xSero/gemma-4-21b-a4b-it-REAP.
This is a 20% expert-pruned version of Google's Gemma-4 26B-A4B-it, produced with Cerebras REAP (Router-weighted Expert Activation Pruning).
| Metric | Original (26B) | This Model (21B) |
|---|---|---|
| Total params | ~26B | 21.34B |
| Experts/layer | 128 | 103 |
| Active params/token | ~4B | ~4B |
| Disk size | ~52GB | ~43GB |
REAP removes roughly 20% of the MoE experts (25 of 128 per layer, leaving 103) while preserving the model's routing behavior. The active parameter count per token is unchanged because the router still selects 8 experts per token from the remaining pool.
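The bookkeeping above can be checked with a minimal sketch. It assumes a plain top-k router and picks the pruned expert ids at random; REAP itself ranks experts by a saliency score, which is not reproduced here. The names `top_k_experts`, `pruned`, and `kept` are hypothetical and used only for illustration.

```python
import random

NUM_EXPERTS = 128  # experts per layer in the original model (from the table)
NUM_PRUNED = 25    # experts removed per layer (from the text)
TOP_K = 8          # experts the router selects per token (from the text)


def top_k_experts(logits, kept, k=TOP_K):
    """Return the k kept expert ids with the largest router logits."""
    return sorted(kept, key=lambda e: logits[e], reverse=True)[:k]


rng = random.Random(0)

# Hypothetical pruning decision: random ids stand in for REAP's saliency ranking.
pruned = set(rng.sample(range(NUM_EXPERTS), NUM_PRUNED))
kept = [e for e in range(NUM_EXPERTS) if e not in pruned]

# Stand-in router output for one token (one logit per original expert id).
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = top_k_experts(logits, kept)

print(len(kept))                            # experts remaining per layer
print(round(NUM_PRUNED / NUM_EXPERTS, 3))   # fraction of experts removed
print(len(selected))                        # experts used per token
```

The pool shrinks to 103 experts, but each token still activates 8 of them, which is why the active parameter count stays at about 4B.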
<|channel>thought / <|channel>response channels

| Filename | Quant Type | Size | Description |
|---|---|---|---|
| gemma-4-21b-a4b-it-REAP.gguf | BF16 | ~43GB | Full precision, best quality |
| gemma-4-21b-a4b-it-REAP-Q8_0.gguf | Q8_0 | ~23GB | High quality |
| gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf | Q5_K_M | ~15GB | Balanced (recommended) |
| gemma-4-21b-a4b-it-REAP-Q4_K_M.gguf | Q4_K_M | ~13GB | Good quality, smaller |
| gemma-4-21b-a4b-it-REAP-Q3_K_M.gguf | Q3_K_M | ~10GB | Smallest |
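To compare the quantizations, the rough sketch below derives the implied bits per weight from the listed file sizes and the 21.34B parameter count, and picks the largest file that fits a given budget. The helper names `bits_per_weight` and `largest_fit` are hypothetical. The budget check looks at file size only and ignores KV cache and runtime overhead, so treat it as a lower bound on the memory actually required.

```python
PARAMS_B = 21.34  # total parameters in billions (from the metrics table)

# Approximate file sizes in GB, copied from the table of GGUF files.
SIZES_GB = {
    "BF16": 43,
    "Q8_0": 23,
    "Q5_K_M": 15,
    "Q4_K_M": 13,
    "Q3_K_M": 10,
}


def bits_per_weight(size_gb, params_b=PARAMS_B):
    """GB divided by billions of parameters gives bytes per weight; times 8 for bits."""
    return size_gb * 8 / params_b


def largest_fit(budget_gb, sizes=SIZES_GB):
    """Return the largest listed quant whose file fits in budget_gb, or None."""
    fitting = {quant: size for quant, size in sizes.items() if size <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None


for quant, size in SIZES_GB.items():
    print(f"{quant}: ~{bits_per_weight(size):.1f} bits/weight")

print(largest_fit(16))  # Q5_K_M
```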
# Download a quantized model
wget https://huggingface.co/Ayodele01/gemma-4-21b-a4b-it-REAP-GGUF/resolve/main/gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf
# Run with llama.cpp
./llama-cli -m gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf \
-p "Write a quicksort in Python." \
-n 2048
Create a Modelfile:
FROM ./gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf
TEMPLATE """<bos><start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
"""
PARAMETER stop "<end_of_turn>"
PARAMETER temperature 0.7
Then:
ollama create gemma4-21b-reap -f Modelfile
ollama run gemma4-21b-reap
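Once the model has been created, it can also be queried programmatically. The sketch below assumes an Ollama server running locally on its default port (11434) and uses the `/api/generate` endpoint with streaming disabled. The helper names `build_payload` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL_NAME = "gemma4-21b-reap"  # name passed to `ollama create` above


def build_payload(prompt, temperature=0.7):
    """Build the JSON body for a single non-streaming generation request."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }


def generate(prompt, url=OLLAMA_URL, timeout=600):
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example (requires the Ollama server to be running):
# print(generate("Write a quicksort in Python."))
```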
| Task | Original (26B) | REAP 21B |
|---|---|---|
| Elementary Math | 92% | 90% |
| Philosophy | 92% | 88% |
| GSM8K | 86% | 84% |
Generation quality is "essentially indistinguishable from the original" according to the REAP authors.
This model is released under the Gemma License.
Base model: 0xSero/gemma-4-21b-a4b-it-REAP