Understanding and Harnessing Sparsity in Unified Multimodal Models
This repository contains compressed variants of BAGEL-7B-MoT based on our paper:
Understanding and Harnessing Sparsity in Unified Multimodal Models [arXiv] · [GitHub]
We study sparsity in unified multimodal models that jointly handle image understanding and generation. Key finding: the compressed models reduce the active parameters in the generation module by half while maintaining comparable or even improved GenEval scores.
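As a toy illustration of what halving the active experts can look like, here is a minimal sketch of expert dropping. The function name, the usage-based scoring rule, and the numbers are assumptions for illustration only, not the paper's exact procedure:

```python
# Toy sketch: rank experts in an MoE layer by a usage statistic and keep
# only the top half. The scoring rule and names are illustrative
# assumptions, not the paper's actual compression method.

def drop_experts(expert_usage, keep_ratio=0.5):
    """Return sorted indices of the experts to keep (most-used first cut)."""
    n_keep = max(1, int(len(expert_usage) * keep_ratio))
    ranked = sorted(range(len(expert_usage)),
                    key=lambda i: expert_usage[i], reverse=True)
    return sorted(ranked[:n_keep])

# Example: 8 experts with toy routing frequencies -> keep the 4 most used.
usage = [0.30, 0.02, 0.18, 0.05, 0.22, 0.01, 0.15, 0.07]
print(drop_experts(usage))  # -> [0, 2, 4, 6]
```

The same idea scales to 32-to-16 or 16-to-8 expert reductions by changing the list length, matching the checkpoint names below.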
Released checkpoints:

| Model | Active Experts | Repo |
|---|---|---|
| BAGEL-MoE-7B-GEN-32to16 | 16 / 32 | LLM-Drop/BAGEL-MoE-7B-GEN-32to16 |
| BAGEL-MoE-7B-GEN-16to8 | 8 / 16 | LLM-Drop/BAGEL-MoE-7B-GEN-16to8 |
GenEval scores (SO = single object, TO = two objects, CT = counting, CL = colors, POS = position, ATTR = color attribution, ALL = overall):

| Model | SO | TO | CT | CL | POS | ATTR | ALL |
|---|---|---|---|---|---|---|---|
| BAGEL-7B-MoT (original) | 0.99 | 0.94 | 0.81 | 0.95 | 0.72 | 0.77 | 0.86 |
| BAGEL-MoE-7B-GEN-32to16 | 0.99 | 0.94 | 0.87 | 0.93 | 0.79 | 0.78 | 0.89 |
| BAGEL-MoE-7B-GEN-16to8 | 1.00 | 0.92 | 0.82 | 1.00 | 0.77 | 0.83 | 0.89 |
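As a quick consistency check on the table above, the ALL column tracks the unweighted mean of the six category scores (an assumption about how the overall score is computed; with rounded inputs the last digit can shift by 0.01):

```python
# Verify the reported ALL scores against the mean of the six categories.
# Assumption: the overall GenEval score is the unweighted category mean.
rows = {
    "BAGEL-7B-MoT (original)": ([0.99, 0.94, 0.81, 0.95, 0.72, 0.77], 0.86),
    "BAGEL-MoE-7B-GEN-32to16": ([0.99, 0.94, 0.87, 0.93, 0.79, 0.78], 0.89),
    "BAGEL-MoE-7B-GEN-16to8":  ([1.00, 0.92, 0.82, 1.00, 0.77, 0.83], 0.89),
}
for name, (scores, overall) in rows.items():
    mean = sum(scores) / len(scores)
    # Allow 0.01 slack for rounding of the per-category scores.
    assert abs(mean - overall) <= 0.01, name
print("ALL scores consistent with category means")
```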
Setup:

```bash
conda create -n efficient_ug python=3.10
conda activate efficient_ug
pip install -r requirements.txt
```

Run the GenEval evaluation:

```bash
bash scripts/eval/bagel/run_geneval_wr.sh
```
Citation:

```bibtex
@article{he2025sparsity,
  title   = {Understanding and Harnessing Sparsity in Unified Multimodal Models},
  author  = {He, Shwai and others},
  journal = {arXiv preprint arXiv:2512.02351},
  year    = {2025}
}
```