Understanding and Harnessing Sparsity in Unified Multimodal Models
This repository contains compressed variants of BAGEL-7B-MoT based on our paper:
Understanding and Harnessing Sparsity in Unified Multimodal Models [arXiv] · [GitHub]
We study sparsity in unified multimodal models that jointly handle image understanding and generation. Key finding: the compressed models reduce the active parameters in the generation module by half while maintaining comparable or even improved GenEval scores.
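As a toy illustration of what halving the active experts can look like, here is a minimal sketch of expert dropping. The function name, the usage-based scoring rule, and the numbers are assumptions for illustration only, not the paper's exact procedure:

```python
# Toy sketch: rank experts in an MoE layer by a usage statistic and keep
# only the top half. The scoring rule and names are illustrative
# assumptions, not the paper's actual compression method.

def drop_experts(expert_usage, keep_ratio=0.5):
    """Return sorted indices of the experts to keep (most-used first cut)."""
    n_keep = max(1, int(len(expert_usage) * keep_ratio))
    ranked = sorted(range(len(expert_usage)),
                    key=lambda i: expert_usage[i], reverse=True)
    return sorted(ranked[:n_keep])

# Example: 8 experts with toy routing frequencies -> keep the 4 most used.
usage = [0.30, 0.02, 0.18, 0.05, 0.22, 0.01, 0.15, 0.07]
print(drop_experts(usage))  # -> [0, 2, 4, 6]
```

The same idea scales to 32-to-16 or 16-to-8 expert reductions by changing the list length, matching the checkpoint names below.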
Released checkpoints:

| Model | Active Experts | Repo |
|---|---|---|
| BAGEL-MoE-7B-GEN-32to16 | 16 / 32 | LLM-Drop/BAGEL-MoE-7B-GEN-32to16 |
| BAGEL-MoE-7B-GEN-16to8 | 8 / 16 | LLM-Drop/BAGEL-MoE-7B-GEN-16to8 |
GenEval scores (SO = single object, TO = two objects, CT = counting, CL = colors, POS = position, ATTR = color attribution, ALL = overall):

| Model | SO | TO | CT | CL | POS | ATTR | ALL |
|---|---|---|---|---|---|---|---|
| BAGEL-7B-MoT (original) | 0.99 | 0.94 | 0.81 | 0.95 | 0.72 | 0.77 | 0.86 |
| BAGEL-MoE-7B-GEN-32to16 | 0.99 | 0.94 | 0.87 | 0.93 | 0.79 | 0.78 | 0.89 |
| BAGEL-MoE-7B-GEN-16to8 | 1.00 | 0.92 | 0.82 | 1.00 | 0.77 | 0.83 | 0.89 |
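As a quick consistency check on the table above, the ALL column tracks the unweighted mean of the six category scores (an assumption about how the overall score is computed; with rounded inputs the last digit can shift by 0.01):

```python
# Verify the reported ALL scores against the mean of the six categories.
# Assumption: the overall GenEval score is the unweighted category mean.
rows = {
    "BAGEL-7B-MoT (original)": ([0.99, 0.94, 0.81, 0.95, 0.72, 0.77], 0.86),
    "BAGEL-MoE-7B-GEN-32to16": ([0.99, 0.94, 0.87, 0.93, 0.79, 0.78], 0.89),
    "BAGEL-MoE-7B-GEN-16to8":  ([1.00, 0.92, 0.82, 1.00, 0.77, 0.83], 0.89),
}
for name, (scores, overall) in rows.items():
    mean = sum(scores) / len(scores)
    # Allow 0.01 slack for rounding of the per-category scores.
    assert abs(mean - overall) <= 0.01, name
print("ALL scores consistent with category means")
```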
Setup:

```bash
conda create -n efficient_ug python=3.10
conda activate efficient_ug
pip install -r requirements.txt
```

Run the GenEval evaluation:

```bash
bash scripts/eval/bagel/run_geneval_wr.sh
```
Citation:

```bibtex
@article{he2025sparsity,
  title   = {Understanding and Harnessing Sparsity in Unified Multimodal Models},
  author  = {He, Shwai and others},
  journal = {arXiv preprint arXiv:2512.02351},
  year    = {2025}
}
```