---
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
datasets:
- Uni-Edit/Train-Data
library_name: transformers
pipeline_tag: any-to-any
license: apache-2.0
---
<p align="left">
<img src="https://github.com/zhengdian1/Uni-Edit/blob/main/assets/logo.jpg?raw=true" alt="Uni-Edit" width="480"/>
</p>
# 🥯 Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
[**Project Page**](https://zhengdian1.github.io/Uni-Edit-proj/) | [**GitHub Repository**](https://github.com/zhengdian1/Uni-Edit) | [**Paper**](https://arxiv.org/pdf/2605.21487)
# 👀 Intro
<div align="center">
<img src="https://github.com/zhengdian1/Uni-Edit/blob/main/assets/teaser.webp?raw=true" alt="Uni-Edit Teaser" width="80%">
</div>
We introduce **Uni-Edit**, an intelligent image editing task that serves as the **first general task for Unified Multimodal Model (UMM) tuning**. Conventional mixed multi-task training suffers from inherent task conflicts and requires complex multi-stage pipelines. Uni-Edit breaks this paradigm: it achieves true mutual reinforcement by **improving image understanding, generation, and editing capabilities simultaneously using only one task, one training stage, and one dataset.**
To overcome the limitations of simplistic existing editing data, we propose the **first automated and scalable data synthesis pipeline** for intelligent editing. By transforming diverse VQA data into complex instructions with embedded questions and nested logic, we build **Uni-Edit-148k**, a dedicated dataset pairing reasoning-intensive instructions with high-quality edited images.
Extensive experiments on BAGEL and Janus-Pro demonstrate that tuning solely on Uni-Edit achieves **comprehensive enhancements across all three multimodal capabilities** without requiring massive data mixing, balancing tricks, or auxiliary operations.
## 🎥 Demo
See the demos on our [🌐 Project Page](https://zhengdian1.github.io/Uni-Edit-proj/).
## 🚀 Training and Inference
For detailed instructions on setup, training, inference, evaluation, and data construction, please refer to the [official GitHub repository](https://github.com/zhengdian1/Uni-Edit).
**⚠️ IMPORTANT: Custom Architecture**
Because this is a custom architecture, you **CANNOT** load it directly via `AutoModel.from_pretrained()`. To run the provided inference code, you **MUST** first merge the weight shards in this repository into a single `ema.safetensors` file on your local machine.
To do so, run the Python [merge script](https://github.com/zhengdian1/Uni-Edit/merge.py) in the directory where you downloaded the repository.
*(Note: you need at least 54 GB of free system RAM to perform this merge.)*
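For orientation, the merge step can be sketched as below. This is a minimal illustration and not the official `merge.py`, which remains the authoritative script. It assumes every `.safetensors` file in the download directory (other than the output) is a shard and that no tensor name appears in more than one shard. The function names `merge_state_dicts` and `merge_shards_to_file` are hypothetical. It relies on `load_file` and `save_file` from `safetensors.torch`.

```python
from pathlib import Path
from typing import Dict, Iterable, Mapping


def merge_state_dicts(shards: Iterable[Mapping[str, object]]) -> Dict[str, object]:
    """Combine per-shard tensor dicts into one, refusing duplicate names."""
    merged: Dict[str, object] = {}
    for shard in shards:
        for name, tensor in shard.items():
            if name in merged:
                raise ValueError(f"duplicate tensor name across shards: {name}")
            merged[name] = tensor
    return merged


def merge_shards_to_file(shard_dir: str, output_name: str = "ema.safetensors") -> Path:
    """Load every *.safetensors shard in shard_dir and write one merged file."""
    # Imported lazily so the pure-Python helper above works without these packages.
    from safetensors.torch import load_file, save_file

    directory = Path(shard_dir)
    output_path = directory / output_name
    # Assumption: every other .safetensors file in the directory is a shard.
    shard_paths = sorted(p for p in directory.glob("*.safetensors") if p != output_path)
    if not shard_paths:
        raise FileNotFoundError(f"no .safetensors shards found in {directory}")

    # This sketch holds all shards in memory at once before writing the output.
    merged = merge_state_dicts(load_file(str(p)) for p in shard_paths)
    save_file(merged, str(output_path))
    return output_path
```

A call such as `merge_shards_to_file("path/to/downloaded/model")` (placeholder path) would write `ema.safetensors` next to the shards. Because the sketch keeps the full set of weights in RAM, it is subject to the same memory requirement noted above.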
## 📐 Citation
If you find our work helpful for your research, please consider citing it:
```bibtex
@article{zheng2026uniedit,
title = {Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning},
author = {Zheng, Dian and Zhang, Manyuan and Li, Hongyu and Liu, Hongbo and Zou, Kai and Feng, Kaituo and Li, Hongsheng},
journal = {arXiv preprint arXiv:2605.21487},
year = {2026}
}
```