---
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
datasets:
- Uni-Edit/Train-Data
library_name: transformers
pipeline_tag: any-to-any
license: apache-2.0
---

<p align="left">
<img src="https://github.com/zhengdian1/Uni-Edit/blob/main/assets/logo.jpg?raw=true" alt="Uni-Edit" width="480"/>
</p>

# 🥯 Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

[**Project Page**](https://zhengdian1.github.io/Uni-Edit-proj/) | [**GitHub Repository**](https://github.com/zhengdian1/Uni-Edit) | [**Paper**](https://arxiv.org/pdf/2605.21487)
# 👀 Intro

<div align="center">
<img src="https://github.com/zhengdian1/Uni-Edit/blob/main/assets/teaser.webp?raw=true" alt="Uni-Edit Teaser" width="80%">
</div>

We introduce **Uni-Edit**, an intelligent image editing task that serves as the **first general task for Unified Multimodal Model (UMM) tuning**. Conventional mixed multi-task training suffers from inherent task conflicts and requires complex multi-stage pipelines. Uni-Edit breaks this paradigm: it achieves true mutual reinforcement by **improving image understanding, generation, and editing capabilities simultaneously using only one task, one training stage, and one dataset.**

To overcome the limitations of simplistic existing editing data, we propose the **first automated and scalable data synthesis pipeline** for intelligent editing. By transforming diverse VQA data into complex instructions with embedded questions and nested logic, we build **Uni-Edit-148k**, a dedicated dataset pairing reasoning-intensive instructions with high-quality edited images.

Extensive experiments on BAGEL and Janus-Pro demonstrate that tuning solely on Uni-Edit achieves **comprehensive enhancements across all three multimodal capabilities** without requiring massive data mixing, balancing tricks, or auxiliary operations.
## 🎥 Demo

See the demos on our [🌐 Project Page](https://zhengdian1.github.io/Uni-Edit-proj/).
## 🚀 Training and Inference

For detailed instructions on setup, training, inference, evaluation, and data construction, please refer to the [official GitHub repository](https://github.com/zhengdian1/Uni-Edit).

**⚠️ IMPORTANT: Custom Architecture**

Because this is a custom architecture, you **CANNOT** load it directly via `AutoModel.from_pretrained()`. To run the provided inference code, you **MUST** first merge the weight shards into a single `ema.safetensors` file on your local machine.

Run the Python [merge script](https://github.com/zhengdian1/Uni-Edit/merge.py) in the directory where you downloaded the repository.

*(Note: you need at least 54 GB of free system RAM to perform this merge.)*
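For readers who want to see what such a merge involves, here is a minimal sketch. It is not the repository's `merge.py`, which remains the authoritative script. It assumes that every `*.safetensors` file in the download directory other than the output is a shard, and that shards hold disjoint tensor names. The function names `merge_state_dicts` and `merge_shard_files` are hypothetical. All tensors are held in memory at once, which is consistent with the RAM requirement noted above.

```python
from pathlib import Path
from typing import Dict, Iterable, Mapping, TypeVar

T = TypeVar("T")


def merge_state_dicts(shards: Iterable[Mapping[str, T]]) -> Dict[str, T]:
    """Combine per-shard tensor mappings into one, rejecting duplicate keys."""
    merged: Dict[str, T] = {}
    for index, shard in enumerate(shards):
        overlap = merged.keys() & shard.keys()
        if overlap:
            # A repeated name would silently overwrite weights, so fail loudly.
            raise ValueError(f"shard {index} repeats keys: {sorted(overlap)[:5]}")
        merged.update(shard)
    return merged


def merge_shard_files(shard_dir: str, output_name: str = "ema.safetensors") -> Path:
    """Load every shard in shard_dir and write one merged safetensors file."""
    # Imported lazily so the pure-Python helper above works without these packages.
    from safetensors.torch import load_file, save_file

    directory = Path(shard_dir)
    output_path = directory / output_name
    # ASSUMPTION: every other *.safetensors file in the directory is a shard.
    shard_paths = sorted(
        p for p in directory.glob("*.safetensors") if p != output_path
    )
    if not shard_paths:
        raise FileNotFoundError(f"no .safetensors shards found in {directory}")

    merged = merge_state_dicts(load_file(str(p), device="cpu") for p in shard_paths)
    save_file(merged, str(output_path))
    return output_path
```

A call such as `merge_shard_files("path/to/downloaded/repo")` (a placeholder path) would write `ema.safetensors` next to the shards. Failing on duplicate keys is a deliberate choice in this sketch: it surfaces a wrong shard-layout assumption instead of producing a silently corrupted checkpoint.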
## 📐 Citation

If you find our work helpful for your research, please consider citing it:

```bibtex
@article{zheng2026uniedit,
  title = {Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning},
  author = {Zheng, Dian and Zhang, Manyuan and Li, Hongyu and Liu, Hongbo and Zou, Kai and Feng, Kaituo and Li, Hongsheng},
  journal = {arXiv preprint arXiv:2605.21487},
  year = {2026}
}
```