---
pipeline_tag: image-to-image
---
# Meta-CoT: Enhancing Granularity and Generalization in Image Editing
[Meta-CoT](https://shiyi-zh0408.github.io/projectpages/Meta-CoT/) is a two-level Chain-of-Thought (CoT) decomposition paradigm for image editing. It decomposes editing intentions into a *(task, target, required understanding ability)* triplet and further breaks down tasks into five fundamental meta-tasks, enabling strong generalization across 21+ editing operations.
[**Project Page**](https://shiyi-zh0408.github.io/projectpages/Meta-CoT/) | [**Paper**](https://huggingface.co/papers/2604.24625) | [**Code**](https://github.com/shiyi-zh0408/Meta-CoT)
## Overview
Meta-CoT addresses the challenge of understanding granularity and generalization in image editing through:
- **Triplet Decomposition**: Decomposes any editing intention into a (task, target, required understanding ability) triplet. This helps the model learn specific elements of an operation during training.
- **Meta-task Generalization**: Breaks down complex editing tasks into five fundamental meta-tasks: Addition, Deletion, Replacement, Camera Motion, and Position Change. Training on these meta-tasks allows the model to generalize to unseen, diverse editing scenarios.
- **CoT-Editing Consistency (CEC) Reward**: A VLM-based reward mechanism integrated into a Flow-GRPO framework that ensures the model's output aligns accurately with its reasoning process.
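The two-level decomposition above can be sketched as plain data. This is an illustrative toy, not the repository's actual API: the `EditTriplet` class, the `decompose` helper, and its rule-based matching are all assumptions made here for clarity; the real decomposition is performed by the model's chain-of-thought reasoning.

```python
# Illustrative sketch only -- EditTriplet and decompose() are hypothetical
# names, not part of the Meta-CoT codebase. The real decomposition is done
# by the model's CoT reasoning, not by string matching.
from dataclasses import dataclass

# The five fundamental meta-tasks named in the Overview.
META_TASKS = {"Addition", "Deletion", "Replacement", "Camera Motion", "Position Change"}

@dataclass
class EditTriplet:
    task: str           # one of the five meta-tasks
    target: str         # the object or region being edited
    understanding: str  # the understanding ability the edit requires

def decompose(instruction: str) -> EditTriplet:
    """Toy rule-based stand-in for the model's CoT decomposition."""
    lowered = instruction.lower()
    if "remove" in lowered:
        # e.g. "Remove the red car" -> Deletion of "the red car"
        target = lowered.split("remove", 1)[1].strip()
        return EditTriplet("Deletion", target, "object grounding")
    raise NotImplementedError("toy example handles only removal instructions")

triplet = decompose("Remove the red car")
assert triplet.task in META_TASKS
```

During training, learning each element of such a triplet separately is what lets the model recombine meta-tasks into unseen editing operations.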
## Usage
To use Meta-CoT for single-image editing, first set up the environment as described in the [GitHub repository](https://github.com/shiyi-zh0408/Meta-CoT). You can then run the following inference script:
```bash
python inference/edit_single.py --image <your-image-path> --instruction <editing-instruction>
```
### Key Inference Parameters
| Parameter | Description | Typical Range |
|-----------|-------------|---------------|
| `cfg_text_scale` | Text prompt guidance strength | 4.0 - 8.0 |
| `cfg_image_scale` | Input image preservation strength | 1.0 - 2.0 |
| `num_timesteps` | Total denoising steps | 50 |
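Putting the table together with the inference script, a full invocation might look like the sketch below. The flag names `--cfg_text_scale`, `--cfg_image_scale`, and `--num_timesteps` are inferred from the parameter table and are assumptions; check `inference/edit_single.py --help` in the repository for the actual argument names.

```shell
# Hypothetical flag names inferred from the parameter table above --
# verify against the repository's inference script before use.
python inference/edit_single.py \
  --image path/to/input.png \
  --instruction "Remove the red car" \
  --cfg_text_scale 6.0 \
  --cfg_image_scale 1.5 \
  --num_timesteps 50
```

Raising `cfg_text_scale` pushes the output to follow the instruction more literally, while raising `cfg_image_scale` preserves more of the input image.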
## Citation
If you find Meta-CoT useful in your research, please cite:
```bibtex
@article{zhang2026metacot,
title={Meta-CoT: Enhancing Granularity and Generalization in Image Editing},
author={Zhang, Shiyi and Cheng, Yiji and Hang, Tiankai and Yin, Zijin and He, Runze and Xu, Yu and Dai, Wenxun and Lin, Yunlong and Wang, Chunyu and Lu, Qinglin and Tang, Yansong},
journal={arXiv preprint arXiv:2604.24625},
year={2026}
}
```
## Acknowledgments
The code for Meta-CoT is built upon [Bagel](https://github.com/ByteDance-Seed/Bagel). We thank the authors for their open-source contributions.