---
pipeline_tag: image-to-image
---

# Meta-CoT: Enhancing Granularity and Generalization in Image Editing

Meta-CoT is a two-level Chain-of-Thought (CoT) decomposition paradigm for image editing. It decomposes editing intentions into a (task, target, understanding ability) triplet and further breaks down tasks into five fundamental meta-tasks, enabling strong generalization across 21+ editing operations.

Project Page | Paper | Code

## Overview

Meta-CoT addresses the challenge of understanding granularity and generalization in image editing through:

- **Triplet Decomposition:** Decomposes any editing intention into a (task, target, required understanding ability) triplet. This helps the model learn the specific elements of an operation during training.
- **Meta-task Generalization:** Breaks down complex editing tasks into five fundamental meta-tasks: Addition, Deletion, Replacement, Camera Motion, and Position Change. Training on these meta-tasks allows the model to generalize to unseen, diverse editing scenarios.
- **CoT-Editing Consistency (CEC) Reward:** A VLM-based reward mechanism integrated into a Flow-GRPO framework that ensures the model's output aligns accurately with its reasoning process.

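To make the triplet decomposition concrete, here is a toy sketch of what mapping an instruction to a (task, target, understanding ability) triplet might look like. The `EditTriplet` class and the keyword rules are hypothetical illustrations, not part of the released Meta-CoT codebase, where the decomposition is learned by the model rather than rule-based.

```python
from dataclasses import dataclass

# The five meta-tasks listed in the overview above.
META_TASKS = ["Addition", "Deletion", "Replacement", "Camera Motion", "Position Change"]

@dataclass(frozen=True)
class EditTriplet:
    task: str    # one of the five meta-tasks
    target: str  # the object or region being edited
    ability: str # understanding ability required (e.g. object grounding)

def decompose(instruction: str) -> EditTriplet:
    """Naive keyword-based stand-in for the model's learned decomposition."""
    text = instruction.lower().strip()
    if text.startswith("add "):
        task = "Addition"
    elif text.startswith(("remove ", "delete ")):
        task = "Deletion"
    elif text.startswith("replace "):
        task = "Replacement"
    elif "zoom" in text or "pan" in text:
        task = "Camera Motion"
    else:
        task = "Position Change"
    # Everything after the leading verb serves as the edit target here.
    target = text.split(" ", 1)[1] if " " in text else text
    return EditTriplet(task=task, target=target, ability="object grounding")

print(decompose("remove the red car"))
```

In the actual system, this structured intermediate representation is produced as part of the model's chain of thought before the edit is executed.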
## Usage

To use Meta-CoT for single-image editing, first set up the environment as described in the GitHub repository. You can then run the following inference script:

```shell
python inference/edit_single.py --image <your-image-path> --instruction <editing-instruction>
```

### Key Inference Parameters

| Parameter | Description | Typical Range |
|---|---|---|
| `cfg_text_scale` | Text prompt guidance strength | 4.0 – 8.0 |
| `cfg_image_scale` | Input image preservation strength | 1.0 – 2.0 |
| `num_timesteps` | Total denoising steps | 50 |
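Assuming the inference script exposes the parameters above as command-line flags (the flag names below are an assumption based on the parameter names, not confirmed from the repository), a full invocation might look like:

```shell
# Hypothetical flags mirroring the parameter table; check the repo's
# inference/edit_single.py --help for the actual interface.
python inference/edit_single.py \
  --image input.png \
  --instruction "replace the red car with a bicycle" \
  --cfg_text_scale 6.0 \
  --cfg_image_scale 1.5 \
  --num_timesteps 50
```

Raising `cfg_text_scale` generally makes the edit follow the instruction more aggressively, while raising `cfg_image_scale` preserves more of the input image.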

## Citation

If you find Meta-CoT useful in your research, please cite:

```bibtex
@article{zhang2026metacot,
  title={Meta-CoT: Enhancing Granularity and Generalization in Image Editing},
  author={Zhang, Shiyi and Cheng, Yiji and Hang, Tiankai and Yin, Zijin and He, Runze and Xu, Yu and Dai, Wenxun and Lin, Yunlong and Wang, Chunyu and Lu, Qinglin and Tang, Yansong},
  journal={arXiv preprint arXiv:2604.24625},
  year={2026}
}
```

## Acknowledgments

The code for Meta-CoT is built upon Bagel. We thank the authors for their open-source contributions.