---
pipeline_tag: image-to-image
---

# Meta-CoT: Enhancing Granularity and Generalization in Image Editing

Meta-CoT is a two-level Chain-of-Thought (CoT) decomposition paradigm for image editing. It decomposes editing intentions into a (task, target, understanding ability) triplet and further breaks down tasks into five fundamental meta-tasks, enabling strong generalization across 21+ editing operations.

Project Page | Paper | Code

## Overview

Meta-CoT addresses the challenge of understanding granularity and generalization in image editing through:

- **Triplet Decomposition:** Decomposes any editing intention into a (task, target, required understanding ability) triplet. This helps the model learn the specific elements of an operation during training.
- **Meta-task Generalization:** Breaks down complex editing tasks into five fundamental meta-tasks: Addition, Deletion, Replacement, Camera Motion, and Position Change. Training on these meta-tasks allows the model to generalize to unseen, diverse editing scenarios.
- **CoT-Editing Consistency (CEC) Reward:** A VLM-based reward mechanism integrated into a Flow-GRPO framework that ensures the model's output aligns accurately with its reasoning process.

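To make the triplet decomposition concrete, here is a toy sketch of what mapping an instruction to a (task, target, understanding ability) triplet might look like. The `EditTriplet` class and the keyword rules are hypothetical illustrations, not part of the released Meta-CoT codebase, where the decomposition is learned by the model rather than rule-based.

```python
from dataclasses import dataclass

# The five meta-tasks listed in the overview above.
META_TASKS = ["Addition", "Deletion", "Replacement", "Camera Motion", "Position Change"]

@dataclass(frozen=True)
class EditTriplet:
    task: str    # one of the five meta-tasks
    target: str  # the object or region being edited
    ability: str # understanding ability required (e.g. object grounding)

def decompose(instruction: str) -> EditTriplet:
    """Naive keyword-based stand-in for the model's learned decomposition."""
    text = instruction.lower().strip()
    if text.startswith("add "):
        task = "Addition"
    elif text.startswith(("remove ", "delete ")):
        task = "Deletion"
    elif text.startswith("replace "):
        task = "Replacement"
    elif "zoom" in text or "pan" in text:
        task = "Camera Motion"
    else:
        task = "Position Change"
    # Everything after the leading verb serves as the edit target here.
    target = text.split(" ", 1)[1] if " " in text else text
    return EditTriplet(task=task, target=target, ability="object grounding")

print(decompose("remove the red car"))
```

In the actual system, this structured intermediate representation is produced as part of the model's chain of thought before the edit is executed.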
## Usage

To use Meta-CoT for single-image editing, first set up the environment as described in the GitHub repository. You can then run the following inference script:

```shell
python inference/edit_single.py --image <your-image-path> --instruction <editing-instruction>
```

### Key Inference Parameters

| Parameter | Description | Typical Range |
|---|---|---|
| `cfg_text_scale` | Text prompt guidance strength | 4.0 – 8.0 |
| `cfg_image_scale` | Input image preservation strength | 1.0 – 2.0 |
| `num_timesteps` | Total denoising steps | 50 |
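Assuming the inference script exposes the parameters above as command-line flags (the flag names below are an assumption based on the parameter names, not confirmed from the repository), a full invocation might look like:

```shell
# Hypothetical flags mirroring the parameter table; check the repo's
# inference/edit_single.py --help for the actual interface.
python inference/edit_single.py \
  --image input.png \
  --instruction "replace the red car with a bicycle" \
  --cfg_text_scale 6.0 \
  --cfg_image_scale 1.5 \
  --num_timesteps 50
```

Raising `cfg_text_scale` generally makes the edit follow the instruction more aggressively, while raising `cfg_image_scale` preserves more of the input image.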

## Citation

If you find Meta-CoT useful in your research, please cite:

```bibtex
@article{zhang2026metacot,
  title={Meta-CoT: Enhancing Granularity and Generalization in Image Editing},
  author={Zhang, Shiyi and Cheng, Yiji and Hang, Tiankai and Yin, Zijin and He, Runze and Xu, Yu and Dai, Wenxun and Lin, Yunlong and Wang, Chunyu and Lu, Qinglin and Tang, Yansong},
  journal={arXiv preprint arXiv:2604.24625},
  year={2026}
}
```

## Acknowledgments

The code for Meta-CoT is built upon Bagel. We thank the authors for their open-source contributions.