# Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers

**Yiqing Shi**, **Yiren Song**, **Mike Zheng Shou**

[![arXiv](https://img.shields.io/badge/arXiv-2511.18673-b31b1b.svg)](https://arxiv.org/abs/2511.18673)
[![HuggingFace](https://img.shields.io/badge/Hugging%20Face-Model-yellow.svg?logo=huggingface)](https://hf-mirror.com/Seq2Tri/Edit2Perceive/tree/main)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/showlab/Edit2Perceive/blob/main/LICENSE)

![Teaser](samples/teaser.png)

> **Abstract:** We present **Edit2Perceive**, a unified framework for diverse dense prediction tasks. We demonstrate that image editing diffusion models (specifically FLUX.1 Kontext), rather than text-to-image generators, provide a better inductive bias for deterministic dense perception. Our model achieves state-of-the-art performance on **Zero-shot Monocular Depth Estimation**, **Surface Normal Estimation**, and **Interactive Matting**, and supports efficient single-step deterministic inference.

---

## 📰 News

* **Dec 19, 2025**: Inference code released, with model weights.
* **Dec 23, 2025**: Training code released.
* **Feb 2026**: The paper was accepted to **CVPR 2026**.

## 🛠️ Installation

### 1. Environment Setup

```bash
git clone https://github.com/showlab/Edit2Perceive.git
cd Edit2Perceive

# Create environment (Python 3.12 recommended)
conda create -n e2p python=3.12
conda activate e2p

# Install dependencies
pip install -r requirements.txt
```

### 2. Download Models

**Step 1: Download Base Model (FLUX.1-Kontext)**

```bash
# If Hugging Face is not reachable, use the mirror
export HF_ENDPOINT=https://hf-mirror.com
hf download black-forest-labs/FLUX.1-Kontext-dev --exclude "transformer/" --local-dir ./FLUX.1-Kontext-dev
```

**Step 2: Download Edit2Perceive Weights**

Place the models in the `ckpts/` directory.
* **Option A: LoRA Weights** (small size, fast validation)
  ```bash
  hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --include "*lora.safetensors"
  ```
* **Option B: Full Model Weights** (best quality)
  ```bash
  hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --exclude "*lora.safetensors"
  ```

**Required Directory Structure:**

```text
Edit2Perceive/
├── ckpts/
│   ├── depth.safetensors
│   ├── depth_lora.safetensors
│   ├── normal.safetensors
│   ├── normal_lora.safetensors
│   ├── matting.safetensors
│   └── matting_lora.safetensors
├── FLUX.1-Kontext-dev/
└── ...
```

---

## 🚀 Inference

### Web UI (Gradio)

```bash
python app.py
# Visit http://localhost:7860
```

### Command Line

Run inference on images without the UI:

```bash
python inference.py
```

## 📊 Evaluation

### 1. Prepare Datasets

Please download the evaluation datasets from the links below:

* **Depth:** [Evaluation Dataset](https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset)
* **Normal:** [Evaluation Dataset](https://share.phys.ethz.ch/~pf/bingkedata/marigold/marigold_normals/evaluation_dataset)
* **Matting:** [P3M-10k](https://github.com/JizhiziLi/P3M), [AM-2k](https://github.com/JizhiziLi/GFM), [AIM-500](https://github.com/JizhiziLi/AIM)

### 2. Run Evaluation

Before running evaluation, change `gt_path` in `utils/eval_multiple_datasets.py` to your dataset path. Then run:

```bash
# Depth
python utils/eval_multiple_datasets.py --task depth --state_dict ckpts/depth.safetensors

# Normal
python utils/eval_multiple_datasets.py --task normal --state_dict ckpts/normal.safetensors

# Matting
python utils/eval_multiple_datasets.py --task matting --state_dict ckpts/matting.safetensors
```

---

## 🏋️ Training

### 1. Dataset Preparation

Set up the datasets for your target task.
**Datasets for Depth Estimation (Hypersim & Virtual KITTI 2)**

1. **Hypersim Dataset**
   * Download:
     ```bash
     python preprocess/depth/download_hypersim.py --contains color.hdf5 depth_meters.hdf5 position.hdf5 render_entity_id.hdf5 normal_cam.hdf5 normal_world.hdf5 --silent
     ```
   * Preprocess:
     ```bash
     python preprocess/depth/preprocess_hypersim.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
     ```
2. **Virtual KITTI 2 Dataset**
   * Download from [here](https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/).
**Datasets for Surface Normal (Hypersim, InteriorVerse & Sintel)**

1. **Hypersim Dataset**
   * Download:
     ```bash
     python preprocess/depth/download_hypersim.py --contains color.hdf5 position.hdf5 normal_cam.hdf5 normal_world.hdf5 render_entity_id.hdf5 --silent
     ```
   * Preprocess:
     ```bash
     python preprocess/normal/preprocess_hypersim_normals.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
     ```
2. **InteriorVerse Dataset**
   * Refer to the [download instructions](https://interiorverse.github.io/#download).
   * Preprocess:
     ```bash
     python preprocess/normal/preprocess_interiorverse_normals.py --dataset_dir /path/to/interiorverse --output_dir /path/to/output
     ```
3. **Sintel Dataset**
   * Download via [Link 1](http://files.is.tue.mpg.de/sintel/MPI-Sintel-complete.zip) or [Link 2](http://sintel.cs.washington.edu/MPI-Sintel-complete.zip).
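
The Sintel step offers two mirrors for the same archive. Below is a minimal sketch, assuming `curl` is installed, of a helper that tries each mirror in order and keeps the first successful download. `fetch_first` is a hypothetical name, not a script shipped with Edit2Perceive, and the unzip target in the commented example is only a placeholder.

```bash
#!/usr/bin/env bash
# Hedged sketch: try several mirrors in order and keep the first successful
# download. `fetch_first` is a hypothetical helper, not part of the repo.
set -euo pipefail

# fetch_first OUTPUT URL [URL ...]
fetch_first() {
  local out="$1"; shift
  local url
  for url in "$@"; do
    # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
    if curl -fsSL --retry 2 -o "$out" "$url"; then
      return 0
    fi
    echo "download failed: $url" >&2
  done
  rm -f "$out"   # drop any partial file left by the last failed attempt
  return 1
}

# Example for Sintel (the two links listed above; /path/to/sintel is a placeholder):
# fetch_first MPI-Sintel-complete.zip \
#   http://files.is.tue.mpg.de/sintel/MPI-Sintel-complete.zip \
#   http://sintel.cs.washington.edu/MPI-Sintel-complete.zip
# unzip -q MPI-Sintel-complete.zip -d /path/to/sintel
```

The helper returns non-zero only when every mirror fails, so it composes with `set -e` in a larger setup script.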
**Datasets for Interactive Matting (Comp-1k, Distinctions, AM-2k, COCO)**

* **Sources:** [Composition-1k](https://github.com/JizhiziLi/GFM), [Distinctions-646](https://github.com/yuhaoliu7456/CVPR2020-HAttMatting), [AM-2k](https://github.com/JizhiziLi/GFM), [COCO-Matting](https://github.com/XiaRho/SEMat?tab=readme-ov-file).
* **Special Instructions:**
  * **Distinctions-646:** You must generate the merged dataset yourself using backgrounds sampled from [VOC2012](https://datasets.cms.waikato.ac.nz/ufdl/data/pascalvoc2012/VOCtrainval_11-May-2012.tar). Refer to `preprocess/matting/preprocess_distinctions_646.py`.
  * **COCO-Matting:** Download [trimap_alpha](https://drive.google.com/file/d/1Q-clw6T6OnNNDEJ0gtkqOEIagVAvqFWU/view?usp=sharing) and [train2017.zip](http://images.cocodataset.org/zips/train2017.zip). Manually split the `trimap_alpha` images (concatenated along the width) into single alpha channels.
### 2. Configure Paths

Update `--dataset_base_path` in the scripts located in `scripts/*.sh`.
**Note: The dataset order is strict and must not be changed.**

```bash
# Example for Depth (Hypersim + VKITTI2)
--dataset_base_path "/path/to/Hypersim/processed_depth,/path/to/vkitti2"

# Example for Normal (Hypersim + InteriorVerse + Sintel)
--dataset_base_path "/path/to/Hypersim/processed_normal,/path/to/InteriorVerse/processed_normal,/path/to/sintel"

# Example for Matting
--dataset_base_path "/path/to/composition-1k,/path/to/Distinctions-646,/path/to/AM-2k,/path/to/COCO-Matte"
```

### 3. Run Training

Run the corresponding script for LoRA or full fine-tuning. For more training details, refer to [training_args_instructions.md](./scripts/training_args_instructions.md).

```bash
# Depth Estimation
bash scripts/Kontext_depth_lora.sh
bash scripts/Kontext_depth.sh

# Surface Normal Estimation
bash scripts/Kontext_normal_lora.sh
bash scripts/Kontext_normal.sh

# Interactive Matting
bash scripts/Kontext_matting_lora.sh
bash scripts/Kontext_matting.sh
```

---

## 📝 Cite

If you find our work useful in your research, please consider citing our paper:

```bibtex
@misc{shi2025edit2perceive,
  title={Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers},
  author={Yiqing Shi and Yiren Song and Mike Zheng Shou},
  year={2025},
  eprint={2511.18673},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.18673},
}
```

## 📧 Contact

If you have any questions, please feel free to contact **Yiqing Shi** at [yqshi@stu.pku.edu.cn](mailto:yqshi@stu.pku.edu.cn).