# Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers
**Yiqing Shi**, **Yiren Song**, **Mike Zheng Shou**
[![arXiv](https://img.shields.io/badge/arXiv-2511.18673-b31b1b.svg)](https://arxiv.org/abs/2511.18673)
[![HuggingFace](https://img.shields.io/badge/Hugging%20Face-Model-yellow.svg?logo=huggingface)](https://hf-mirror.com/Seq2Tri/Edit2Perceive/tree/main)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/showlab/Edit2Perceive/blob/main/LICENSE)
![Teaser](samples/teaser.png)
> **Abstract:** We present **Edit2Perceive**, a unified framework for diverse dense prediction tasks. We demonstrate that image editing diffusion models (specifically FLUX.1 Kontext), rather than text-to-image generators, provide a better inductive bias for deterministic dense perception. Our model achieves state-of-the-art performance across **Zero-shot Monocular Depth Estimation**, **Surface Normal Estimation**, and **Interactive Matting**, supporting efficient single-step deterministic inference.
---
## 📰 News
* **Dec 19, 2025:** Inference code and model weights released.
* **Dec 23, 2025:** Training code released.
* **Feb 2026:** The paper was accepted to **CVPR 2026**.
## 🛠️ Installation
### 1. Environment Setup
```bash
git clone https://github.com/showlab/Edit2Perceive.git
cd Edit2Perceive
# Create environment (Python 3.12 recommended)
conda create -n e2p python=3.12
conda activate e2p
# Install dependencies
pip install -r requirements.txt
```
### 2. Download Models
**Step 1: Download Base Model (FLUX.1-Kontext)**
```bash
# If Hugging Face is not reachable, use the mirror
export HF_ENDPOINT=https://hf-mirror.com
hf download black-forest-labs/FLUX.1-Kontext-dev --exclude "transformer/" --local-dir ./FLUX.1-Kontext-dev
```
**Step 2: Download Edit2Perceive Weights**
Place the models in the `ckpts/` directory.
* **Option A: LoRA Weights** (Small size, fast validation)
```bash
hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --include "*lora.safetensors"
```
* **Option B: Full Model Weights** (Best quality, larger files)
```bash
hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --exclude "*lora.safetensors"
```
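If you prefer to script the download, the two options map onto `huggingface_hub.snapshot_download` with `allow_patterns` (Option A) or `ignore_patterns` (Option B). The sketch below assumes `huggingface_hub` is installed. `select_files` and `download` are hypothetical helpers, not part of this repository. `select_files` previews, with the same `fnmatch`-style glob, which checkpoint names each option keeps.

```python
from fnmatch import fnmatch

REPO_ID = "Seq2Tri/Edit2Perceive"
LORA_PATTERN = "*lora.safetensors"  # same glob as the CLI commands above


def select_files(filenames, option):
    """Preview which files Option A ("lora") or Option B ("full") would keep."""
    if option == "lora":
        return [f for f in filenames if fnmatch(f, LORA_PATTERN)]
    if option == "full":
        return [f for f in filenames if not fnmatch(f, LORA_PATTERN)]
    raise ValueError(f"unknown option: {option!r}")


def download(option, local_dir="ckpts"):
    """Fetch the chosen weights into local_dir (needs network access)."""
    # Imported lazily so select_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    if option == "lora":
        return snapshot_download(REPO_ID, local_dir=local_dir, allow_patterns=LORA_PATTERN)
    if option == "full":
        return snapshot_download(REPO_ID, local_dir=local_dir, ignore_patterns=LORA_PATTERN)
    raise ValueError(f"unknown option: {option!r}")
```

For example, `select_files(["depth.safetensors", "depth_lora.safetensors"], "lora")` keeps only `depth_lora.safetensors`. The two options are complementary, so running both downloads every checkpoint.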
**Required Directory Structure:**
```text
Edit2Perceive/
├── ckpts/
│ ├── depth.safetensors
│ ├── depth_lora.safetensors
│ ├── normal.safetensors
│ ├── normal_lora.safetensors
│ ├── matting.safetensors
│ └── matting_lora.safetensors
├── FLUX.1-Kontext-dev/
└── ...
```
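A quick way to confirm the layout before launching the UI or evaluation is to list what is missing. This is a minimal sketch. `missing_items` is a hypothetical helper, and the expected names are taken from the tree above.

```python
from pathlib import Path

TASKS = ("depth", "normal", "matting")


def missing_items(root=".", variant="full"):
    """Return the expected paths (relative to root, POSIX style) that do not exist.

    variant="lora" checks <task>_lora.safetensors; "full" checks <task>.safetensors.
    """
    if variant not in ("lora", "full"):
        raise ValueError(f"unknown variant: {variant!r}")
    root = Path(root)
    suffix = "_lora.safetensors" if variant == "lora" else ".safetensors"
    expected = [Path("FLUX.1-Kontext-dev")]
    expected += [Path("ckpts") / f"{task}{suffix}" for task in TASKS]
    return [p.as_posix() for p in expected if not (root / p).exists()]
```

Run it from the repository root as `missing_items(".", "lora")` or `missing_items(".", "full")`. An empty list means every file for that variant is in place.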
---
## 🚀 Inference
### Web UI (Gradio)
```bash
python app.py # Visit http://localhost:7860
```
### Command Line
Run inference on images without the UI:
```bash
python inference.py
```
## 📊 Evaluation
### 1. Prepare Datasets
Please download the evaluation datasets from the links below:
* **Depth:** [Evaluation Dataset](https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset)
* **Normal:** [Evaluation Dataset](https://share.phys.ethz.ch/~pf/bingkedata/marigold/marigold_normals/evaluation_dataset)
* **Matting:** [P3M-10k](https://github.com/JizhiziLi/P3M), [AM-2k](https://github.com/JizhiziLi/GFM), [AIM-500](https://github.com/JizhiziLi/AIM)
### 2. Run Evaluation
Before running the evaluation, change `gt_path` in `utils/eval_multiple_datasets.py` to your dataset path, then run:
```bash
# Depth
python utils/eval_multiple_datasets.py --task depth --state_dict ckpts/depth.safetensors
# Normal
python utils/eval_multiple_datasets.py --task normal --state_dict ckpts/normal.safetensors
# Matting
python utils/eval_multiple_datasets.py --task matting --state_dict ckpts/matting.safetensors
```
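The depth and normal evaluation sets linked above are the Marigold benchmarks. On these benchmarks, affine-invariant depth is commonly scored with AbsRel and δ1 after a least-squares scale-and-shift fit to the ground truth, and normals with the mean angular error. The sketch below shows that generic protocol with NumPy. It is not the repository's evaluation code, so check `utils/eval_multiple_datasets.py` for the exact masking, alignment, and metric set.

```python
import numpy as np


def align_scale_shift(pred, gt, mask):
    """Fit s, t minimising ||s * pred + t - gt||^2 over mask; return the aligned map."""
    p, g = pred[mask], gt[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred + t


def depth_metrics(pred, gt, mask=None):
    """AbsRel and delta1 after affine alignment; mask selects valid gt pixels."""
    if mask is None:
        mask = gt > 0
    aligned = align_scale_shift(pred, gt, mask)
    p, g = aligned[mask], gt[mask]
    p = np.clip(p, 1e-6, None)  # guard against non-positive aligned depths
    abs_rel = float(np.mean(np.abs(p - g) / g))
    delta1 = float(np.mean(np.maximum(p / g, g / p) < 1.25))
    return {"abs_rel": abs_rel, "delta1": delta1}


def mean_angular_error_deg(pred, gt, mask=None):
    """Mean angle in degrees between normal maps of shape (H, W, 3)."""
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)  # avoid arccos domain errors
    ang = np.degrees(np.arccos(cos))
    return float(ang[mask].mean() if mask is not None else ang.mean())
```

Because of the alignment step, a prediction that differs from the ground truth only by a positive scale and shift scores a perfect AbsRel of 0 and δ1 of 1.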
---
## 🏋️ Training
### 1. Dataset Preparation
Set up the datasets for your target task.
<details>
<summary><strong>Datasets for Depth Estimation</strong> (Hypersim & Virtual KITTI 2)</summary>
1. **Hypersim Dataset**
* Download:
```bash
python preprocess/depth/download_hypersim.py --contains color.hdf5 depth_meters.hdf5 position.hdf5 render_entity_id.hdf5 normal_cam.hdf5 normal_world.hdf5 --silent
```
* Preprocess:
```bash
python preprocess/depth/preprocess_hypersim.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
```
2. **Virtual KITTI 2 Dataset**
* Download from [here](https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/).
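According to the Hypersim documentation, `depth_meters.hdf5` stores the Euclidean distance from each pixel's 3D point to the camera's optical center. This is a ray distance, not planar depth. A common conversion is sketched below. It assumes a pinhole camera with a 60° horizontal field of view, Hypersim's default, and the principal point at the image centre. This gives a focal length of about 886.81 px at 1024x768. We have not confirmed that `preprocess_hypersim.py` applies exactly this step.

```python
import numpy as np


def distance_to_planar_depth(distance, fov_x=np.pi / 3):
    """Convert per-pixel ray distance of shape (H, W) to planar z-depth.

    Assumes a pinhole camera, square pixels, horizontal FOV fov_x, and the
    principal point at the image centre.
    """
    h, w = distance.shape
    focal = 0.5 * w / np.tan(0.5 * fov_x)  # ~886.81 px for w = 1024 and 60 degrees
    # Pixel-centre coordinates on the image plane, relative to the principal point.
    x = np.arange(w) - 0.5 * w + 0.5
    y = np.arange(h) - 0.5 * h + 0.5
    xx, yy = np.meshgrid(x, y)  # both arrays have shape (H, W)
    ray_norm = np.sqrt(xx**2 + yy**2 + focal**2)
    # Similar triangles along each ray: z / distance = focal / |ray|.
    return distance * focal / ray_norm
```

At the image centre the two quantities almost coincide. Toward the corners, planar depth is noticeably smaller than the stored distance.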
</details>
<details>
<summary><strong>Datasets for Surface Normal</strong> (Hypersim, InteriorVerse & Sintel)</summary>
1. **Hypersim Dataset**
* Download:
```bash
python preprocess/depth/download_hypersim.py --contains color.hdf5 position.hdf5 normal_cam.hdf5 normal_world.hdf5 render_entity_id.hdf5 --silent
```
* Preprocess:
```bash
python preprocess/normal/preprocess_hypersim_normals.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
```
2. **InteriorVerse Dataset**
* Refer to [download instructions](https://interiorverse.github.io/#download).
* Preprocess:
```bash
python preprocess/normal/preprocess_interiorverse_normals.py --dataset_dir /path/to/interiorverse --output_dir /path/to/output
```
3. **Sintel Dataset**
* Download via [Link 1](http://files.is.tue.mpg.de/sintel/MPI-Sintel-complete.zip) or [Link 2](http://sintel.cs.washington.edu/MPI-Sintel-complete.zip).
</details>
<details>
<summary><strong>Datasets for Interactive Matting</strong> (Comp-1k, Distinctions, AM-2k, COCO)</summary>
* **Sources:** [Composition-1k](https://github.com/JizhiziLi/GFM), [Distinctions-646](https://github.com/yuhaoliu7456/CVPR2020-HAttMatting), [AM-2k](https://github.com/JizhiziLi/GFM), [COCO-Matting](https://github.com/XiaRho/SEMat?tab=readme-ov-file).
* **Special Instructions:**
* **Distinctions-646:** The official release provides only foregrounds and alpha mattes, without backgrounds or merged images. You must generate the merged dataset yourself by compositing onto backgrounds sampled from [VOC2012](https://datasets.cms.waikato.ac.nz/ufdl/data/pascalvoc2012/VOCtrainval_11-May-2012.tar). Refer to `preprocess/matting/preprocess_distinctions_646.py`.
* **COCO-Matting:** Download [COCO-Matting_trimap_alpha.7z](https://drive.google.com/file/d/1Q-clw6T6OnNNDEJ0gtkqOEIagVAvqFWU/view?usp=sharing) and [train2017.zip](http://images.cocodataset.org/zips/train2017.zip). Each `trimap_alpha` image is a trimap and an alpha matte concatenated along the width. Manually split these images and keep the single alpha mattes.
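The two manual steps above can be sketched with NumPy. `composite` is the standard alpha-compositing equation I = αF + (1 − α)B, which is the usual way merged matting images are built from foreground, alpha, and background. The background must first be resized or cropped to the foreground size. `split_trimap_alpha` assumes each COCO-Matting `trimap_alpha` image holds two equal-width halves, with the trimap on the left and the alpha on the right. This ordering is an assumption, so verify it on one file. Both helpers are hypothetical; `preprocess/matting/preprocess_distinctions_646.py` remains the reference for Distinctions-646.

```python
import numpy as np


def composite(fg, alpha, bg):
    """Alpha compositing: I = alpha * F + (1 - alpha) * B.

    fg, bg: float arrays of shape (H, W, 3) in [0, 1]; alpha: (H, W) in [0, 1].
    """
    a = alpha[..., None]  # broadcast alpha over the colour channels
    return a * fg + (1.0 - a) * bg


def split_trimap_alpha(concat):
    """Split an image holding [trimap | alpha] side by side into (trimap, alpha).

    Assumption: trimap on the left, alpha on the right, equal widths.
    """
    w = concat.shape[1]
    if w % 2:
        raise ValueError("expected an even width for a two-way horizontal concat")
    half = w // 2
    return concat[:, :half], concat[:, half:]
```

Scale 8-bit images to [0, 1] before compositing, for example `img.astype(np.float32) / 255.0`, and convert back when saving.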
</details>
### 2. Configure Paths
Update `--dataset_base_path` in the training scripts under `scripts/*.sh` so that it points to the absolute paths of your datasets. **Note: The dataset order is strict and must not be changed.**
```bash
# Example for Depth (Hypersim + VKITTI2)
--dataset_base_path "/path/to/Hypersim/processed_depth,/path/to/vkitti2"
# Example for Normal (Hypersim + InteriorVerse + Sintel)
--dataset_base_path "/path/to/Hypersim/processed_normal,/path/to/InteriorVerse/processed_normal,/path/to/sintel"
# Example for Matting
--dataset_base_path "/path/to/composition-1k,/path/to/Distinctions-646,/path/to/AM-2k,/path/to/COCO-Matte"
```
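Each script receives all paths for a task as a single comma-separated string, and the order matters, so a sanity check before a long run can save time. The helper below is hypothetical and not part of the repository. The per-task counts come from the examples above: 2 for depth, 3 for normal, and 4 for matting. It cannot check that the order is correct, only that the count, absoluteness, and existence are.

```python
from pathlib import Path

# Number of comma-separated entries each task expects, per the examples above.
EXPECTED_COUNTS = {"depth": 2, "normal": 3, "matting": 4}


def parse_dataset_base_path(value, task, check_exists=True):
    """Split a --dataset_base_path value and sanity-check it."""
    paths = [p.strip() for p in value.split(",") if p.strip()]
    expected = EXPECTED_COUNTS[task]
    if len(paths) != expected:
        raise ValueError(f"{task}: expected {expected} paths, got {len(paths)}")
    for p in paths:
        if not Path(p).is_absolute():
            raise ValueError(f"not an absolute path: {p}")
        if check_exists and not Path(p).is_dir():
            raise FileNotFoundError(p)
    return paths
```

For example, `parse_dataset_base_path("/data/Hypersim/processed_depth,/data/vkitti2", "depth")` returns both paths if the directories exist. It raises an error if a path is missing or relative, or if the count does not match the task.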
### 3. Run Training
Run the corresponding script for LoRA or full fine-tuning. For more details on the training arguments, see [training_args_instructions.md](./scripts/training_args_instructions.md).
```bash
# Depth Estimation
bash scripts/Kontext_depth_lora.sh
bash scripts/Kontext_depth.sh
# Surface Normal Estimation
bash scripts/Kontext_normal_lora.sh
bash scripts/Kontext_normal.sh
# Interactive Matting
bash scripts/Kontext_matting_lora.sh
bash scripts/Kontext_matting.sh
```
---
## 📝 Cite
If you find our work useful in your research, please consider citing our paper:
```bibtex
@misc{shi2025edit2perceive,
title={Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers},
author={Yiqing Shi and Yiren Song and Mike Zheng Shou},
year={2025},
eprint={2511.18673},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.18673},
}
```
## 📧 Contact
If you have any questions, please feel free to contact **Yiqing Shi** at [yqshi@stu.pku.edu.cn](mailto:yqshi@stu.pku.edu.cn).