zeyuren2002

Add files using upload-large-folder tool

7f921f4 verified 8 days ago

14.6 kB

	<!--
	# Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers
	Yiqing Shi, Yiren Song, Mike Zheng Shou


	[![arXiv](https://img.shields.io/badge/arXiv-2511.18673-b31b1b.svg)](https://arxiv.org/abs/2511.18673)
	[![HuggingFace](https://img.shields.io/badge/Hugging%20Face-model-yellow.svg?logo=huggingface)](https://hf-mirror.com/Seq2Tri/Edit2Perceive/tree/main)

	![Teaser](samples/teaser.png )

	## Installation

	1. Clone the repository

	```bash
	git clone https://github.com/showlab/Edit2Perceive.git
	cd Edit2Perceive
	conda create -n e2p python=3.12 # recommend version
	conda activate e2p
	pip install -r requirements.txt
	```

	2. Download Base Model

	Download the [FLUX.1-Kontext-dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) model:
	```bash
	export HF_ENDPOINT=https://hf-mirror.com # if huggingface is not available, use this mirror
	hf download black-forest-labs/FLUX.1-Kontext-dev --exclude "transformer/" --local-dir ./FLUX.1-Kontext-dev
	```
	3. Download Our Models

	Download our pre-trained models and place them in the `ckpts/` directory. You can either download lora version (small size for fast validation) or full version (best quality but file is large):

	Option1 Download LoRA weights
	```bash
	hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --include "*lora.safetensors"
	hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --exclude "*lora.safetensors"
	```
	Option2 Download full model weights
	```bash
	```
	The Final Folder Sturcture should be like this:
	```bash
	ckpts/
	├── depth.safetensors
	├── depth_lora.safetensors
	├── normal.safetensors
	├── normal_lora.safetensors
	├── matting.safetensors
	└── matting_lora.safetensors
	```

	## Inference
	### UI

	```bash
	python app.py
	```
	and then visit `http://localhost:7860`

	### No UI

	```bash
	python inference.py
	```
	## Eval on benchmarks
	Please download the evluation datasets: [depth](https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset),[normal](https://share.phys.ethz.ch/~pf/bingkedata/marigold/marigold_normals/evaluation_dataset), matting:[P3M-10k](https://github.com/JizhiziLi/P3M),[AM-2k](https://github.com/JizhiziLi/GFM) and [AIM-500](https://github.com/JizhiziLi/AIM)

	And then run:
	```bash
	python utils/eval_multiple_datasets.py --task depth --state_dict ckpts/depth.safetensors
	python utils/eval_multiple_datasets.py --task normal --state_dict ckpts/normal.safetensors
	python utils/eval_multiple_datasets.py --task matting --state_dict ckpts/matting.safetensors

	```
	## Train
	### Prepare training dataset
	#### Depth

	Prepare for Hypersim and Virtual KITTI 2 datasets.

	1. Hypersim Dataset
	```bash
	python preprocess/depth/download_hypersim.py --contains color.hdf5 depth_meters.hdf5 position.hdf5 render_entity_id.hdf5 normal_cam.hdf5 normal_world.hdf5 --silent
	```
	After download, preprocess with
	```
	python preprocess/depth/preprocess_hypersim.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
	```

	2. Virtual KITTI 2 Dataset: Download by this [link](https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/)

	#### Normals
	Prepare for Hypersim, Interiorverse and Sintel datasets.
	1. Hypersim Dataset

	Download by this command:
	```bash
	python preprocess/depth/download_hypersim.py --contains color.hdf5 position.hdf5 normal_cam.hdf5 normal_world.hdf5 render_entity_id.hdf5 --silent
	```
	Preprocess with:
	```bash
	python preprocess/normal/preprocess_hypersim_normals.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
	```
	2. InteriorVerse Dataset: Refer to [download instructions](https://interiorverse.github.io/#download) and preprocess with `python preprocess/normal/preprocess_interiorverse_normals.py --dataset_dir /path/to/interiorverse --output_dir /path/to/output`
	3. Sintel Dataset: Download by this [link](http://files.is.tue.mpg.de/sintel/MPI-Sintel-complete.zip) or [alternative link](http://sintel.cs.washington.edu/MPI-Sintel-complete.zip)

	#### Matting
	[Composition-1k](https://github.com/JizhiziLi/GFM), [Distinctions-646](https://github.com/yuhaoliu7456/CVPR2020-HAttMatting), [AM-2k](https://github.com/JizhiziLi/GFM), [COCO-Matting](https://github.com/XiaRho/SEMat?tab=readme-ov-file)

	For Distinctions-646 Dataset, the official repo offers the fg and alpha without bg and merged, bg is sampled from [VOC2012](https://datasets.cms.waikato.ac.nz/ufdl/data/pascalvoc2012/VOCtrainval_11-May-2012.tar) and you need to gen merged dataset yourself (refer to `preprocess/matting/preprocess_distinctions_646.py`).

	For COCO-Matting Dataset, you need to download [COCO-Matting_trimap_alpha.7z
	](https://drive.google.com/file/d/1Q-clw6T6OnNNDEJ0gtkqOEIagVAvqFWU/view?usp=sharing)and [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) , and then split the trimap_alpha (concat in width) file into single alpha.

	After the dataset preparation, change the `--dataset_base_path` to your dataset absolute path in scripts/*.sh, for example(the dataset order is important, please don't change):
	```bash
	# depth task: Hypersim + VKITTI2
	--dataset_base_path "/mnt/nfs/workspace/syq/dataset/Hypersim/processed_depth,/mnt/nfs/workspace/syq/dataset/vkitti2"
	# normal task: Hypersim + InteriorVerse + Sintel
	--dataset_base_path "/mnt/nfs/workspace/syq/dataset/Hypersim/processed_normal,/mnt/nfs/workspace/syq/dataset/InteriorVerse/processed_normal,/mnt/nfs/workspace/syq/dataset/sintel" \
	# matting task:
	--dataset_base_path /mnt/nfs/workspace/syq/dataset/matting/composition-1k,/mnt/nfs/workspace/syq/dataset/matting/Distinctions-646,/mnt/nfs/workspace/syq/dataset/matting/AM-2k,/mnt/nfs/workspace/syq/dataset/matting/COCO-Matte \
	```
	And then start to train:
	```bash
	bash scripts/Kontext_depth_lora.sh
	bash scripts/Kontext_depth.sh
	bash scripts/Kontext_normal_lora.sh
	bash scripts/Kontext_normal.sh
	bash scripts/Kontext_matting_lora.sh
	bash scripts/Kontext_matting.
	```

	## Cite
	If you find our work useful in your research please consider citing our paper:

	```Bibtex
	@misc{edit2perceive,
	title={Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers},
	author={Yiqing Shi and Yiren Song and Mike Zheng Shou},
	year={2025},
	eprint={2511.18673},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={https://arxiv.org/abs/2511.18673},
	}
	```

	## Contact
	If you have any questions, please feel free to contact yqshi@stu.pku.edu.cn -->

	# Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers
	Yiqing Shi, Yiren Song, Mike Zheng Shou


	[![arXiv](https://img.shields.io/badge/arXiv-2511.18673-b31b1b.svg)](https://arxiv.org/abs/2511.18673)
	[![HuggingFace](https://img.shields.io/badge/Hugging%20Face-Model-yellow.svg?logo=huggingface)](https://hf-mirror.com/Seq2Tri/Edit2Perceive/tree/main)
	[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/showlab/Edit2Perceive/blob/main/LICENSE)

	![Teaser](samples/teaser.png)

	> Abstract: We present Edit2Perceive, a unified framework for diverse dense prediction tasks. We demonstrate that image editing diffusion models (specifically FLUX.1 Kontext), rather than text-to-image generators, provide a better inductive bias for deterministic dense perception. Our model achieves state-of-the-art performance across Zero-shot Monocular Depth Estimation, Surface Normal Estimation, and Interactive Matting, supporting efficient single-step deterministic inference.

	---
	## 📰 News

	Dec 19, 2025 Inference Code Release, with model weights

	Dec 23, 2025 Training Code Release

	Feb, 2026 This Paper was accepted by CVPR2026

	## 🛠️ Installation

	### 1. Environment Setup

	```bash
	git clone [https://github.com/showlab/Edit2Perceive.git](https://github.com/showlab/Edit2Perceive.git)
	cd Edit2Perceive

	# Create environment (Python 3.12 recommended)
	conda create -n e2p python=3.12
	conda activate e2p

	# Install dependencies
	pip install -r requirements.txt
	```

	### 2. Download Models

	Step 1: Download Base Model (FLUX.1-Kontext)

	```bash
	# If huggingface is not available, use mirror
	export HF_ENDPOINT=https://hf-mirror.com

	hf download black-forest-labs/FLUX.1-Kontext-dev --exclude "transformer/" --local-dir ./FLUX.1-Kontext-dev
	```

	Step 2: Download Edit2Perceive Weights
	Place the models in the `ckpts/` directory.

	* Option A: LoRA Weights (Small size, fast validation)
	```bash
	hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --include "*lora.safetensors"
	```


	* Option B: Full Model Weights (Best quality)
	```bash
	hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --exclude "*lora.safetensors"
	```



	Required Directory Structure:

	```text
	Edit2Perceive/
	├── ckpts/
	│ ├── depth.safetensors
	│ ├── depth_lora.safetensors
	│ ├── normal.safetensors
	│ ├── normal_lora.safetensors
	│ ├── matting.safetensors
	│ └── matting_lora.safetensors
	├── FLUX.1-Kontext-dev/
	└── ...
	```

	---

	## 🚀 Inference

	### Web UI (Gradio)

	```bash
	python app.py # Visit http://localhost:7860
	```

	### Command Line

	Run inference on images without the UI:

	```bash
	python inference.py
	```


	## 📊 Evaluation

	### 1. Prepare Datasets

	Please download the evaluation datasets from the links below:

	* Depth: [Evaluation Dataset](https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset)
	* Normal: [Evaluation Dataset](https://share.phys.ethz.ch/~pf/bingkedata/marigold/marigold_normals/evaluation_dataset)
	* Matting: [P3M-10k](https://github.com/JizhiziLi/P3M), [AM-2k](https://github.com/JizhiziLi/GFM), [AIM-500](https://github.com/JizhiziLi/AIM)

	### 2. Run Evaluation
	Before run evaluation, chagne the `gt_path` in `utils/eval_multiple_datasets.py` to your dataset path. And then
	```bash
	# Depth
	python utils/eval_multiple_datasets.py --task depth --state_dict ckpts/depth.safetensors

	# Normal
	python utils/eval_multiple_datasets.py --task normal --state_dict ckpts/normal.safetensors

	# Matting
	python utils/eval_multiple_datasets.py --task matting --state_dict ckpts/matting.safetensors
	```

	---

	## 🏋️ Training

	### 1. Dataset Preparation

	Set up the datasets for your target task.

	<details>
	<summary><strong>Datasets for Depth Estimation</strong> (Hypersim & Virtual KITTI 2)</summary>

	1. Hypersim Dataset
	* Download:
	```bash
	python preprocess/depth/download_hypersim.py --contains color.hdf5 depth_meters.hdf5 position.hdf5 render_entity_id.hdf5 normal_cam.hdf5 normal_world.hdf5 --silent
	```

	* Preprocess:
	```bash
	python preprocess/depth/preprocess_hypersim.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
	```


	2. Virtual KITTI 2 Dataset
	* Download from [here](https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/).



	</details>

	<details>
	<summary><strong>Datasets for Surface Normal</strong> (Hypersim, InteriorVerse & Sintel)</summary>

	1. Hypersim Dataset
	* Download:
	```bash
	python preprocess/depth/download_hypersim.py --contains color.hdf5 position.hdf5 normal_cam.hdf5 normal_world.hdf5 render_entity_id.hdf5 --silent
	```


	* Preprocess:
	```bash
	python preprocess/normal/preprocess_hypersim_normals.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
	```




	2. InteriorVerse Dataset
	* Refer to [download instructions](https://interiorverse.github.io/#download).
	* Preprocess:
	```bash
	python preprocess/normal/preprocess_interiorverse_normals.py --dataset_dir /path/to/interiorverse --output_dir /path/to/output
	```




	3. Sintel Dataset
	* Download via [Link 1](http://files.is.tue.mpg.de/sintel/MPI-Sintel-complete.zip) or [Link 2](http://sintel.cs.washington.edu/MPI-Sintel-complete.zip).



	</details>

	<details>
	<summary><strong>Datasets for Interactive Matting</strong> (Comp-1k, Distinctions, AM-2k, COCO)</summary>

	* Sources: [Composition-1k](https://github.com/JizhiziLi/GFM), [Distinctions-646](https://github.com/yuhaoliu7456/CVPR2020-HAttMatting), [AM-2k](https://github.com/JizhiziLi/GFM), [COCO-Matting](https://github.com/XiaRho/SEMat?tab=readme-ov-file).
	* Special Instructions:
	* Distinctions-646: You must generate the merged dataset yourself using backgrounds sampled from [VOC2012](https://datasets.cms.waikato.ac.nz/ufdl/data/pascalvoc2012/VOCtrainval_11-May-2012.tar). Refer to `preprocess/matting/preprocess_distinctions_646.py`.
	* COCO-Matting: Download [trimap_alpha](https://www.google.com/search?q=https://drive.google.com/file/d/1Q-clw6T6OnNNDEJ0gtkqOEIagVAvqFWU/view%3Fusp%3Dsharing) and [train2017.zip](http://images.cocodataset.org/zips/train2017.zip). Mannually split the `trimap_alpha` (concatenated in width) into single alpha channels.



	</details>

	### 2. Configure Paths

	Update the `--dataset_base_path` in the scripts located in `scripts/.sh`. Note: The dataset order is strict and must not be changed.*

	```bash
	# Example for Depth (Hypersim + VKITTI2)
	--dataset_base_path "/path/to/Hypersim/processed_depth,/path/to/vkitti2"

	# Example for Normal (Hypersim + InteriorVerse + Sintel)
	--dataset_base_path "/path/to/Hypersim/processed_normal,/path/to/InteriorVerse/processed_normal,/path/to/sintel"

	# Example for Matting
	--dataset_base_path "/path/to/composition-1k,/path/to/Distinctions-646,/path/to/AM-2k,/path/to/COCO-Matte"
	```

	### 3. Run Training

	Execute the corresponding script for LoRA or Full Fine-tuning, more details of training refer to [training_args_instructions.md](./scripts/training_args_instructions.md)

	```bash
	# Depth Estimation
	bash scripts/Kontext_depth_lora.sh
	bash scripts/Kontext_depth.sh

	# Surface Normal Estimation
	bash scripts/Kontext_normal_lora.sh
	bash scripts/Kontext_normal.sh

	# Interactive Matting
	bash scripts/Kontext_matting_lora.sh
	bash scripts/Kontext_matting.
	```

	---

	## 📝 Cite

	If you find our work useful in your research, please consider citing our paper:

	```bibtex
	@misc{shi2025edit2perceive,
	title={Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers},
	author={Yiqing Shi and Yiren Song and Mike Zheng Shou},
	year={2025},
	eprint={2511.18673},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={[https://arxiv.org/abs/2511.18673](https://arxiv.org/abs/2511.18673)},
	}
	```

	## 📧 Contact

	If you have any questions, please feel free to contact Yiqing Shi at [yqshi@stu.pku.edu.cn](mailto:yqshi@stu.pku.edu.cn).