FQiao committed
Commit 13c8e44 · verified · 1 Parent(s): 63ad196

Update README.md

Files changed (1): README.md (+112 -4)
README.md CHANGED
@@ -4,11 +4,119 @@ pipeline_tag: image-to-image
  library_name: diffusers
  ---
 
- # [ICCV 2025] GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching
-
 
  [![Project Site](https://img.shields.io/badge/Project-Web-green)](https://qjizhi.github.io/genstereo) [![Spaces](https://img.shields.io/badge/Spaces-Demo-yellow?logo=huggingface)](https://huggingface.co/spaces/FQiao/GenStereo) [![Github](https://img.shields.io/badge/Github-Repo-orange?logo=github)](https://github.com/Qjizhi/GenStereo) [![Models](https://img.shields.io/badge/Models-checkpoints-blue?logo=huggingface)](https://huggingface.co/FQiao/GenStereo-sd2.1/tree/main) [![arXiv](https://img.shields.io/badge/arXiv-2503.12720-red?logo=arxiv)](https://arxiv.org/abs/2503.12720)
 
- This repository contains the model presented in [GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching](https://huggingface.co/papers/2503.12720). The models are finetuned on Stable Diffusion 1.5, for SD v2.1, you can find [here](https://huggingface.co/FQiao/GenStereo-sd2.1).
- ![](teaser_coco.jpg).
 
  library_name: diffusers
  ---
 
+ # [ICCV 2025] Towards Open-World Generation of Stereo Images and Unsupervised Matching
 
  [![Project Site](https://img.shields.io/badge/Project-Web-green)](https://qjizhi.github.io/genstereo) [![Spaces](https://img.shields.io/badge/Spaces-Demo-yellow?logo=huggingface)](https://huggingface.co/spaces/FQiao/GenStereo) [![Github](https://img.shields.io/badge/Github-Repo-orange?logo=github)](https://github.com/Qjizhi/GenStereo) [![Models](https://img.shields.io/badge/Models-checkpoints-blue?logo=huggingface)](https://huggingface.co/FQiao/GenStereo-sd2.1/tree/main) [![arXiv](https://img.shields.io/badge/arXiv-2503.12720-red?logo=arxiv)](https://arxiv.org/abs/2503.12720)
 
+ This repository contains the model presented in [Towards Open-World Generation of Stereo Images and Unsupervised Matching](https://huggingface.co/papers/2503.12720). The model is finetuned from Stable Diffusion v1.5; an SD v2.1 version is available [here](https://huggingface.co/FQiao/GenStereo-sd2.1).
+ ![](teaser_coco.jpg)
+
+ ## Abstract
+ Stereo images are fundamental to numerous applications, including extended reality (XR) devices, autonomous driving, and robotics. Unfortunately, acquiring high-quality stereo images remains challenging due to the precise calibration requirements of dual-camera setups and the complexity of obtaining accurate, dense disparity maps. Existing stereo image generation methods typically focus on either visual quality for viewing or geometric accuracy for matching, but not both. We introduce GenStereo, a diffusion-based approach, to bridge this gap. The method includes two primary innovations: (1) conditioning the diffusion process on a disparity-aware coordinate embedding and a warped input image, allowing for more precise stereo alignment than previous methods, and (2) an adaptive fusion mechanism that intelligently combines the diffusion-generated image with a warped image, improving both realism and disparity consistency. Through extensive training on 11 diverse stereo datasets, GenStereo demonstrates strong generalization ability and achieves state-of-the-art performance in both stereo image generation and unsupervised stereo matching tasks.
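+
+ For intuition, the adaptive fusion in (2) can be pictured as a learned per-pixel blend between the warped image and the diffusion output. The sketch below is purely illustrative: the class name, network shape, and channel sizes are invented here and do not reflect the actual `fusion_layer.pth` implementation.
+
+ ``` python
+ import torch
+ import torch.nn as nn
+
+ class AdaptiveFusionSketch(nn.Module):
+     """Illustrative stand-in for an adaptive fusion layer (not the real one)."""
+     def __init__(self, channels: int = 3):
+         super().__init__()
+         # Predict a per-pixel blending weight in [0, 1] from both images.
+         self.weight_net = nn.Sequential(
+             nn.Conv2d(2 * channels, 16, 3, padding=1),
+             nn.ReLU(),
+             nn.Conv2d(16, 1, 3, padding=1),
+             nn.Sigmoid(),
+         )
+
+     def forward(self, generated: torch.Tensor, warped: torch.Tensor) -> torch.Tensor:
+         w = self.weight_net(torch.cat([generated, warped], dim=1))
+         # Keep warped pixels where the warp is reliable; fall back to the
+         # diffusion-generated image elsewhere.
+         return w * warped + (1.0 - w) * generated
+
+ fusion = AdaptiveFusionSketch()
+ right_view = fusion(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
+ ```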
+
+ ## How to use
+
+ ### Environment
+
+ We tested our code on Ubuntu with an NVIDIA A100 GPU. If you are on another platform, such as Windows, consider using Docker: you can either install the packages into your own Python environment or use Docker to build one. All commands below are expected to be run from the root directory of the repository.
+
+ The environment was tested with Python `>=3.10` and CUDA `11.8`. To install the mandatory dependencies, run the command below.
+
+ ``` shell
+ pip install -r requirements.txt
+ ```
+
+ To run the development code, such as the Jupyter notebook example and the Gradio live demo, install the extra dependencies via the command below.
+
+ ``` shell
+ pip install -r requirements_dev.txt
+ ```
+
+ ### Download pretrained models
+
+ GenStereo relies on pretrained models: our finetuned weights as well as publicly available third-party ones. Download all of them to the `checkpoints` directory (or any location of your choice), either manually or with the [download_models.sh](scripts/download_models.sh) script.
+
+ #### Download script
+
+ ``` shell
+ bash scripts/download_models.sh
+ ```
+
+ #### Manual download
+
+ > [!NOTE]
+ > The models and checkpoints listed below may be distributed under different licenses. Please check each license carefully before use.
+
+ 1. Our finetuned models. We provide two versions of GenStereo:
+     - v1.5: 512px, faster; [model card](https://huggingface.co/FQiao/GenStereo).
+     - v2.1: 768px, higher resolution and better quality, but slower; [model card](https://huggingface.co/FQiao/GenStereo-sd2.1).
+ 2. Pretrained third-party models:
+     - [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)
+         - download `config.json` and `diffusion_pytorch_model.safetensors` to `checkpoints/sd-vae-ft-mse`
+     - [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
+         - download `image_encoder/config.json` and `image_encoder/pytorch_model.bin` to `checkpoints/image_encoder`
+ 3. MDE (Monocular Depth Estimation) model
+     - We use [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2) as the MDE model to obtain disparity maps. A Python download sketch is shown after the directory layout below.
+
+ The final `checkpoints` directory must look like this:
+
+ ```
+ .
+ ├── depth_anything_v2_vitl.pth
+ ├── genstereo-v1.5
+ │   ├── config.json
+ │   ├── denoising_unet.pth
+ │   ├── fusion_layer.pth
+ │   ├── pose_guider.pth
+ │   └── reference_unet.pth
+ ├── genstereo-v2.1
+ │   ├── config.json
+ │   ├── denoising_unet.pth
+ │   ├── fusion_layer.pth
+ │   ├── pose_guider.pth
+ │   └── reference_unet.pth
+ ├── image_encoder
+ │   ├── config.json
+ │   └── pytorch_model.bin
+ └── sd-vae-ft-mse
+     ├── config.json
+     └── diffusion_pytorch_model.safetensors
+ ```
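+
+ If you prefer to fetch the Hugging Face-hosted files from Python instead of the shell script, a minimal sketch using `huggingface_hub` follows. The repo IDs for our weights, the VAE, and the image encoder come from the links above; the Depth Anything V2 repo ID is an assumption, so check its model card if that download fails.
+
+ ``` python
+ from huggingface_hub import hf_hub_download, snapshot_download
+
+ # Finetuned GenStereo weights (v1.5 shown; use FQiao/GenStereo-sd2.1 for v2.1).
+ snapshot_download(repo_id="FQiao/GenStereo", local_dir="checkpoints/genstereo-v1.5")
+
+ # VAE files land directly in checkpoints/sd-vae-ft-mse/.
+ for fname in ("config.json", "diffusion_pytorch_model.safetensors"):
+     hf_hub_download(repo_id="stabilityai/sd-vae-ft-mse",
+                     filename=fname, local_dir="checkpoints/sd-vae-ft-mse")
+
+ # These repo paths already start with image_encoder/, so downloading into
+ # checkpoints/ recreates checkpoints/image_encoder/ as in the layout above.
+ for fname in ("image_encoder/config.json", "image_encoder/pytorch_model.bin"):
+     hf_hub_download(repo_id="lambdalabs/sd-image-variations-diffusers",
+                     filename=fname, local_dir="checkpoints")
+
+ # Depth Anything V2 checkpoint (repo ID assumed; see its model card).
+ hf_hub_download(repo_id="depth-anything/Depth-Anything-V2-Large",
+                 filename="depth_anything_v2_vitl.pth", local_dir="checkpoints")
+ ```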
+
+ ### Inference
+ You can run the inference code with the following command; the results will be saved under the `./vis` folder.
+
+ ```bash
+ python test.py /path/to/your/image
+ ```
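+
+ Since `test.py` takes a single image path, batch processing is just a loop. A minimal sketch of a hypothetical batch driver (the input folder and glob pattern are placeholders):
+
+ ``` python
+ # Hypothetical batch driver: invokes test.py once per image;
+ # results accumulate under ./vis as with single-image runs.
+ import subprocess
+ from pathlib import Path
+
+ for img in sorted(Path("/path/to/your/images").glob("*.jpg")):
+     subprocess.run(["python", "test.py", str(img)], check=True)
+ ```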
+
+ ### Gradio live demo
+
+ An interactive live demo is also available. Start the Gradio demo with the command below, then go to [http://127.0.0.1:7860/](http://127.0.0.1:7860/).
+ If you are running it on a server, be sure to forward port 7860.
+
+ ```shell
+ python app.py
+ ```
+
+ Alternatively, you can try it right away in the [Spaces](https://huggingface.co/spaces/FQiao/GenStereo) demo hosted by Hugging Face.
+
+ ## Train
+ Please read [Train_Guide.md](./Train_Guide.md).
+
+ ## Citation
+
+ ``` bibtex
+ @inproceedings{qiao2025genstereo,
+   author        = {Qiao, Feng and Xiong, Zhexiao and Xing, Eric and Jacobs, Nathan},
+   title         = {Towards Open-World Generation of Stereo Images and Unsupervised Matching},
+   booktitle     = {Proceedings of the {IEEE/CVF} International Conference on Computer Vision ({ICCV})},
+   year          = {2025},
+   eprint        = {2503.12720},
+   archiveprefix = {arXiv},
+   primaryclass  = {cs.CV}
+ }
+ ```
+
+ ## Acknowledgements
 
+ Our code is based on [GenWarp](https://github.com/sony/genwarp), [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), and other repositories. We thank the authors of the relevant repositories and papers.