| # <img src="assets/badges/lotus_icon.png" alt="lotus" style="height:1em; vertical-align:bottom;"/> Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | |
| [Project Page](https://lotus3d.github.io/) | |
| [Paper (arXiv)](https://arxiv.org/abs/2409.18124) | |
| [Hugging Face Demo (Depth)](https://huggingface.co/spaces/haodongli/Lotus_Depth) | |
| [Hugging Face Demo (Normal)](https://huggingface.co/spaces/haodongli/Lotus_Normal) | |
| [ComfyUI](https://github.com/kijai/ComfyUI-Lotus) | |
| [Replicate](https://replicate.com/chenxwh/lotus) | |
| [Jing He](https://scholar.google.com/citations?hl=en&user=RsLS11MAAAAJ)<sup>1<span style="color:red;">✱</span></sup>, | |
| [Haodong Li](https://haodong-li.com/)<sup>1<span style="color:red;">✱</span></sup>, | |
| [Wei Yin](https://yvanyin.net/)<sup>2</sup>, | |
| [Yixun Liang](https://yixunliang.github.io/)<sup>1</sup>, | |
| [Leheng Li](https://len-li.github.io/)<sup>1</sup>, | |
| [Kaiqiang Zhou]()<sup>3</sup>, | |
| [Hongbo Zhang]()<sup>3</sup>, | |
| [Bingbing Liu]()<sup>3</sup>,<br> | |
| [Ying-Cong Chen](https://www.yingcong.me/)<sup>1,4✉</sup> | |
| <span class="author-block"><sup>1</sup>HKUST(GZ)</span> | |
| <span class="author-block"><sup>2</sup>University of Adelaide</span> | |
| <span class="author-block"><sup>3</sup>Noah's Ark Lab</span> | |
| <span class="author-block"><sup>4</sup>HKUST</span><br> | |
| <span class="author-block"> | |
| <sup style="color:red;">✱</sup>**Both authors contributed equally.** | |
| <sup>✉</sup>Corresponding author. | |
| </span> | |
| 🔥🔥🔥 **Please also check our latest Lotus-2! Useful links:** [**Project Page**](https://lotus-2.github.io/)**,** [**GitHub Repo**](https://github.com/EnVision-Research/Lotus-2)**.** 🔥🔥🔥 | |
| We present **Lotus**, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves SoTA performance in two key geometry perception tasks, i.e., zero-shot depth and normal estimation. "Avg. Rank" indicates the average ranking across all metrics, where lower values are better. Bar length represents the amount of training data used. | |
| ## 📢 News | |
| - 2025-04-03: The training code of Lotus (Generative & Discriminative) is now available! | |
| - 2025-01-17: Please check out our latest models ([lotus-normal-g-v1-1](https://huggingface.co/jingheya/lotus-normal-g-v1-1), [lotus-normal-d-v1-1](https://huggingface.co/jingheya/lotus-normal-d-v1-1)), which were trained with aligned surface normals, leading to improved performance! | |
| - 2024-11-13: The demo now supports video depth estimation! | |
| - 2024-11-13: The Lotus disparity models ([Generative](https://huggingface.co/jingheya/lotus-depth-g-v2-0-disparity) & [Discriminative](https://huggingface.co/jingheya/lotus-depth-d-v2-0-disparity)) are now available, which achieve better performance! | |
| - 2024-10-06: The demos are now available ([Depth](https://huggingface.co/spaces/haodongli/Lotus_Depth) & [Normal](https://huggingface.co/spaces/haodongli/Lotus_Normal)). Please give them a try! | |
| - 2024-10-05: The inference code is now available! | |
| - 2024-09-26: [Paper](https://arxiv.org/abs/2409.18124) released. Click [here](https://github.com/EnVision-Research/Lotus/issues/14#issuecomment-2409094495) if you are curious about the 3D point clouds of the teaser's depth maps! | |
| ## 🛠️ Setup | |
| This installation was tested on: Ubuntu 20.04 LTS, Python 3.10, CUDA 12.3, NVIDIA A800-SXM4-80GB. | |
| 1. Clone the repository (requires git): | |
| ``` | |
| git clone https://github.com/EnVision-Research/Lotus.git | |
| cd Lotus | |
| ``` | |
| 2. Install dependencies (requires conda): | |
| ``` | |
| conda create -n lotus python=3.10 -y | |
| conda activate lotus | |
| pip install -r requirements.txt | |
| ``` | |
| ## 🤗 Gradio Demo | |
| 1. Online demo: [Depth](https://huggingface.co/spaces/haodongli/Lotus_Depth) & [Normal](https://huggingface.co/spaces/haodongli/Lotus_Normal) | |
| 2. Local demo | |
| - For **depth** estimation, run: | |
| ``` | |
| python app.py depth | |
| ``` | |
| - For **normal** estimation, run: | |
| ``` | |
| python app.py normal | |
| ``` | |
| ## 🔥 Training | |
| 1. Initialize your Accelerate environment with: | |
| ``` | |
| accelerate config --config_file=$PATH_TO_ACCELERATE_CONFIG_FILE | |
| ``` | |
| Make sure the `accelerate` package is installed. We have tested our training scripts with accelerate version 0.29.3. | |
| 2. Prepare your training data: | |
| - [Hypersim](https://github.com/apple/ml-hypersim): | |
| - Download this [script](https://github.com/apple/ml-hypersim/blob/main/contrib/99991/download.py) into your `$PATH_TO_RAW_HYPERSIM_DATA` directory for data downloading. | |
| - Run the following command to download the data: | |
| ``` | |
| cd $PATH_TO_RAW_HYPERSIM_DATA | |
| # Download the tone-mapped images | |
| python ./download.py --contains scene_cam_ --contains final_preview --contains tonemap.jpg --silent | |
| # Download the depth maps | |
| python ./download.py --contains scene_cam_ --contains geometry_hdf5 --contains depth_meters --silent | |
| # Download the normal maps | |
| python ./download.py --contains scene_cam_ --contains geometry_hdf5 --contains normal --silent | |
| ``` | |
| - Download the split file from [here](https://github.com/apple/ml-hypersim/blob/main/evermotion_dataset/analysis/metadata_images_split_scene_v1.csv) and put it in the `$PATH_TO_RAW_HYPERSIM_DATA` directory. | |
| - Process the data with the command: `bash utils/process_hypersim.sh`. | |
| - [Virtual KITTI](https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/): | |
| - Download the [rgb](https://download.europe.naverlabs.com//virtual_kitti_2.0.3/vkitti_2.0.3_rgb.tar), [depth](https://download.europe.naverlabs.com//virtual_kitti_2.0.3/vkitti_2.0.3_depth.tar), and [textgt](https://download.europe.naverlabs.com//virtual_kitti_2.0.3/vkitti_2.0.3_textgt.tar.gz) archives into the `$PATH_TO_VKITTI_DATA` directory and extract them. | |
| - Make sure the directory structure is as follows: | |
| ``` | |
| SceneX/Y/frames/rgb/Camera_Z/rgb_%05d.jpg | |
| SceneX/Y/frames/depth/Camera_Z/depth_%05d.png | |
| SceneX/Y/colors.txt | |
| SceneX/Y/extrinsic.txt | |
| SceneX/Y/intrinsic.txt | |
| SceneX/Y/info.txt | |
| SceneX/Y/bbox.txt | |
| SceneX/Y/pose.txt | |
| ``` | |
| where $`X \in \{01, 02, 06, 18, 20\}`$ represents one of 5 different locations. | |
| $`Y \in \{\texttt{15-deg-left}, \texttt{15-deg-right}, \texttt{30-deg-left}, \texttt{30-deg-right}, \texttt{clone}, \texttt{fog}, \texttt{morning}, \texttt{overcast}, \texttt{rain}, \texttt{sunset}\}`$ represents the different variations. | |
| $`Z \in \{0, 1\}`$ represents the left or right camera. | |
| Note that the indexes always start from 0. | |
| - Generate the normal maps with the command: `bash utils/depth2normal.sh`. | |
| 3. Run the training command! 🚀 | |
| - `bash train_scripts/train_lotus_g_{$TASK}.sh` for training Lotus Generative models; | |
| - `bash train_scripts/train_lotus_d_{$TASK}.sh` for training Lotus Discriminative models. | |
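The Virtual KITTI layout listed in step 2 can be checked before the normal maps are generated. The following is a minimal sketch and not part of the Lotus repository: the helper names `expected_entries`, `missing_entries`, and `frame_paths` are hypothetical, and the sketch assumes that every scene/variation pair contains all of the listed metadata files and both camera folders.

```python
from itertools import product
from pathlib import Path

# Scene IDs (X), variations (Y), and camera indices (Z) as listed in step 2.
SCENES = ["01", "02", "06", "18", "20"]
VARIATIONS = [
    "15-deg-left", "15-deg-right", "30-deg-left", "30-deg-right",
    "clone", "fog", "morning", "overcast", "rain", "sunset",
]
CAMERAS = [0, 1]
METADATA_FILES = [
    "colors.txt", "extrinsic.txt", "intrinsic.txt",
    "info.txt", "bbox.txt", "pose.txt",
]


def expected_entries(root):
    """Yield every metadata file and camera folder the layout implies."""
    root = Path(root)
    for scene, variation in product(SCENES, VARIATIONS):
        base = root / f"Scene{scene}" / variation
        for name in METADATA_FILES:
            yield base / name
        for modality, camera in product(("rgb", "depth"), CAMERAS):
            yield base / "frames" / modality / f"Camera_{camera}"


def missing_entries(root):
    """Return the expected paths that do not exist under `root`."""
    return [path for path in expected_entries(root) if not path.exists()]


def frame_paths(root, scene, variation, camera, index):
    """Build the RGB and depth paths for one frame (indexes start from 0)."""
    base = Path(root) / f"Scene{scene}" / variation / "frames"
    rgb = base / "rgb" / f"Camera_{camera}" / f"rgb_{index:05d}.jpg"
    depth = base / "depth" / f"Camera_{camera}" / f"depth_{index:05d}.png"
    return rgb, depth
```

Calling `missing_entries` with the value of `$PATH_TO_VKITTI_DATA` should return an empty list when the archives were extracted as described; any returned path points at a folder or file that still needs to be downloaded or moved.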
| ## 🕹️ Inference | |
| ### Testing on your images | |
| 1. Place your images in a directory, for example, under `assets/in-the-wild_example` (where we have prepared several examples). | |
| 2. Run the inference command: `bash infer.sh`. | |
| ### Evaluation on benchmark datasets | |
| 1. Prepare benchmark datasets: | |
| - For **depth** estimation, download the [evaluation datasets (depth)](https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/) with the following commands (following [Marigold](https://github.com/prs-eth/Marigold?tab=readme-ov-file#-evaluation-on-test-datasets-)): | |
| ``` | |
| cd datasets/eval/depth/ | |
| wget -r -np -nH --cut-dirs=4 -R "index.html*" -P . https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/ | |
| ``` | |
| - For **normal** estimation, download the [evaluation datasets (normal)](https://drive.google.com/drive/folders/1t3LMJIIrSnCGwOEf53Cyg0lkSXd3M4Hm?usp=drive_link) (`dsine_eval.zip`) into `datasets/eval/normal/` and unzip it (following [DSINE](https://github.com/baegwangbin/DSINE?tab=readme-ov-file#getting-started)). | |
| 2. Run the evaluation command: `bash eval_scripts/eval-[task]-[mode].sh`, where `[task]` represents the task name (**depth** or **normal**) and `[mode]` refers to the mode name (**g** or **d**). | |
| 3. (Optional) To reproduce the results presented in our paper, you can set the `--rng_state_path` option in the evaluation command. The RNG state files are available at `./rng_states/`. | |
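The `eval-[task]-[mode].sh` naming pattern from step 2 can be expanded programmatically, for example to run all four evaluations in a loop. This is a small sketch under that naming pattern only; the helpers `eval_script` and `eval_command` are hypothetical and do not exist in the repository.

```python
from pathlib import Path

# Allowed values, taken from the naming pattern in step 2.
TASKS = ("depth", "normal")
MODES = ("g", "d")


def eval_script(task, mode):
    """Return the relative path of the evaluation script for a task/mode pair."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return Path("eval_scripts") / f"eval-{task}-{mode}.sh"


def eval_command(task, mode):
    """Return the argument list for running the script with bash."""
    return ["bash", eval_script(task, mode).as_posix()]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root. The `--rng_state_path` option from step 3 is set in the evaluation command itself, so this sketch does not pass it.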
| ### Choose your model | |
| Below are the released models and their corresponding configurations: | |
| |CHECKPOINT_DIR |TASK_NAME |MODE | | |
| |:--:|:--:|:--:| | |
| | [`jingheya/lotus-depth-g-v1-0`](https://huggingface.co/jingheya/lotus-depth-g-v1-0) | depth| `generation`| | |
| | [`jingheya/lotus-depth-d-v1-0`](https://huggingface.co/jingheya/lotus-depth-d-v1-0) | depth|`regression` | | |
| | [`jingheya/lotus-depth-g-v2-1-disparity`](https://huggingface.co/jingheya/lotus-depth-g-v2-1-disparity) | depth (disparity)| `generation`| | |
| | [`jingheya/lotus-depth-d-v2-0-disparity`](https://huggingface.co/jingheya/lotus-depth-d-v2-0-disparity) | depth (disparity)|`regression` | | |
| | [`jingheya/lotus-normal-g-v1-1`](https://huggingface.co/jingheya/lotus-normal-g-v1-1) |normal | `generation` | | |
| | [`jingheya/lotus-normal-d-v1-1`](https://huggingface.co/jingheya/lotus-normal-d-v1-1) |normal |`regression` | | |
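For scripting, the table above can be written down as plain data. The sketch below is illustrative only: `RELEASED_MODELS` and `select_checkpoints` are hypothetical names, and the `disparity` flag is an assumption of this sketch used to mark the rows labelled "depth (disparity)".

```python
# Rows of the table above: checkpoint directory, task name, and mode.
# The "disparity" flag is an assumption of this sketch, not a repository option.
RELEASED_MODELS = [
    {"checkpoint_dir": "jingheya/lotus-depth-g-v1-0",
     "task_name": "depth", "mode": "generation", "disparity": False},
    {"checkpoint_dir": "jingheya/lotus-depth-d-v1-0",
     "task_name": "depth", "mode": "regression", "disparity": False},
    {"checkpoint_dir": "jingheya/lotus-depth-g-v2-1-disparity",
     "task_name": "depth", "mode": "generation", "disparity": True},
    {"checkpoint_dir": "jingheya/lotus-depth-d-v2-0-disparity",
     "task_name": "depth", "mode": "regression", "disparity": True},
    {"checkpoint_dir": "jingheya/lotus-normal-g-v1-1",
     "task_name": "normal", "mode": "generation", "disparity": False},
    {"checkpoint_dir": "jingheya/lotus-normal-d-v1-1",
     "task_name": "normal", "mode": "regression", "disparity": False},
]


def select_checkpoints(task_name, mode, disparity=None):
    """Return checkpoint directories matching the task, mode, and optional flag."""
    return [
        row["checkpoint_dir"]
        for row in RELEASED_MODELS
        if row["task_name"] == task_name
        and row["mode"] == mode
        and (disparity is None or row["disparity"] == disparity)
    ]
```

For example, `select_checkpoints("depth", "regression", disparity=True)` returns the single discriminative disparity checkpoint, while omitting `disparity` returns both depth regression checkpoints.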
| ## 🎓 Citation | |
| If you find our work useful in your research, please consider citing our paper: | |
| ```bibtex | |
| @article{he2024lotus, | |
| title={Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction}, | |
| author={He, Jing and Li, Haodong and Yin, Wei and Liang, Yixun and Li, Leheng and Zhou, Kaiqiang and Zhang, Hongbo and Liu, Bingbing and Chen, Ying-Cong}, | |
| journal={arXiv preprint arXiv:2409.18124}, | |
| year={2024} | |
| } | |
| ``` | |