DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
Ziyang Song*,
Zerong Wang*,
Bo Li,
Hao Zhang,
Ruijie Zhu,
Li Liu,
Peng-Tao Jiang†,
Tianzhu Zhang†
*Equal Contribution, †Corresponding Author
University of Science and Technology of China, vivo Mobile Communication Co., Ltd.
TCSVT 2026
We present DepthMaster, a tamed single-step diffusion model that customizes generative features in diffusion models to suit the discriminative depth estimation task. We introduce a Feature Alignment module to mitigate overfitting to texture and a Fourier Enhancement module to refine fine-grained details. DepthMaster exhibits state-of-the-art zero-shot performance and superior detail preservation ability, surpassing other diffusion-based methods across various datasets.
📢 News
2026-04-02: Paper is accepted by TCSVT.
2025-01-15: Evaluation code is released.
2025-01-02: Paper is released on arXiv.
Installation
Please refer to installation.md for installation.
Checkpoint
The model can be downloaded here.
🏃 Testing on your images
📷 Prepare images
Place your images in a directory, for example, under in_the_wild_example/input, and run the following inference command.
bash scripts/infer.sh
You can find all results in in_the_wild_example/output. Enjoy!
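Before launching inference it can help to confirm that the input directory actually contains images. The following is a minimal sketch, not part of the repository: `count_images` is a hypothetical helper, the list of extensions is an assumption, and it assumes `scripts/infer.sh` reads from `in_the_wild_example/input` as the example above suggests.

```shell
# Count image files directly inside a directory.
# The accepted extensions are an assumption; adjust them to your data.
count_images() (
    dir=$1
    n=0
    for f in "$dir"/*; do
        case "$f" in
            *.jpg|*.jpeg|*.png|*.JPG|*.JPEG|*.PNG)
                # Only count regular files, not directories with image-like names.
                if [ -f "$f" ]; then n=$((n + 1)); fi
                ;;
        esac
    done
    echo "$n"
)

INPUT_DIR=in_the_wild_example/input   # example path from this README
if [ "$(count_images "$INPUT_DIR")" -gt 0 ]; then
    bash scripts/infer.sh
else
    echo "No images found in $INPUT_DIR; skipping inference." >&2
fi
```

The guard only skips the run when nothing matches, so an empty or missing directory produces a clear message instead of an empty output folder.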
🦿 Evaluation on test datasets
Set the data directory variable (also needed by the evaluation scripts) and download the evaluation datasets into the corresponding subfolders, following Marigold:
export BASE_DATA_DIR=<YOUR_DATA_DIR> # Set target data directory
wget -r -np -nH --cut-dirs=4 -R "index.html*" -P ${BASE_DATA_DIR} https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/
Download the model here to the ckpt/eval subfolder.
Run evaluation scripts, for example:
bash scripts/eval_kitti.sh
The evaluation results will be saved to output/kitti.
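To see where the wget download in the first step lands, it helps to trace how wget builds local paths. With -nH the host name is dropped, and --cut-dirs=4 removes the four leading directories of the URL (~pf, bingkedata, marigold, evaluation_dataset), so each dataset subfolder should end up directly under ${BASE_DATA_DIR}. The sketch below mimics that mapping. `wget_local_path` is a hypothetical helper written for illustration, and the `kitti/example.tar` file name and the `/data` prefix are made up, not actual entries on the server.

```shell
# Mimic the local path wget uses with: -nH --cut-dirs=<cut> -P <prefix>
wget_local_path() (
    url=$1; cut=$2; prefix=$3
    path=${url#*://}      # drop the scheme (https://)
    path=${path#*/}       # drop the host name, as -nH does
    i=0
    while [ "$i" -lt "$cut" ]; do
        path=${path#*/}   # drop one leading directory, as --cut-dirs does
        i=$((i + 1))
    done
    printf '%s/%s\n' "${prefix%/}" "$path"
)

# Hypothetical remote file, shown only to illustrate the mapping.
wget_local_path \
    "https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/kitti/example.tar" \
    4 /data
# prints /data/kitti/example.tar
```

If the number passed to --cut-dirs were smaller, the extra URL directories would be recreated under ${BASE_DATA_DIR} and the evaluation scripts would likely not find the data where they expect it.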
🏋️ Training
Modify the data directory in train_s1.sh and train_s2.sh:
BASE_DATA_DIR=YOUR_DATA_DIR # directory of training data
Prepare the Hypersim and Virtual KITTI 2 datasets and save them into ${BASE_DATA_DIR}, following Marigold.
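The BASE_DATA_DIR edit above can also be scripted rather than done by hand. This is a minimal sketch under stated assumptions: `set_script_var` is a hypothetical helper, it assumes each script holds the variable on a line that begins with `BASE_DATA_DIR=`, and the new value must not contain `|`, `&`, or backslashes because it is spliced into a sed expression.

```shell
# Replace the value of NAME=... at the start of a line, keeping any trailing comment.
set_script_var() (
    file=$1; name=$2; value=$3
    tmp=$(mktemp) || exit 1
    # Write to a temp file first, then copy back so file permissions are preserved.
    sed "s|^${name}=[^[:space:]]*|${name}=${value}|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Example (assumes the scripts live under scripts/, as in the run commands):
# set_script_var scripts/train_s1.sh BASE_DATA_DIR /path/to/data
# set_script_var scripts/train_s2.sh BASE_DATA_DIR /path/to/data
```

The same helper could be pointed at BASE_CKPT_DIR in either training script, under the same assumption about how the line is written.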
The first-stage training
Modify the checkpoint directory in train_s1.sh:
BASE_CKPT_DIR=YOUR_CHECKPOINT_DIR # directory of pretrained checkpoint
Download the Stable Diffusion v2 checkpoint into ${BASE_CKPT_DIR}.
Download the Depth-Anything-V2 checkpoint into checkpoints/.
Run the training script:
bash scripts/train_s1.sh
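Since the first stage depends on several downloads, a quick check before launching can catch a missing directory early. The sketch below is illustrative only: `dir_has_files` and `check_training_inputs` are hypothetical helpers, and they verify only that each directory exists and is non-empty, not that it holds the right files.

```shell
# Succeed only if the directory exists and contains at least one entry.
dir_has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Report every directory passed in; fail if any is missing or empty.
check_training_inputs() (
    status=0
    for d in "$@"; do
        if dir_has_files "$d"; then
            echo "ok: $d"
        else
            echo "missing or empty: $d" >&2
            status=1
        fi
    done
    exit "$status"
)

# Example, using the directories named in the steps above:
# check_training_inputs "$BASE_DATA_DIR" "$BASE_CKPT_DIR" checkpoints \
#     && bash scripts/train_s1.sh
```

Because the helper reports all directories before returning, a single run shows every missing prerequisite rather than stopping at the first one.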
The second-stage training
Modify the checkpoint directory in train_s2.sh:
BASE_CKPT_DIR=YOUR_FIRST_STAGE_CHECKPOINT_DIR # directory of your first-stage checkpoint
Run the training script:
bash scripts/train_s2.sh
🎓 Citation
Please cite our paper:
@article{song2026depthmaster,
title={Depthmaster: Taming diffusion models for monocular depth estimation},
author={Song, Ziyang and Wang, Zerong and Li, Bo and Zhang, Hao and Zhu, Ruijie and Liu, Li and Jiang, Peng-Tao and Zhang, Tianzhu},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
year={2026},
publisher={IEEE}
}
Acknowledgements
The code is based on Marigold.
The external encoder checkpoint is from Depth-Anything-V2.
🎫 License
This work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).
By downloading and using the code and model you agree to the terms in the LICENSE.
