# DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

Ziyang Song*, Zerong Wang*, Bo Li, Hao Zhang, Ruijie Zhu, Li Liu, Peng-Tao Jiang†, Tianzhu Zhang†
*Equal Contribution, †Corresponding Author
University of Science and Technology of China, vivo Mobile Communication Co., Ltd.
TCSVT 2026

                       
![teaser](assets/framework.png)

>We present DepthMaster, a tamed single-step diffusion model that customizes generative features in diffusion models to suit the discriminative depth estimation task. We introduce a Feature Alignment module to mitigate overfitting to texture and a Fourier Enhancement module to refine fine-grained details. DepthMaster exhibits state-of-the-art zero-shot performance and superior detail preservation ability, surpassing other diffusion-based methods across various datasets.

## 📢 News
2026-04-02: [Paper](https://ieeexplore.ieee.org/document/11475488) is accepted by TCSVT.
2025-01-15: Evaluation code is released.
2025-01-02: [Paper](https://arxiv.org/abs/2501.02576) is released on arXiv.

## Installation

Please refer to [installation.md](./docs/installation.md) for installation.

## Checkpoint

The model can be downloaded [here](https://huggingface.co/zysong212/DepthMaster).

## 🏃 Testing on your images

### 📷 Prepare images

Place your images in a directory, for example, under `in_the_wild_example/input`, and run the following inference command.

```bash
bash scripts/infer.sh
```

You can find all results in `in_the_wild_example/output`. Enjoy!

## 🦿 Evaluation on test datasets

Set the data directory variable (also needed in the evaluation scripts) and download the [evaluation datasets](https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset) into the corresponding subfolders, following [Marigold](https://github.com/prs-eth/Marigold):

```bash
export BASE_DATA_DIR=YOUR_DATA_DIR  # Set target data directory

wget -r -np -nH --cut-dirs=4 -R "index.html*" -P ${BASE_DATA_DIR} https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/
```

Download the model [here](https://huggingface.co/zysong212/DepthMaster) to the `ckpt/eval` subfolder.

Run the evaluation scripts, for example:

```bash
bash scripts/eval_kitti.sh
```

The evaluation results will be saved to `output/kitti`.

## 🏋️ Training

Modify the data directory in `train_s1.sh` and `train_s2.sh`:

```bash
BASE_DATA_DIR=YOUR_DATA_DIR  # directory of training data
```

Prepare the [Hypersim](https://github.com/apple/ml-hypersim) and [Virtual KITTI 2](https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-2/) datasets and save them into `${BASE_DATA_DIR}` following [Marigold](https://github.com/prs-eth/Marigold?tab=readme-ov-file).

### The first-stage training

Modify the checkpoint directory in `train_s1.sh`:

```bash
BASE_CKPT_DIR=YOUR_CHECKPOINT_DIR  # directory of pretrained checkpoint
```

Download the Stable Diffusion v2 [checkpoint](https://huggingface.co/stabilityai/stable-diffusion-2) into `${BASE_CKPT_DIR}`.
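
Both training stages read their inputs from the directories configured above, so a quick check that they exist can catch a typo before a long run starts. The following is a minimal sketch, not part of the repository: the helper name `check_dir` is hypothetical, and it assumes you have exported `BASE_DATA_DIR` and `BASE_CKPT_DIR` in your current shell with the same values you put in `train_s1.sh`.

```bash
# Hypothetical helper (not shipped with DepthMaster).
# Usage: check_dir VARIABLE_NAME
# Returns 0 if the named variable is set and points to an existing directory.
check_dir() {
  local name="$1"
  local value="${!name:-}"  # indirect expansion: value of the variable called $name

  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name does not point to a directory: $value" >&2
    return 1
  fi
  echo "$name OK: $value"
}
```

For example, `check_dir BASE_DATA_DIR && check_dir BASE_CKPT_DIR` returns a non-zero status if either directory is missing, so it can be chained in front of the training command.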

Download the checkpoint of [Depth-Anything-V2](https://github.com/DepthAnything/Depth-Anything-V2) into `checkpoints/`.

Run the training script:

```bash
bash scripts/train_s1.sh
```

### The second-stage training

Modify the checkpoint directory in `train_s2.sh`:

```bash
BASE_CKPT_DIR=YOUR_FIRST_STAGE_CHECKPOINT_DIR  # directory of your first-stage checkpoint
```

Run the training script:

```bash
bash scripts/train_s2.sh
```

## 🎓 Citation

Please cite our paper:

```bibtex
@article{song2026depthmaster,
  title={Depthmaster: Taming diffusion models for monocular depth estimation},
  author={Song, Ziyang and Wang, Zerong and Li, Bo and Zhang, Hao and Zhu, Ruijie and Liu, Li and Jiang, Peng-Tao and Zhang, Tianzhu},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2026},
  publisher={IEEE}
}
```

## Acknowledgements

The code is based on [Marigold](https://github.com/prs-eth/Marigold). \
The external encoder checkpoint is from [Depth-Anything-V2](https://github.com/DepthAnything/Depth-Anything-V2).

## 🎫 License

This work is licensed under the Apache License, Version 2.0 (as defined in the [LICENSE](LICENSE.txt)).

By downloading and using the code and model you agree to the terms in the [LICENSE](LICENSE.txt).

[![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)