| <div align="center"> | |
| <h1 style="border-bottom: none; margin-bottom: 0px ">Depth Anything 3: Recovering the Visual Space from Any Views</h1> | |
| <!-- <h2 style="border-top: none; margin-top: 3px;">Recovering the Visual Space from Any Views</h2> --> | |
| [**Haotong Lin**](https://haotongl.github.io/)<sup>*</sup> · [**Sili Chen**](https://github.com/SiliChen321)<sup>*</sup> · [**Jun Hao Liew**](https://liewjunhao.github.io/)<sup>*</sup> · [**Donny Y. Chen**](https://donydchen.github.io)<sup>*</sup> · [**Zhenyu Li**](https://zhyever.github.io/) · [**Guang Shi**](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [**Jiashi Feng**](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) | |
| <br> | |
| [**Bingyi Kang**](https://bingyikang.com/)<sup>*†</sup> | |
| †project lead *Equal Contribution | |
| <a href="https://arxiv.org/abs/2511.10647"><img src='https://img.shields.io/badge/arXiv-Depth Anything 3-red' alt='Paper PDF'></a> | |
| <a href='https://depth-anything-3.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything 3-green' alt='Project Page'></a> | |
| <a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-3'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a> | |
| <!-- <a href='https://huggingface.co/datasets/depth-anything/VGB'><img src='https://img.shields.io/badge/Benchmark-VisGeo-yellow' alt='Benchmark'></a> --> | |
| <!-- <a href='https://huggingface.co/datasets/depth-anything/data'><img src='https://img.shields.io/badge/Benchmark-xxx-yellow' alt='Data'></a> --> | |
| </div> | |
| This work presents **Depth Anything 3 (DA3)**, a model that predicts spatially consistent geometry from | |
| arbitrary visual inputs, with or without known camera poses. | |
| In pursuit of minimal modeling, DA3 yields two key insights: | |
| - A **single plain transformer** (e.g., a vanilla DINO encoder) is sufficient as a backbone without architectural specialization. | |
| - A singular **depth-ray representation** obviates the need for complex multi-task learning. | |
| DA3 significantly outperforms | |
| [DA2](https://github.com/DepthAnything/Depth-Anything-V2) for monocular depth estimation, | |
| and [VGGT](https://github.com/facebookresearch/vggt) for multi-view depth estimation and pose estimation. | |
| All models are trained exclusively on **public academic datasets**. | |
| <!-- <p align="center"> | |
| <img src="assets/images/da3_teaser.png" alt="Depth Anything 3" width="100%"> | |
| </p> --> | |
| <p align="center"> | |
| <img src="assets/images/demo320-2.gif" alt="Depth Anything 3 - Left" width="70%"> | |
| </p> | |
| <p align="center"> | |
| <img src="assets/images/da3_radar.png" alt="Depth Anything 3" width="100%"> | |
| </p> | |
| ## 📰 News | |
| - **11-12-2025:** New models and [**DA3-Streaming**](da3_streaming/README.md) released! Run inference on ultra-long video sequences with less than 12GB of GPU memory via sliding-window streaming inference. Special thanks to [Kai Deng](https://github.com/DengKaiCQ) for his contribution to DA3-Streaming! | |
| - **08-12-2025:** [Benchmark evaluation pipeline](docs/BENCHMARK.md) released! Evaluate pose estimation & 3D reconstruction on 5 datasets. | |
| - **30-11-2025:** Add [`use_ray_pose`](#use-ray-pose) and [`ref_view_strategy`](docs/funcs/ref_view_strategy.md) (reference view selection for multi-view inputs). | |
| - **25-11-2025:** Add [Awesome DA3 Projects](#-awesome-da3-projects), a community-driven section featuring DA3-based applications. | |
| - **14-11-2025:** Paper, project page, code and models are all released. | |
| ## ✨ Highlights | |
| ### 🏆 Model Zoo | |
| We release three series of models, each tailored for specific use cases in visual geometry. | |
| - **DA3 Main Series** (`DA3-Giant`, `DA3-Large`, `DA3-Base`, `DA3-Small`): Our flagship foundation models, trained with a unified depth-ray representation. By varying the input configuration, a single model can perform a wide range of tasks: | |
|   + **Monocular Depth Estimation**: Predicts a depth map from a single RGB image. | |
|   + **Multi-View Depth Estimation**: Generates consistent depth maps from multiple images for high-quality fusion. | |
|   + **Pose-Conditioned Depth Estimation**: Achieves superior depth consistency when camera poses are provided as input. | |
|   + **Camera Pose Estimation**: Estimates camera extrinsics and intrinsics from one or more images. | |
|   + **3D Gaussian Estimation**: Directly predicts 3D Gaussians, enabling high-fidelity novel view synthesis. | |
| - **DA3 Metric Series** (`DA3Metric-Large`): A specialized model fine-tuned for metric depth estimation in monocular settings, ideal for applications requiring real-world scale. | |
| - **DA3 Monocular Series** (`DA3Mono-Large`): A dedicated model for high-quality relative monocular depth estimation. Unlike disparity-based models (e.g., [Depth Anything 2](https://github.com/DepthAnything/Depth-Anything-V2)), it directly predicts depth, resulting in superior geometric accuracy. | |
| Leveraging these available models, we developed a **nested series** (`DA3Nested-Giant-Large`). This series combines an any-view giant model with a metric model to reconstruct visual geometry at a real-world metric scale. | |
| ### 🛠️ Codebase Features | |
| Our repository is designed to be a powerful and user-friendly toolkit for both practical application and future research. | |
| - 🎨 **Interactive Web UI & Gallery**: Visualize model outputs and compare results with an easy-to-use Gradio-based web interface. | |
| - ⚡ **Flexible Command-Line Interface (CLI)**: Powerful and scriptable CLI for batch processing and integration into custom workflows. | |
| - 💾 **Multiple Export Formats**: Save your results in various formats, including `glb`, `npz`, depth images, `ply`, 3DGS videos, etc., to seamlessly connect with other tools. | |
| - 🔧 **Extensible and Modular Design**: The codebase is structured to facilitate future research and the integration of new models or functionalities. | |
| <!-- ### Visual Geometry Benchmark | |
| We introduce a new benchmark to rigorously evaluate geometry prediction models on three key tasks: pose estimation, 3D reconstruction, and visual rendering (novel view synthesis) quality. | |
| - **Broad Model Compatibility**: Our benchmark is designed to be versatile, supporting the evaluation of various models, including both monocular and multi-view depth estimation approaches. | |
| - **Robust Evaluation Pipeline**: We provide a standardized pipeline featuring RANSAC-based pose alignment, TSDF fusion for dense reconstruction, and a principled view selection strategy for novel view synthesis. | |
| - **Standardized Metrics**: Performance is measured using established metrics: AUC for pose accuracy, F1-score and Chamfer Distance for reconstruction, and PSNR/SSIM/LPIPS for rendering quality. | |
| - **Diverse and Challenging Datasets**: The benchmark spans a wide range of scenes from datasets like HiRoom, ETH3D, DTU, 7Scenes, ScanNet++, DL3DV, Tanks and Temples, and MegaDepth. --> | |
| ## 🚀 Quick Start | |
| ### 📦 Installation | |
| ```bash | |
| pip install xformers torch\>=2 torchvision | |
| pip install -e . # Basic | |
| pip install --no-build-isolation git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70 # for gaussian head | |
| pip install -e ".[app]" # Gradio, python>=3.10 | |
| pip install -e ".[all]" # ALL | |
| ``` | |
| For detailed model information, please refer to the [Model Cards](#-model-cards) section below. | |
| ### 💻 Basic Usage | |
| ```python | |
| import glob, os, torch | |
| from depth_anything_3.api import DepthAnything3 | |
| device = torch.device("cuda") | |
| model = DepthAnything3.from_pretrained("depth-anything/DA3NESTED-GIANT-LARGE") | |
| model = model.to(device=device) | |
| example_path = "assets/examples/SOH" | |
| images = sorted(glob.glob(os.path.join(example_path, "*.png"))) | |
| prediction = model.inference( | |
| images, | |
| ) | |
| # prediction.processed_images : [N, H, W, 3] uint8 array | |
| print(prediction.processed_images.shape) | |
| # prediction.depth : [N, H, W] float32 array | |
| print(prediction.depth.shape) | |
| # prediction.conf : [N, H, W] float32 array | |
| print(prediction.conf.shape) | |
| # prediction.extrinsics : [N, 3, 4] float32 array # opencv w2c or colmap format | |
| print(prediction.extrinsics.shape) | |
| # prediction.intrinsics : [N, 3, 3] float32 array | |
| print(prediction.intrinsics.shape) | |
| ``` | |
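The arrays listed above are enough to lift each depth map into a world-space point cloud. The sketch below is a minimal NumPy illustration and is not part of the DA3 API: the helper name `unproject_depth` is hypothetical, and it assumes that depth is measured along the camera z-axis, that pixel coordinates are plain integer indices, and that `extrinsics` is a world-to-camera `[R | t]` matrix in the OpenCV convention, as the comment in the snippet above suggests.

```python
import numpy as np


def unproject_depth(depth, intrinsics, extrinsics):
    """Back-project one depth map to world-space points (hypothetical helper).

    depth:      [H, W] depth along the camera z-axis (assumption)
    intrinsics: [3, 3] pinhole matrix K
    extrinsics: [3, 4] world-to-camera [R | t], OpenCV convention (assumption)
    returns:    [H, W, 3] points in world coordinates
    """
    h, w = depth.shape
    # Pixel grid: u runs along the width (x), v along the height (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Camera-space rays with z = 1 (row-vector form of K^-1 @ p), scaled by depth.
    rays = pixels @ np.linalg.inv(intrinsics).T
    cam_points = rays * depth[..., None]
    # Invert x_cam = R @ x_world + t, i.e. x_world = R^T @ (x_cam - t).
    rot, trans = extrinsics[:, :3], extrinsics[:, 3]
    return (cam_points - trans) @ rot


# Synthetic example: a flat wall 2 units in front of a camera shifted along z.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.0], [0.0, 0.0, 1.0]])
w2c = np.hstack([np.eye(3), np.array([[0.0], [0.0], [1.0]])])
depth = np.full((3, 4), 2.0)
points = unproject_depth(depth, K, w2c)
```

With a real prediction, the inputs for view `i` would be `prediction.depth[i]`, `prediction.intrinsics[i]`, and `prediction.extrinsics[i]`; `prediction.conf[i]` could be thresholded to drop unreliable points before fusion.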
| ```bash | |
| export MODEL_DIR=depth-anything/DA3NESTED-GIANT-LARGE | |
| # This can be a Hugging Face repository or a local directory | |
| # If you encounter network issues, consider using the following mirror: export HF_ENDPOINT=https://hf-mirror.com | |
| # Alternatively, you can download the model directly from Hugging Face | |
| export GALLERY_DIR=workspace/gallery | |
| mkdir -p $GALLERY_DIR | |
| # CLI auto mode with backend reuse | |
| da3 backend --model-dir ${MODEL_DIR} --gallery-dir ${GALLERY_DIR} # Cache model to gpu | |
| da3 auto assets/examples/SOH \ | |
| --export-format glb \ | |
| --export-dir ${GALLERY_DIR}/TEST_BACKEND/SOH \ | |
| --use-backend | |
| # CLI video processing with feature visualization | |
| da3 video assets/examples/robot_unitree.mp4 \ | |
| --fps 15 \ | |
| --use-backend \ | |
| --export-dir ${GALLERY_DIR}/TEST_BACKEND/robo \ | |
| --export-format glb-feat_vis \ | |
| --feat-vis-fps 15 \ | |
| --process-res-method lower_bound_resize \ | |
| --export-feat "11,21,31" | |
| # CLI auto mode without backend reuse | |
| da3 auto assets/examples/SOH \ | |
| --export-format glb \ | |
| --export-dir ${GALLERY_DIR}/TEST_CLI/SOH \ | |
| --model-dir ${MODEL_DIR} | |
| ``` | |
| The model architecture is defined in [`DepthAnything3Net`](src/depth_anything_3/model/da3.py) and specified by a YAML config file located in [`src/depth_anything_3/configs`](src/depth_anything_3/configs). Input and output processing is handled by [`DepthAnything3`](src/depth_anything_3/api.py). To customize the model architecture, create a new config file (*e.g.*, `path/to/new/config`) such as: | |
| ```yaml | |
| __object__: | |
|   path: depth_anything_3.model.da3 | |
|   name: DepthAnything3Net | |
|   args: as_params | |
| net: | |
|   __object__: | |
|     path: depth_anything_3.model.dinov2.dinov2 | |
|     name: DinoV2 | |
|     args: as_params | |
|   name: vitb | |
|   out_layers: [5, 7, 9, 11] | |
|   alt_start: 4 | |
|   qknorm_start: 4 | |
|   rope_start: 4 | |
|   cat_token: True | |
| head: | |
|   __object__: | |
|     path: depth_anything_3.model.dualdpt | |
|     name: DualDPT | |
|     args: as_params | |
|   dim_in: &head_dim_in 1536 | |
|   output_dim: 2 | |
|   features: &head_features 128 | |
|   out_channels: &head_out_channels [96, 192, 384, 768] | |
| ``` | |
| Then, the model can be created with the following code snippet. | |
| ```python | |
| from depth_anything_3.cfg import create_object, load_config | |
| Model = create_object(load_config("path/to/new/config")) | |
| ``` | |
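To make the `__object__` convention concrete, here is a small sketch of how a config of this shape could be turned into Python objects. It is an illustration only and not the repository's `create_object`: the function name `build` is hypothetical, and it assumes that `args: as_params` means the sibling keys are passed as keyword arguments. A standard-library class stands in for the model classes so the example runs without DA3 installed.

```python
import importlib


def build(cfg):
    """Recursively instantiate dicts that carry an `__object__` entry (sketch)."""
    if isinstance(cfg, list):
        return [build(item) for item in cfg]
    if not isinstance(cfg, dict):
        return cfg
    # Build nested values first so sub-objects are ready to pass as arguments.
    built = {key: build(value) for key, value in cfg.items() if key != "__object__"}
    spec = cfg.get("__object__")
    if spec is None:
        return built
    # `path` is a module path and `name` is the class inside that module.
    cls = getattr(importlib.import_module(spec["path"]), spec["name"])
    # Assumption: "as_params" means the sibling keys become keyword arguments.
    return cls(**built)


# Stand-in config using a standard-library class instead of a DA3 module.
config = {
    "__object__": {"path": "fractions", "name": "Fraction", "args": "as_params"},
    "numerator": 3,
    "denominator": 4,
}
obj = build(config)
```

Under this reading, the nested `net` and `head` entries in the YAML above would each be built first and then handed to the top-level class as keyword arguments.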
| ## 📚 Useful Documentation | |
| - [Command Line Interface](docs/CLI.md) | |
| - [Python API](docs/API.md) | |
| - [Benchmark Evaluation](docs/BENCHMARK.md) | |
| ## 🗂️ Model Cards | |
| Generally, you should observe that DA3-LARGE achieves comparable results to VGGT. | |
| The Nested series uses an Any-view model to estimate pose and depth, and a monocular metric depth estimator for scaling. | |
| ⚠️ Models with the `-1.1` suffix were retrained after fixing a training bug; prefer these refreshed checkpoints. The original `DA3NESTED-GIANT-LARGE`, `DA3-GIANT`, and `DA3-LARGE` remain available but are deprecated. Expect much better performance on street scenes with the `-1.1` models. | |
| Model Name | Params | Rel. Depth | Pose Est. | Pose Cond. | GS | Met. Depth | Sky Seg | License | |
|-------------------------------|-----------|---------------|--------------|---------------|-------|---------------|-----------|----------------| |
| **Nested** | | | | | | | | | |
| [DA3NESTED-GIANT-LARGE-1.1](https://huggingface.co/depth-anything/DA3NESTED-GIANT-LARGE-1.1) | 1.40B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | CC BY-NC 4.0 | |
| [DA3NESTED-GIANT-LARGE](https://huggingface.co/depth-anything/DA3NESTED-GIANT-LARGE) | 1.40B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | CC BY-NC 4.0 | |
| **Any-view Model** | | | | | | | | | |
| [DA3-GIANT-1.1](https://huggingface.co/depth-anything/DA3-GIANT-1.1) | 1.15B | ✅ | ✅ | ✅ | ✅ | | | CC BY-NC 4.0 | |
| [DA3-GIANT](https://huggingface.co/depth-anything/DA3-GIANT) | 1.15B | ✅ | ✅ | ✅ | ✅ | | | CC BY-NC 4.0 | |
| [DA3-LARGE-1.1](https://huggingface.co/depth-anything/DA3-LARGE-1.1) | 0.35B | ✅ | ✅ | ✅ | | | | CC BY-NC 4.0 | |
| [DA3-LARGE](https://huggingface.co/depth-anything/DA3-LARGE) | 0.35B | ✅ | ✅ | ✅ | | | | CC BY-NC 4.0 | |
| [DA3-BASE](https://huggingface.co/depth-anything/DA3-BASE) | 0.12B | ✅ | ✅ | ✅ | | | | Apache 2.0 | |
| [DA3-SMALL](https://huggingface.co/depth-anything/DA3-SMALL) | 0.08B | ✅ | ✅ | ✅ | | | | Apache 2.0 | |
| | | | | | | | | | |
| **Monocular Metric Depth** | | | | | | | | | |
| [DA3METRIC-LARGE](https://huggingface.co/depth-anything/DA3METRIC-LARGE) | 0.35B | ✅ | | | | ✅ | ✅ | Apache 2.0 | |
| | | | | | | | | | |
| **Monocular Depth** | | | | | | | | | |
| [DA3MONO-LARGE](https://huggingface.co/depth-anything/DA3MONO-LARGE) | 0.35B | ✅ | | | | | ✅ | Apache 2.0 | |
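The Nested series is described above as pairing an any-view model with a monocular metric depth estimator "for scaling". How that scaling is done is not specified here. One common and simple way to bring a scale-ambiguous depth map to metric scale is a median ratio over valid pixels, sketched below purely as an illustration. The function name `median_scale` and the validity rule are assumptions, and this should not be read as the method used inside `DA3NESTED-GIANT-LARGE`.

```python
import numpy as np


def median_scale(relative_depth, metric_depth, valid=None):
    """Estimate one global scale factor between two depth maps (sketch).

    relative_depth: [H, W] depth with unknown global scale
    metric_depth:   [H, W] depth in meters for the same view
    valid:          optional [H, W] boolean mask of pixels to use
    """
    if valid is None:
        # Assumed validity rule: finite values and strictly positive depth.
        valid = (
            np.isfinite(relative_depth)
            & np.isfinite(metric_depth)
            & (relative_depth > 0)
        )
    # The median of per-pixel ratios is robust to a minority of outliers.
    ratios = metric_depth[valid] / relative_depth[valid]
    return float(np.median(ratios))


# Synthetic example: the metric map is exactly 2.5x the relative map.
relative = np.array([[1.0, 2.0], [4.0, 8.0]])
metric = 2.5 * relative
scale = median_scale(relative, metric)
aligned = scale * relative
```

A single global factor keeps multi-view depth maps mutually consistent, since every view and the camera translations can be multiplied by the same number.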
| ## ❓ FAQ | |
| - **Monocular Metric Depth**: To obtain metric depth in meters from `DA3METRIC-LARGE`, use `metric_depth = focal * net_output / 300.`, where `focal` is the focal length in pixels (typically the average of fx and fy from the camera intrinsic matrix K). Note that the output from `DA3NESTED-GIANT-LARGE` is already in meters. | |
| - <a id="use-ray-pose"></a>**Ray Head (`use_ray_pose`)**: The API and CLI accept a `use_ray_pose` argument. When it is enabled, the model derives camera poses from the ray head, which is generally slightly slower but more accurate. The default is `False`, which favors inference speed. | |
| <details> | |
| <summary>AUC3 Results for DA3NESTED-GIANT-LARGE</summary> | |
| | Model | HiRoom | ETH3D | DTU | 7Scenes | ScanNet++ | | |
| |-------|------|-------|-----|---------|-----------| | |
| | `ray_head` | 84.4 | 52.6 | 93.9 | 29.5 | 89.4 | | |
| | `cam_head` | 80.3 | 48.4 | 94.1 | 28.5 | 85.0 | | |
| </details> | |
| - **Older GPUs without XFormers support**: See [Issue #11](https://github.com/ByteDance-Seed/Depth-Anything-3/issues/11). Thanks to [@S-Mahoney](https://github.com/S-Mahoney) for the solution! | |
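The metric-depth conversion from the first FAQ item can be wrapped in a small helper. This is a minimal sketch: the function name `to_metric_depth` is hypothetical, and the focal length is taken as the mean of `fx` and `fy` from `K`, as the FAQ suggests.

```python
import numpy as np


def to_metric_depth(net_output, intrinsics):
    """Apply metric_depth = focal * net_output / 300 (formula from the FAQ).

    net_output: [H, W] raw output of DA3METRIC-LARGE
    intrinsics: [3, 3] camera matrix K, focal lengths in pixels
    """
    # Focal length in pixels: average of fx and fy.
    focal = 0.5 * (intrinsics[0, 0] + intrinsics[1, 1])
    return focal * net_output / 300.0


# Synthetic example with fx = 600 and fy = 620, so focal = 610.
K = np.array([[600.0, 0.0, 320.0], [0.0, 620.0, 240.0], [0.0, 0.0, 1.0]])
raw = np.array([[1.0, 2.0], [0.5, 4.0]])
metric = to_metric_depth(raw, K)
```

This conversion applies only to `DA3METRIC-LARGE`; as noted above, the output of `DA3NESTED-GIANT-LARGE` is already in meters.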
| ## 🏢 Awesome DA3 Projects | |
| A community-curated list of Depth Anything 3 integrations across 3D tools, creative pipelines, robotics, and web/VR viewers, including but not limited to these. You are welcome to submit your DA3-based project via PR, and we will review and feature it if applicable. | |
| - [DA3-blender](https://github.com/xy-gao/DA3-blender): Blender addon for DA3-based 3D reconstruction from a set of images. | |
| - [ComfyUI-DepthAnythingV3](https://github.com/PozzettiAndrea/ComfyUI-DepthAnythingV3): ComfyUI nodes for Depth Anything 3, supporting single/multi-view and video-consistent depth with optional point-cloud export. | |
| - [DA3-ROS2-Wrapper](https://github.com/GerdsenAI/GerdsenAI-Depth-Anything-3-ROS2-Wrapper): Real-time DA3 depth in ROS2 with multi-camera support. | |
| - [DA3-ROS2-CPP-TensorRT](https://github.com/ika-rwth-aachen/ros2-depth-anything-v3-trt): DA3 ROS2 C++ TensorRT Inference Node: a ROS2 node for DA3 depth estimation using TensorRT for real-time inference. | |
| - [VideoDepthViewer3D](https://github.com/amariichi/VideoDepthViewer3D): Streaming videos with DA3 metric depth to a Three.js/WebXR 3D viewer for VR/stereo playback. | |
| ## 🧑‍💻 Official Codebase Core Contributors and Maintainers | |
| <table> | |
| <tr> | |
| <td align="center"> | |
| <a href="https://bingykang.github.io/"> | |
| <img src="https://images.weserv.nl/?url=https://bingykang.github.io/images/bykang_homepage.jpeg?h=100&w=100&fit=cover&mask=circle&maxage=7d" width="100px;" alt=""/> | |
| </a> | |
| <br /> | |
| <sub><b>Bingyi Kang</b></sub> | |
| </td> | |
| <td align="center"> | |
| <a href="https://haotongl.github.io/"> | |
| <img src="https://images.weserv.nl/?url=https://haotongl.github.io/assets/img/prof_pic.jpg?h=100&w=100&fit=cover&mask=circle&maxage=7d" width="100px;" alt=""/> | |
| </a> | |
| <br /> | |
| <sub>Haotong Lin</sub> | |
| </td> | |
| <td align="center"> | |
| <a href="https://github.com/SiliChen321"> | |
| <img src="https://images.weserv.nl/?url=https://avatars.githubusercontent.com/u/195901058?v=4&h=100&w=100&fit=cover&mask=circle&maxage=7d" width="100px;" alt=""/> | |
| </a> | |
| <br /> | |
| <sub>Sili Chen</sub> | |
| </td> | |
| <td align="center"> | |
| <a href="https://liewjunhao.github.io/"> | |
| <img src="https://images.weserv.nl/?url=https://liewjunhao.github.io/images/liewjunhao.png?h=100&w=100&fit=cover&mask=circle&maxage=7d" width="100px;" alt=""/> | |
| </a> | |
| <br /> | |
| <sub>Jun Hao Liew</sub> | |
| </td> | |
| <td align="center"> | |
| <a href="https://donydchen.github.io/"> | |
| <img src="https://images.weserv.nl/?url=https://donydchen.github.io/assets/img/profile.jpg?h=100&w=100&fit=cover&mask=circle&maxage=7d" width="100px;" alt=""/> | |
| </a> | |
| <br /> | |
| <sub>Donny Y. Chen</sub> | |
| </td> | |
| <td align="center"> | |
| <a href="https://github.com/DengKaiCQ"> | |
| <img src="https://images.weserv.nl/?url=https://avatars.githubusercontent.com/u/59907452?v=4&h=100&w=100&fit=cover&mask=circle&maxage=7d" width="100px;" alt=""/> | |
| </a> | |
| <br /> | |
| <sub>Kai Deng</sub> | |
| </td> | |
| </tr> | |
| </table> | |
| ## 📝 Citations | |
| If you find Depth Anything 3 useful in your research or projects, please cite our work: | |
| ``` | |
| @article{depthanything3, | |
| title={Depth Anything 3: Recovering the visual space from any views}, | |
| author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang}, | |
| journal={arXiv preprint arXiv:2511.10647}, | |
| year={2025} | |
| } | |
| ``` | |