JonnyYu828/DepthVLM-4B

Depth Estimation

image-text-to-text

vision-language-model


JonnyYu828 committed 3 days ago

Commit 13ae84c · verified · 1 Parent(s): bd84f4b

Update README.md

Files changed (1)

README.md +19 -1

@@ -15,4 +15,22 @@ tags:
 paper:
   - arxiv: 2605.15876
----

+---
+Update 2026-05-18 (v1.0): Initial release
+# DepthVLM
+DepthVLM is a unified foundation model for both low-level dense geometry prediction and high-level multimodal understanding, with substantially faster inference than existing VLM-based approaches such as DepthLM and Youtu-VL.
+## Highlights
+- Native dense metric depth estimation in VLMs
+- Unified multimodal understanding and geometry prediction
+- Full-resolution depth prediction with efficient inference
+- Supports both indoor and outdoor metric depth estimation
+- Improved 3D spatial reasoning capability
+## Paper
+[Unlocking Dense Metric Depth Estimation in VLMs](https://arxiv.org/abs/2605.15876)