Update README.md
README.md
CHANGED
|
@@ -1,3 +1,71 @@
| 1 |
---
|
| 2 |
license: mit
|
| 3 |
+
library_name: onnx
|
| 4 |
+
tags:
|
| 5 |
+
- depth-estimation
|
| 6 |
+
- dpt
|
| 7 |
+
- midas
|
| 8 |
+
- onnx
|
| 9 |
+
base_model: Intel/dpt-large
|
| 10 |
+
pipeline_tag: depth-estimation
|
| 11 |
+
language:
|
| 12 |
+
- en
|
| 13 |
---
|
| 14 |
+
|
| 15 |
+
# DPT-Large: Monocular Depth Estimation (ONNX)
|
| 16 |
+
|
| 17 |
+
ONNX export of [Intel/dpt-large](https://huggingface.co/Intel/dpt-large), the Dense Prediction Transformer for monocular depth estimation. The model has ~330M parameters and was originally published as part of the [MiDaS](https://github.com/isl-org/MiDaS) project at Intel Intelligent Systems Lab.
|
| 18 |
+
|
| 19 |
+
Re-hosted under Heliosoph for distribution stability; Intel's published checkpoint remains the authoritative source.
|
| 20 |
+
|
| 21 |
+
Credit: Intel ISL (DPT / MiDaS team, Ranftl et al.).
|
| 22 |
+
|
| 23 |
+
## What this repo contains
|
| 24 |
+
|
| 25 |
+
```
|
| 26 |
+
dpt_large_384.onnx # ~1.3 GB
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
A single ONNX file. There is no tokenizer and no preprocessor config; preprocessing is fixed by convention (see the "Input/output shape" section).
|
| 30 |
+
|
| 31 |
+
## Input/output shape
|
| 32 |
+
|
| 33 |
+
| Property | Spec |
|
| 34 |
+
|---|---|
|
| 35 |
+
| Input name | `pixel_values` (or `image`; verify in Netron) |
|
| 36 |
+
| Input shape | `[1, 3, 384, 384]` |
|
| 37 |
+
| Input dtype | float32 |
|
| 38 |
+
| Preprocessing | RGB, divide by 255, normalize by `mean=[0.5, 0.5, 0.5]` / `std=[0.5, 0.5, 0.5]` |
|
| 39 |
+
| Output shape | `[1, 384, 384]` |
|
| 40 |
+
| Output meaning | Relative depth, **not** metric. Lower values = farther; higher values = closer. Linearly map to your visualization range. |
|
| 41 |
+
|
| 42 |
+
## How to use
|
| 43 |
+
|
| 44 |
+
```python
|
| 45 |
+
import onnxruntime as ort
|
| 46 |
+
import numpy as np
|
| 47 |
+
from PIL import Image
|
| 48 |
+
|
| 49 |
+
sess = ort.InferenceSession("dpt_large_384.onnx")
|
| 50 |
+
|
| 51 |
+
# Resize the input image to 384×384, normalize to [-1, 1], convert to NCHW
|
| 52 |
+
img = Image.open("photo.jpg").convert("RGB").resize((384, 384))
|
| 53 |
+
arr = (np.asarray(img, dtype=np.float32) / 255.0 - 0.5) / 0.5 # HWC, [-1,1]
|
| 54 |
+
arr = arr.transpose(2, 0, 1)[None, ...] # 1x3x384x384
|
| 55 |
+
|
| 56 |
+
depth = sess.run(None, {sess.get_inputs()[0].name: arr})[0][0] # 384x384
|
| 57 |
+
```
|
| 58 |
+
|
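The raw output is relative, so it needs a linear rescale before it can be viewed as an image. One way to do that is sketched below, assuming a simple min-max stretch is acceptable; `depth_to_uint8` is a hypothetical helper name, not something shipped with the model.

```python
import numpy as np


def depth_to_uint8(depth):
    """Linearly rescale a relative depth map to 0..255 (brighter = closer)."""
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-12:
        # Constant map: nothing to stretch, return all zeros
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # now in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)
```

With Pillow available, `Image.fromarray(depth_to_uint8(depth)).save("depth.png")` would write a grayscale preview (the file name is only an example). A min-max stretch is sensitive to outliers; clipping to percentiles first may give better contrast on some scenes.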
| 59 |
+
For metric depth, pair the model with a calibration scheme: DPT-Large is trained for relative depth and will not give you "this object is 1.7m away" without further work.
|
| 60 |
+
|
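One possible calibration scheme is sketched here. It rests on an assumption that is not stated in this card: that the output behaves like inverse depth up to an unknown scale and shift, as is commonly described for MiDaS-style models. Given a few pixels whose true distance is known, the two unknowns can be fitted by least squares. `fit_scale_shift` and `to_metric` are hypothetical names, and the fit should be validated against your own measurements.

```python
import numpy as np


def fit_scale_shift(pred, metric_depth_m):
    """Least-squares fit of s, t so that s * pred + t approximates 1 / depth."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    inv_depth = 1.0 / np.asarray(metric_depth_m, dtype=np.float64).ravel()
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, inv_depth, rcond=None)
    return float(s), float(t)


def to_metric(pred, s, t, eps=1e-6):
    """Apply the fitted scale and shift, then invert to get depth in metres."""
    inv = s * np.asarray(pred, dtype=np.float64) + t
    return 1.0 / np.maximum(inv, eps)  # clamp to avoid division by zero
```

In use, `pred` would hold model outputs sampled at the reference pixels and `metric_depth_m` their measured distances; the fitted `s` and `t` apply only to the image they were fitted on, since scale and shift can differ from image to image.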
| 61 |
+
## When to pick DPT-Large
|
| 62 |
+
|
| 63 |
+
- **Quality matters more than speed**: ~330M params, slowest variant in the MiDaS family.
|
| 64 |
+
- **Single static image, not video**: no temporal smoothing built in.
|
| 65 |
+
- **GPU available**: CPU inference is workable but slow (~1-2 sec on a consumer CPU).
|
| 66 |
+
|
| 67 |
+
For real-time or edge use, prefer `dpt-hybrid` or `midas-small`. Neither is in this repo, but both are available as separate uploads upstream.
|
| 68 |
+
|
| 69 |
+
## License
|
| 70 |
+
|
| 71 |
+
**Apache 2.0**, same as upstream. A `LICENSE` file is included.
|