flyingbertman committed on
Commit
19e3ae8
·
verified ·
1 Parent(s): e09d57f

Update README.md

  1. README.md +68 -0
README.md CHANGED
@@ -1,3 +1,71 @@
1
  ---
2
  license: mit
3
  ---
1
  ---
2
  license: mit
3
+ library_name: onnx
4
+ tags:
5
+ - depth-estimation
6
+ - dpt
7
+ - midas
8
+ - onnx
9
+ base_model: Intel/dpt-large
10
+ pipeline_tag: depth-estimation
11
+ language:
12
+ - en
13
  ---
14
+
15
+ # DPT-Large: Monocular Depth Estimation (ONNX)
16
+
17
+ ONNX export of [Intel/dpt-large](https://huggingface.co/Intel/dpt-large), the Dense Prediction Transformer for monocular depth estimation. The model has roughly 330M parameters and was originally published as part of the [MiDaS](https://github.com/isl-org/MiDaS) project at Intel's Intelligent Systems Lab (ISL).
18
+
19
+ Re-hosted under Heliosoph for distribution stability; Intel's published checkpoint remains the authoritative source.
20
+
21
+ Credit: Intel ISL (DPT / MiDaS team; Ranftl et al.).
22
+
23
+ ## What this repo contains
24
+
25
+ ```
26
+ dpt_large_384.onnx # ~1.3 GB
27
+ ```
28
+
29
+ A single ONNX file. No tokenizer, no preprocessor config: preprocessing is fixed by convention.
30
+
31
+ ## Input/output shape
32
+
33
+ | | Spec |
34
+ |---|---|
35
+ | Input name | `pixel_values` (or `image`; verify in Netron) |
36
+ | Input shape | `[1, 3, 384, 384]` |
37
+ | Input dtype | float32 |
38
+ | Preprocessing | RGB, divide by 255, normalize by `mean=[0.5, 0.5, 0.5]` / `std=[0.5, 0.5, 0.5]` |
39
+ | Output shape | `[1, 384, 384]` |
40
+ | Output meaning | Relative depth, **not** metric. Lower values = farther; higher values = closer. Rescale linearly (e.g. min-max) to your visualization range. |
41
+
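Since the table leaves the input name open, one alternative to opening Netron is to ask ONNX Runtime directly. The following is a minimal sketch: `describe_io` is a hypothetical helper name (not part of onnxruntime), and it relies only on the `get_inputs()` / `get_outputs()` methods of an `InferenceSession`, whose entries expose `name`, `shape`, and `type`.

```python
def describe_io(session):
    """Collect (name, shape, type) for every model input and output.

    `session` is expected to be an onnxruntime.InferenceSession; only its
    get_inputs() / get_outputs() methods are used.
    """
    return {
        "inputs": [(i.name, i.shape, i.type) for i in session.get_inputs()],
        "outputs": [(o.name, o.shape, o.type) for o in session.get_outputs()],
    }

# Usage (needs the model file on disk):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("dpt_large_384.onnx")
#   print(describe_io(sess))
```

If the reported input shape or name differs from the table, trust what the session reports for the file you actually downloaded.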
42
+ ## How to use
43
+
44
+ ```python
45
+ import onnxruntime as ort
46
+ import numpy as np
47
+ from PIL import Image
48
+
49
+ sess = ort.InferenceSession("dpt_large_384.onnx")
50
+
51
+ # Resize the input image to 384×384, normalize to [-1, 1], convert to NCHW
52
+ img = Image.open("photo.jpg").convert("RGB").resize((384, 384))
53
+ arr = (np.asarray(img, dtype=np.float32) / 255.0 - 0.5) / 0.5 # HWC, [-1,1]
54
+ arr = arr.transpose(2, 0, 1)[None, ...] # 1x3x384x384
55
+
56
+ depth = sess.run(None, {sess.get_inputs()[0].name: arr})[0][0] # 384x384
57
+ ```
58
+
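The raw `depth` array is in arbitrary relative units, so it usually needs rescaling before it can be viewed or saved. A minimal sketch of the linear mapping mentioned in the table, assuming a simple per-image min-max stretch is acceptable; `depth_to_uint8` is a hypothetical helper name, not something shipped with this repo.

```python
import numpy as np

def depth_to_uint8(depth):
    """Min-max scale a relative depth map to the 0..255 range (uint8).

    Brighter pixels stay "closer", matching the model's output convention.
    """
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: nothing to stretch, return all zeros
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # now in [0, 1]
    return (scaled * 255.0).round().astype(np.uint8)
```

With Pillow, `Image.fromarray(depth_to_uint8(depth)).save("depth.png")` writes a grayscale preview; resize it back to the original image size if you need pixel alignment with the input.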
59
+ For metric depth, pair the output with a calibration scheme. DPT-Large is trained for relative depth and will not tell you "this object is 1.7 m away" without further work.
60
+
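One common calibration scheme for MiDaS-style models is to fit a scale and shift in inverse-depth space against a few pixels whose true distance is known. The sketch below assumes the output behaves like affine-invariant inverse depth (consistent with "higher values = closer"); `fit_scale_shift` and `to_metric` are hypothetical helper names, and the reference distances have to come from your own measurements. Treat it as an illustration, not a validated pipeline.

```python
import numpy as np

def fit_scale_shift(pred, true_depth_m):
    """Least-squares fit of s, t so that s * pred + t ~= 1 / true_depth_m.

    pred:          model outputs sampled at reference pixels (relative units)
    true_depth_m:  measured distances in meters at the same pixels
    Needs at least two reference points with distinct `pred` values.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = 1.0 / np.asarray(true_depth_m, dtype=np.float64).ravel()
    A = np.stack([pred, np.ones_like(pred)], axis=1)  # columns: pred, 1
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(s), float(t)

def to_metric(depth, s, t, eps=1e-6):
    """Apply the fitted scale/shift, then invert to get meters."""
    inv = s * np.asarray(depth, dtype=np.float64) + t
    return 1.0 / np.maximum(inv, eps)  # clamp to avoid division by zero
```

The fit is only valid for the scene it was computed on; the scale and shift generally change from image to image, so they need to be re-estimated per frame or per setup.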
61
+ ## When to pick DPT-Large
62
+
63
+ - **Quality matters more than speed**: ~330M params, among the slowest variants in the MiDaS family.
64
+ - **Single static image, not video**: no temporal smoothing built in.
65
+ - **GPU available**: CPU inference is workable but slow (~1–2 s per image on a consumer CPU).
66
+
67
+ For real-time or edge use, prefer `dpt-hybrid` or `midas-small`. Neither is in this repo, but both are available as separate uploads upstream.
68
+
69
+ ## License
70
+
71
+ **Apache 2.0**, same as upstream. `LICENSE` file included.