getmokshshah committed on
Commit 8cc8ac9 · 1 Parent(s): db355c8

Pushed project
.github/workflows/sync-to-hf.yml ADDED
@@ -0,0 +1,21 @@
+ name: Sync to Hugging Face Space
+
+ on:
+   push:
+     branches: [main]
+   workflow_dispatch:
+
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+           lfs: true
+
+       - name: Push to HuggingFace Space
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           git push --force https://getmokshshah:$HF_TOKEN@huggingface.co/spaces/getmokshshah/depthlens main
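
The workflow mirrors the repository to the Space with a force push, authenticated by an `HF_TOKEN` repository secret (a Hugging Face access token with write scope). For a one-off sync outside CI, the same result can be had from Python via `huggingface_hub`; a minimal sketch, assuming the token is available in an `HF_TOKEN` environment variable and the Space already exists:

```python
import os

from huggingface_hub import HfApi  # pip install huggingface_hub

api = HfApi(token=os.environ["HF_TOKEN"])

# Upload the current working tree to the Space repository.
api.upload_folder(
    folder_path=".",
    repo_id="getmokshshah/depthlens",
    repo_type="space",
    commit_message="Manual sync",
)
```
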
.gitignore ADDED
@@ -0,0 +1,42 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.egg-info/
+ dist/
+ build/
+ *.egg
+
+ # Virtual environments
+ venv/
+ env/
+ .env/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Model weights (downloaded at runtime)
+ *.pth
+ *.pt
+ *.onnx
+ hub/
+
+ # Data
+ *.npy
+ results/
+ outputs/
+
+ # Example images (downloaded at runtime)
+ examples/*.jpg
+ examples/*.png
+
+ # Logs
+ *.log
+ runs/
LICENSE CHANGED
@@ -1,6 +1,6 @@
  MIT License
 
- Copyright (c) 2026 getmokshshah
+ Copyright (c) 2026 Moksh Shah
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
README.md ADDED
@@ -0,0 +1,158 @@
+ ---
+ title: DepthLens
+ emoji: 🌀
+ colorFrom: teal
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: "4.44.1"
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # DepthLens — Monocular Depth Estimation
+
+ Estimate depth from a single image using state-of-the-art deep learning models. Upload any photo and get a detailed depth map showing how far away each part of the scene is.
+
+ **[Try the Live Demo →](https://huggingface.co/spaces/getmokshshah/depthlens)**
+
+ ---
+
+ ## What It Does
+
+ DepthLens takes a single 2D image and predicts a per-pixel depth map — no stereo cameras, no LiDAR, just one photo. The output is a color-mapped visualization showing relative distances: warm colors (red/yellow) for nearby objects and cool colors (blue/purple) for faraway ones.
+
+ This is the same core technique used in autonomous vehicles, AR/VR applications, robotics, and 3D scene reconstruction.
+
+ ## 📁 Project Structure
+
+ ```
+ depthlens/
+ ├── app.py                   # Gradio web app (for HuggingFace Spaces)
+ ├── inference.py             # Standalone inference script
+ ├── requirements.txt         # Python dependencies
+ ├── models/
+ │   └── depth_estimator.py   # Model wrapper with MiDaS integration
+ ├── utils/
+ │   └── visualization.py     # Depth map coloring and overlays
+ └── examples/                # Sample images for testing
+ ```
+
+ ## Quick Start
+
+ ### 1. Install Dependencies
+
+ ```bash
+ git clone https://github.com/getmokshshah/depthlens.git
+ cd depthlens
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Run the Web App Locally
+
+ ```bash
+ python app.py
+ ```
+
+ This launches a Gradio interface at `http://localhost:7860` where you can upload images and view their depth maps.
+
+ ### 3. Run Inference from the Command Line
+
+ ```bash
+ # Single image
+ python inference.py --input photo.jpg --output depth_result.png
+
+ # Folder of images
+ python inference.py --input ./photos/ --output ./results/ --batch
+
+ # Choose model size
+ python inference.py --input photo.jpg --output depth.png --model large
+
+ # Save raw depth as NumPy array
+ python inference.py --input photo.jpg --output depth.npy --save-raw
+ ```
+
+ ## Model Options
+
+ | Model | Speed (CPU) | Quality | Memory | Best For |
+ |-------|-------------|---------|--------|----------|
+ | `small` (default) | ~0.5s/image | Good | ~200MB | Real-time apps, demos |
+ | `large` | ~3s/image | Best | ~1GB | High-quality results |
+
+ The **small** model (MiDaS v2.1 Small) is optimized for mobile and edge devices. It runs fast on CPU while producing accurate relative depth maps. The **large** model (DPT-Large) uses a Vision Transformer backbone for maximum accuracy.
+
+ ## Visualization Modes
+
+ DepthLens generates multiple visualization styles:
+
+ - **Colored Depth Map**: A viridis/inferno/magma colormap applied to the depth prediction, producing a striking false-color image
+ - **Side-by-Side Comparison**: Original image next to its depth map for easy comparison
+ - **Depth Overlay**: Semi-transparent depth map blended on top of the original image
+
+ ## How It Works
+
+ 1. **Preprocessing**: The input image is resized and normalized to match the model's expected input format using MiDaS transforms
+ 2. **Depth Prediction**: The image passes through a deep neural network (CNN or Vision Transformer) that outputs a per-pixel inverse depth map
+ 3. **Normalization**: Raw depth values are normalized to the [0, 1] range for visualization
+ 4. **Colormap Application**: NumPy and Matplotlib apply scientific colormaps to create visually informative depth images
+
+ ### Architecture Details
+
+ The **small** model uses an EfficientNet-Lite backbone with a lightweight decoder, designed for fast inference. The **large** model uses DPT (Dense Prediction Transformer) — a Vision Transformer encoder with convolutional decoder heads that produces sharper depth boundaries and more consistent large-scale depth predictions.
+
+ ## Understanding the Output
+
+ - **Warm colors** (red, orange, yellow) → **close** to the camera
+ - **Cool colors** (blue, purple) → **far** from the camera
+ - **Depth values are relative**, not absolute — the model predicts which parts are closer/farther, not exact distances in meters
+
+ ## Performance
+
+ Benchmarked on a 2-core CPU (HuggingFace Spaces free tier):
+
+ | Model | Resolution | Inference Time | Peak RAM |
+ |-------|-----------|----------------|----------|
+ | Small | 256×256 | ~0.4s | ~350MB |
+ | Small | 512×512 | ~0.8s | ~500MB |
+ | Large | 384×384 | ~2.8s | ~1.2GB |
+
+ ## Configuration
+
+ ### Inference Options
+
+ | Argument | Default | Description |
+ |----------|---------|-------------|
+ | `--input` | required | Path to image or folder |
+ | `--output` | required | Output path for results |
+ | `--model` | `small` | Model size: `small` or `large` |
+ | `--colormap` | `inferno` | Colormap: `inferno`, `magma`, `viridis`, `plasma` |
+ | `--side-by-side` | `False` | Generate side-by-side comparison |
+ | `--overlay` | `False` | Generate depth overlay on original |
+ | `--overlay-alpha` | `0.5` | Transparency for overlay mode |
+ | `--save-raw` | `False` | Save raw depth as .npy file |
+ | `--batch` | `False` | Process a folder of images |
+
+ ## Use Cases
+
+ - **3D Scene Understanding**: Understand spatial layout from a single photo
+ - **Autonomous Systems**: Depth perception for robots and drones
+ - **AR/VR**: Generate depth data for immersive experiences
+ - **Photography**: Create depth-based focus effects (synthetic bokeh)
+ - **Accessibility**: Help describe spatial relationships in scenes
+
+ ## Limitations
+
+ - Depth is **relative**, not metric — objects are ranked near-to-far but without real-world distances
+ - Transparent and reflective surfaces (glass, mirrors, water) can confuse the model
+ - Very dark or overexposed regions may have unreliable depth predictions
+ - The model performs best on natural outdoor scenes and indoor rooms
+
+ ## License
+
+ MIT License — free to use for research or commercial projects.
+
+ ## Credits
+
+ - **MiDaS**: Ranftl et al., "Towards Robust Monocular Depth Estimation" (Intel ISL)
+ - **DPT**: Ranftl et al., "Vision Transformers for Dense Prediction" (ICCV 2021)
+ - **Built with**: PyTorch, OpenCV, Gradio, NumPy, Matplotlib
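
The README documents the CLI and the hosted demo; the same pipeline is also usable as a library. A short sketch of programmatic use, assuming it runs from the repository root (so `models` and `utils` are importable) and with `photo.jpg` standing in for any input image:

```python
from PIL import Image

from models import DepthEstimator
from utils import depth_to_colormap, create_side_by_side

# Load the fast CPU model once; pass model_size="large" for DPT-Large.
estimator = DepthEstimator(model_size="small")

image = Image.open("photo.jpg").convert("RGB")
depth = estimator.predict(image)  # (H, W) float array in [0, 1], higher = closer

colored = depth_to_colormap(depth, "inferno")
create_side_by_side(image, colored).save("comparison.png")
```
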
app.py ADDED
@@ -0,0 +1,144 @@
+ """
+ DepthLens — Monocular Depth Estimation
+ Gradio app for HuggingFace Spaces deployment.
+
+ Estimates depth from a single image using MiDaS models.
+ Optimized for free-tier CPU inference.
+ """
+
+ import time
+
+ import gradio as gr
+ import numpy as np
+ from PIL import Image
+
+ from models import DepthEstimator
+ from utils import depth_to_colormap, create_side_by_side, create_overlay
+ from download_examples import download_examples
+
+ # ──────────────────────────────────────────────
+ # Download example images if missing
+ # ──────────────────────────────────────────────
+ download_examples()
+
+ # ──────────────────────────────────────────────
+ # Load model at startup (small for CPU speed)
+ # ──────────────────────────────────────────────
+ print("Starting DepthLens...")
+ estimator = DepthEstimator(model_size="small")
+ print("Ready!")
+
+
+ def predict(
+     image: Image.Image,
+     colormap: str,
+     output_mode: str,
+     overlay_alpha: float,
+ ) -> tuple:
+     """
+     Run depth estimation and return results.
+
+     Returns:
+         (result_image, depth_colored, stats_string)
+     """
+     if image is None:
+         raise gr.Error("Please upload an image first.")
+
+     start = time.time()
+
+     # Run depth estimation
+     image_rgb = image.convert("RGB")
+     depth = estimator.predict(image_rgb)
+
+     inference_time = time.time() - start
+
+     # Create colormapped depth
+     depth_colored = depth_to_colormap(depth, colormap.lower())
+
+     # Create output based on mode
+     if output_mode == "Side-by-Side":
+         result = create_side_by_side(image_rgb, depth_colored)
+     elif output_mode == "Overlay":
+         result = create_overlay(image_rgb, depth_colored, alpha=overlay_alpha)
+     else:
+         result = depth_colored
+
+     # Stats
+     w, h = image_rgb.size
+     stats = f"{w}×{h} · {inference_time:.2f}s inference · MiDaS Small"
+
+     return result, depth_colored, stats
+
+
+ # ──────────────────────────────────────────────
+ # Gradio Interface
+ # ──────────────────────────────────────────────
+ with gr.Blocks(
+     title="DepthLens — Monocular Depth Estimation",
+     theme=gr.themes.Base(
+         primary_hue="teal",
+         neutral_hue="slate",
+     ),
+ ) as demo:
+     gr.Markdown(
+         """
+         # DepthLens — Monocular Depth Estimation
+         Upload any image to estimate per-pixel depth using MiDaS.
+         Warm colors = close, cool colors = far.
+         """
+     )
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             input_image = gr.Image(type="pil", label="Input Image")
+             colormap = gr.Dropdown(
+                 choices=["Inferno", "Magma", "Viridis", "Plasma"],
+                 value="Inferno",
+                 label="Colormap",
+             )
+             output_mode = gr.Radio(
+                 choices=["Depth Map", "Side-by-Side", "Overlay"],
+                 value="Depth Map",
+                 label="Output Mode",
+             )
+             overlay_alpha = gr.Slider(
+                 minimum=0.2, maximum=0.8, value=0.5, step=0.1,
+                 label="Overlay Opacity",
+                 visible=False,
+             )
+             run_btn = gr.Button("Estimate Depth", variant="primary")
+             stats = gr.Textbox(label="Info", interactive=False)
+
+         with gr.Column(scale=1):
+             result_image = gr.Image(type="pil", label="Result")
+             depth_image = gr.Image(type="pil", label="Depth Map", visible=False)
+
+     # Show/hide overlay slider
+     def toggle_overlay(mode):
+         return gr.update(visible=(mode == "Overlay"))
+
+     output_mode.change(toggle_overlay, output_mode, overlay_alpha)
+
+     # Run prediction
+     run_btn.click(
+         fn=predict,
+         inputs=[input_image, colormap, output_mode, overlay_alpha],
+         outputs=[result_image, depth_image, stats],
+     )
+
+     # Examples
+     gr.Examples(
+         examples=[
+             ["examples/street.jpg", "Inferno", "Side-by-Side", 0.5],
+             ["examples/landscape.jpg", "Magma", "Depth Map", 0.5],
+             ["examples/indoor.jpg", "Viridis", "Overlay", 0.5],
+         ],
+         inputs=[input_image, colormap, output_mode, overlay_alpha],
+         outputs=[result_image, depth_image, stats],
+         fn=predict,
+         cache_examples=False,
+     )
+
+
+ if __name__ == "__main__":
+     demo.launch()
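
If the Space sees bursts of traffic, Gradio's request queue settings can be tuned explicitly rather than left at their defaults. A possible tweak to the launch line, assuming Gradio 4.x as pinned in the README front matter; `max_size=16` is an illustrative cap, not a value from this repository:

```python
# Cap how many requests may wait in the queue, so a 2-core CPU
# is not oversubscribed during traffic spikes.
if __name__ == "__main__":
    demo.queue(max_size=16).launch()
```
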
download_examples.py ADDED
@@ -0,0 +1,36 @@
+ """
+ Downloads example images for the DepthLens demo.
+ Called automatically by app.py on startup if images are missing.
+ Uses direct Unsplash image URLs (free, no API key needed).
+ """
+
+ import os
+ import urllib.request
+
+ EXAMPLES_DIR = os.path.join(os.path.dirname(__file__), "examples")
+
+ EXAMPLE_URLS = {
+     "street.jpg": "https://images.unsplash.com/photo-1477959858617-67f85cf4f1df?w=640&q=80",
+     "landscape.jpg": "https://images.unsplash.com/photo-1506744038136-46273834b3fb?w=640&q=80",
+     "indoor.jpg": "https://images.unsplash.com/photo-1502672260266-1c1ef2d93688?w=640&q=80",
+ }
+
+
+ def download_examples():
+     """Download example images if they don't already exist."""
+     os.makedirs(EXAMPLES_DIR, exist_ok=True)
+
+     for filename, url in EXAMPLE_URLS.items():
+         filepath = os.path.join(EXAMPLES_DIR, filename)
+         if os.path.exists(filepath):
+             continue
+         print(f"Downloading {filename}...")
+         try:
+             urllib.request.urlretrieve(url, filepath)
+             print(f"  Saved to {filepath}")
+         except Exception as e:
+             print(f"  Failed to download {filename}: {e}")
+
+
+ if __name__ == "__main__":
+     download_examples()
examples/.gitkeep ADDED
File without changes
inference.py ADDED
@@ -0,0 +1,136 @@
+ """
+ Standalone inference script for DepthLens.
+
+ Usage:
+     python inference.py --input photo.jpg --output depth.png
+     python inference.py --input ./photos/ --output ./results/ --batch
+     python inference.py --input photo.jpg --output depth.png --model large --colormap magma
+ """
+
+ import argparse
+ import time
+ from pathlib import Path
+
+ import numpy as np
+ from PIL import Image
+
+ from models import DepthEstimator
+ from utils import depth_to_colormap, create_side_by_side, create_overlay
+
+
+ IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp", ".tiff"}
+
+
+ def process_single(
+     estimator: DepthEstimator,
+     input_path: Path,
+     output_path: Path,
+     colormap: str,
+     side_by_side: bool,
+     overlay: bool,
+     overlay_alpha: float,
+     save_raw: bool,
+ ):
+     """Process a single image and save results."""
+     print(f"  Processing: {input_path.name}")
+     start = time.time()
+
+     image = Image.open(input_path).convert("RGB")
+     depth = estimator.predict(image)
+
+     elapsed = time.time() - start
+     print(f"  Inference: {elapsed:.2f}s")
+
+     # Save colormapped depth
+     depth_colored = depth_to_colormap(depth, colormap)
+
+     if side_by_side:
+         result = create_side_by_side(image, depth_colored)
+     elif overlay:
+         result = create_overlay(image, depth_colored, alpha=overlay_alpha)
+     else:
+         result = depth_colored
+
+     # Determine output path
+     out = Path(output_path)
+     if out.suffix.lower() == ".npy" or save_raw:
+         raw_path = out.with_suffix(".npy") if out.suffix else out / (input_path.stem + "_depth.npy")
+         np.save(str(raw_path), depth)
+         print(f"  Saved raw depth: {raw_path}")
+
+     if out.suffix.lower() != ".npy":
+         result.save(str(out))
+         print(f"  Saved: {out}")
+
+
+ def process_batch(
+     estimator: DepthEstimator,
+     input_dir: Path,
+     output_dir: Path,
+     colormap: str,
+     side_by_side: bool,
+     overlay: bool,
+     overlay_alpha: float,
+     save_raw: bool,
+ ):
+     """Process all images in a directory."""
+     output_dir.mkdir(parents=True, exist_ok=True)
+
+     images = sorted(
+         p for p in input_dir.iterdir()
+         if p.suffix.lower() in IMAGE_EXTENSIONS
+     )
+
+     if not images:
+         print(f"No images found in {input_dir}")
+         return
+
+     print(f"Found {len(images)} images in {input_dir}")
+     total_start = time.time()
+
+     for img_path in images:
+         out_name = img_path.stem + "_depth.png"
+         out_path = output_dir / out_name
+         process_single(
+             estimator, img_path, out_path,
+             colormap, side_by_side, overlay, overlay_alpha, save_raw,
+         )
+
+     total = time.time() - total_start
+     avg = total / len(images)
+     print(f"\nDone! {len(images)} images in {total:.1f}s (avg {avg:.2f}s/image)")
+
+
+ def main():
+     parser = argparse.ArgumentParser(description="DepthLens — Monocular Depth Estimation")
+     parser.add_argument("--input", required=True, help="Input image path or directory")
+     parser.add_argument("--output", required=True, help="Output path or directory")
+     parser.add_argument("--model", default="small", choices=["small", "large"], help="Model size")
+     parser.add_argument("--colormap", default="inferno", choices=["inferno", "magma", "viridis", "plasma"])
+     parser.add_argument("--side-by-side", action="store_true", help="Generate side-by-side comparison")
+     parser.add_argument("--overlay", action="store_true", help="Generate depth overlay on original")
+     parser.add_argument("--overlay-alpha", type=float, default=0.5, help="Overlay transparency")
+     parser.add_argument("--save-raw", action="store_true", help="Also save raw depth as .npy")
+     parser.add_argument("--batch", action="store_true", help="Process a folder of images")
+     args = parser.parse_args()
+
+     estimator = DepthEstimator(model_size=args.model)
+
+     input_path = Path(args.input)
+     output_path = Path(args.output)
+
+     if args.batch:
+         process_batch(
+             estimator, input_path, output_path,
+             args.colormap, args.side_by_side, args.overlay, args.overlay_alpha, args.save_raw,
+         )
+     else:
+         output_path.parent.mkdir(parents=True, exist_ok=True)
+         process_single(
+             estimator, input_path, output_path,
+             args.colormap, args.side_by_side, args.overlay, args.overlay_alpha, args.save_raw,
+         )
+
+
+ if __name__ == "__main__":
+     main()
models/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from .depth_estimator import DepthEstimator, MODEL_CONFIGS
+
+ __all__ = ["DepthEstimator", "MODEL_CONFIGS"]
models/depth_estimator.py ADDED
@@ -0,0 +1,120 @@
+ """
+ Depth estimation model wrapper using MiDaS.
+
+ Supports two model sizes:
+     - small: MiDaS v2.1 Small (EfficientNet-Lite backbone, fast CPU inference)
+     - large: DPT-Large (Vision Transformer backbone, highest quality)
+ """
+
+ import torch
+ import numpy as np
+ from PIL import Image
+
+
+ # Model configurations
+ MODEL_CONFIGS = {
+     "small": {
+         "repo": "intel-isl/MiDaS",
+         "model_name": "MiDaS_small",
+         "transform_name": "small_transform",
+         "description": "MiDaS v2.1 Small — Fast CPU inference (~0.5s)",
+     },
+     "large": {
+         "repo": "intel-isl/MiDaS",
+         "model_name": "DPT_Large",
+         "transform_name": "dpt_transform",
+         "description": "DPT-Large — Highest quality depth estimation (~3s)",
+     },
+ }
+
+
+ class DepthEstimator:
+     """Monocular depth estimation using MiDaS models."""
+
+     def __init__(self, model_size: str = "small", device: str = None):
+         """
+         Initialize the depth estimator.
+
+         Args:
+             model_size: 'small' or 'large'
+             device: 'cpu' or 'cuda' (auto-detected if None)
+         """
+         if model_size not in MODEL_CONFIGS:
+             raise ValueError(f"Unknown model size '{model_size}'. Choose from: {list(MODEL_CONFIGS.keys())}")
+
+         self.model_size = model_size
+         self.config = MODEL_CONFIGS[model_size]
+         self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
+
+         self._load_model()
+
+     def _load_model(self):
+         """Load the MiDaS model and transforms from PyTorch Hub."""
+         print(f"Loading {self.config['description']}...")
+
+         # Load model
+         self.model = torch.hub.load(
+             self.config["repo"],
+             self.config["model_name"],
+             trust_repo=True,
+         )
+         self.model.to(self.device)
+         self.model.eval()
+
+         # Load transforms
+         midas_transforms = torch.hub.load(
+             self.config["repo"],
+             "transforms",
+             trust_repo=True,
+         )
+
+         if self.model_size == "small":
+             self.transform = midas_transforms.small_transform
+         else:
+             self.transform = midas_transforms.dpt_transform
+
+         print(f"Model loaded on {self.device}")
+
+     @torch.no_grad()
+     def predict(self, image: Image.Image) -> np.ndarray:
+         """
+         Predict depth from a PIL Image.
+
+         Args:
+             image: Input PIL Image (RGB)
+
+         Returns:
+             depth_map: Normalized depth array (H, W) with values in [0, 1].
+                 Higher values = closer to camera.
+         """
+         # Convert PIL to numpy RGB
+         img_np = np.array(image.convert("RGB"))
+
+         # Apply MiDaS transform
+         input_tensor = self.transform(img_np).to(self.device)
+
+         # Run inference
+         prediction = self.model(input_tensor)
+
+         # Resize to original dimensions
+         prediction = torch.nn.functional.interpolate(
+             prediction.unsqueeze(1),
+             size=img_np.shape[:2],
+             mode="bicubic",
+             align_corners=False,
+         ).squeeze()
+
+         depth = prediction.cpu().numpy()
+
+         # Normalize to [0, 1]
+         depth_min = depth.min()
+         depth_max = depth.max()
+         if depth_max - depth_min > 1e-6:
+             depth = (depth - depth_min) / (depth_max - depth_min)
+         else:
+             depth = np.zeros_like(depth)
+
+         return depth
+
+     def __repr__(self):
+         return f"DepthEstimator(model_size='{self.model_size}', device='{self.device}')"
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ torch>=1.13.0
+ torchvision>=0.14.0
+ timm>=0.6.12
+ opencv-python-headless>=4.7.0
+ Pillow>=9.4.0
+ numpy>=1.23.0
+ matplotlib>=3.6.0
+ gradio>=4.0.0
utils/__init__.py ADDED
@@ -0,0 +1,15 @@
+ from .visualization import (
+     depth_to_colormap,
+     create_side_by_side,
+     create_overlay,
+     add_depth_legend,
+     COLORMAPS,
+ )
+
+ __all__ = [
+     "depth_to_colormap",
+     "create_side_by_side",
+     "create_overlay",
+     "add_depth_legend",
+     "COLORMAPS",
+ ]
utils/visualization.py ADDED
@@ -0,0 +1,129 @@
+ """
+ Visualization utilities for depth map rendering.
+
+ Provides colormapped depth images, side-by-side comparisons, and overlay modes.
+ """
+
+ import numpy as np
+ from PIL import Image
+ import matplotlib
+ import matplotlib.cm as cm
+
+
+ # Available colormaps for depth visualization
+ COLORMAPS = {
+     "inferno": cm.inferno,
+     "magma": cm.magma,
+     "viridis": cm.viridis,
+     "plasma": cm.plasma,
+ }
+
+
+ def depth_to_colormap(depth: np.ndarray, colormap: str = "inferno") -> Image.Image:
+     """
+     Apply a scientific colormap to a normalized depth array.
+
+     Args:
+         depth: Normalized depth array (H, W), values in [0, 1]
+         colormap: Name of colormap ('inferno', 'magma', 'viridis', 'plasma')
+
+     Returns:
+         Colormapped PIL Image (RGB)
+     """
+     if colormap not in COLORMAPS:
+         raise ValueError(f"Unknown colormap '{colormap}'. Choose from: {list(COLORMAPS.keys())}")
+
+     cmap = COLORMAPS[colormap]
+     colored = cmap(depth)  # Returns (H, W, 4) RGBA float array
+     colored = (colored[:, :, :3] * 255).astype(np.uint8)  # Drop alpha, convert to uint8
+
+     return Image.fromarray(colored)
+
+
+ def create_side_by_side(
+     original: Image.Image,
+     depth_colored: Image.Image,
+     gap: int = 4,
+     bg_color: tuple = (20, 24, 30),
+ ) -> Image.Image:
+     """
+     Create a side-by-side comparison of original image and depth map.
+
+     Args:
+         original: Original PIL Image
+         depth_colored: Colormapped depth PIL Image
+         gap: Pixel gap between images
+         bg_color: Background color for the gap
+
+     Returns:
+         Combined PIL Image
+     """
+     # Resize depth to match original dimensions
+     depth_resized = depth_colored.resize(original.size, Image.LANCZOS)
+
+     w, h = original.size
+     canvas = Image.new("RGB", (w * 2 + gap, h), bg_color)
+     canvas.paste(original, (0, 0))
+     canvas.paste(depth_resized, (w + gap, 0))
+
+     return canvas
+
+
+ def create_overlay(
+     original: Image.Image,
+     depth_colored: Image.Image,
+     alpha: float = 0.5,
+ ) -> Image.Image:
+     """
+     Blend the depth map on top of the original image.
+
+     Args:
+         original: Original PIL Image
+         depth_colored: Colormapped depth PIL Image
+         alpha: Blend factor (0 = only original, 1 = only depth)
+
+     Returns:
+         Blended PIL Image
+     """
+     depth_resized = depth_colored.resize(original.size, Image.LANCZOS)
+     original_rgb = original.convert("RGB")
+
+     blended = Image.blend(original_rgb, depth_resized, alpha)
+     return blended
+
+
+ def add_depth_legend(
+     image: Image.Image,
+     colormap: str = "inferno",
+     bar_height: int = 24,
+     padding: int = 12,
+ ) -> Image.Image:
+     """
+     Add a color legend bar at the bottom of the image.
+
+     Args:
+         image: Input PIL Image
+         colormap: Colormap name for the legend
+         bar_height: Height of the legend bar
+         padding: Padding around the bar
+
+     Returns:
+         Image with legend appended at the bottom
+     """
+     w, h = image.size
+     total_height = h + bar_height + padding * 2
+
+     canvas = Image.new("RGB", (w, total_height), (20, 24, 30))
+     canvas.paste(image, (0, 0))
+
+     # Create gradient bar
+     cmap = COLORMAPS.get(colormap, cm.inferno)
+     gradient = np.linspace(0, 1, w - padding * 2).reshape(1, -1)
+     gradient = np.repeat(gradient, bar_height, axis=0)
+     colored = cmap(gradient)[:, :, :3]
+     colored = (colored * 255).astype(np.uint8)
+     bar_img = Image.fromarray(colored)
+
+     canvas.paste(bar_img, (padding, h + padding))
+
+     return canvas
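
These helpers compose. A sketch that chains the overlay and legend utilities on one prediction, under the same repository-root import assumptions as the earlier examples:

```python
from PIL import Image

from models import DepthEstimator
from utils import depth_to_colormap, create_overlay, add_depth_legend

estimator = DepthEstimator()  # defaults to the small model
image = Image.open("examples/indoor.jpg").convert("RGB")

colored = depth_to_colormap(estimator.predict(image), "viridis")
overlaid = create_overlay(image, colored, alpha=0.4)  # depth tint over the photo
add_depth_legend(overlaid, colormap="viridis").save("overlay_with_legend.png")
```
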