---
license: apache-2.0
---

# Depth Anything V2 Estimator Block

A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block for monocular depth estimation using [Depth Anything V2](https://huggingface.co/depth-anything/Depth-Anything-V2-Large-hf). Supports both images and videos.

## Features

- **Relative depth estimation** using Depth Anything V2 (Large variant, 335M params)
- **Image and video** input support
- **Grayscale or turbo colormap** visualization

## Installation

```bash
# Using uv
uv sync

# Using pip
pip install -r requirements.txt
```

## Quick Start

### Load the block

```python
from diffusers import ModularPipelineBlocks
import torch

blocks = ModularPipelineBlocks.from_pretrained(
    "your-username/depth-anything-v2-estimator",  # or local path "."
    trust_remote_code=True,
)
pipeline = blocks.init_pipeline()
pipeline.load_components(torch_dtype=torch.float16)
pipeline.to("cuda")
```

### Single image - grayscale depth

```python
from PIL import Image

image = Image.open("photo.jpg")
output = pipeline(image=image)

# Save depth map
output.depth_image.save("photo_depth.png")

# Access raw relative depth tensor
print(output.predicted_depth.shape)  # (H, W)
```

### Single image - turbo colormap

```python
output = pipeline(image=image, colormap="turbo")
output.depth_image.save("photo_depth_turbo.png")
```

### Video - grayscale depth

```python
from block import save_video

output = pipeline(video_path="input.mp4", colormap="grayscale")
save_video(output.depth_frames, output.fps, "output_depth.mp4")
```

### Video - turbo colormap

```python
output = pipeline(video_path="input.mp4", colormap="turbo")
save_video(output.depth_frames, output.fps, "output_depth_turbo.mp4")
```

## Inputs

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | `PIL.Image` | - | Image to estimate depth for |
| `video_path` | `str` | - | Path to input video. When provided, `image` is ignored |
| `colormap` | `str` | `"grayscale"` | `"grayscale"` or `"turbo"` (colormapped) |

## Outputs

### Image mode

| Output | Type | Description |
|--------|------|-------------|
| `depth_image` | `PIL.Image` | Normalized depth visualization |
| `predicted_depth` | `torch.Tensor` | Raw relative depth (H x W) |

### Video mode

| Output | Type | Description |
|--------|------|-------------|
| `depth_frames` | `List[PIL.Image]` | Per-frame depth visualizations |
| `fps` | `float` | Source video frame rate |

## Depth Normalization

Depth values are min-max normalized and inverted so that bright areas represent nearby surfaces and dark areas represent distant ones.

- **Bright = close**, **dark = far** (grayscale)
- **Warm (red/yellow) = close**, **cool (blue) = far** (turbo)

## Model Variants

The block defaults to `depth-anything/Depth-Anything-V2-Large-hf`. Other available variants:

| Variant | Model ID | Params |
|---------|----------|--------|
| Small | `depth-anything/Depth-Anything-V2-Small-hf` | 24.8M |
| Base | `depth-anything/Depth-Anything-V2-Base-hf` | 97.5M |
| **Large** (default) | `depth-anything/Depth-Anything-V2-Large-hf` | 335M |
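The grayscale normalization described above can be sketched in NumPy. This is a minimal illustration of the min-max-plus-invert mapping, not the block's actual implementation; the helper name `depth_to_grayscale` is hypothetical, and it assumes a raw depth array whose values grow with distance:

```python
import numpy as np
from PIL import Image

def depth_to_grayscale(depth: np.ndarray) -> Image.Image:
    """Illustrative sketch: min-max normalize a depth map and invert it
    so nearby surfaces render bright and distant ones dark."""
    d_min, d_max = float(depth.min()), float(depth.max())
    # Epsilon in the denominator guards against constant-depth inputs
    norm = (depth - d_min) / max(d_max - d_min, 1e-8)
    inverted = 1.0 - norm  # invert: small depth (close) -> 1.0 (bright)
    return Image.fromarray((inverted * 255).astype(np.uint8), mode="L")
```

The turbo variant applies the same normalization but maps the inverted values through a turbo colormap instead of a single gray channel, so close surfaces come out warm and distant ones cool.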