| # Phase 0 EvalMDE Adaptation — Handoff | |
| **Date:** 2026-05-14 | |
| **Status:** EvalMDE workspace bootstrapped; inference driver (`scripts/run_inference.py`) written; metric script + sbatch still to write. | |
| --- | |
| ## Goal | |
| Run the 7 MoGe-Phase-0 models on **Infinigen 95 scenes** under the **EvalMDE protocol** | |
| (raw native input, no homography warp), producing **RelNormal + SAWA-H + standard metrics**. | |
| EvalMDE and MoGe are independent workflows. EvalMDE workspace is at `/home/ywan0794/EvalMDE/`. | |
| Model wrappers are *copied* from MoGe (single source of truth still in MoGe/baselines/), | |
| because the wrappers' `infer(image, intrinsics)` API doesn't depend on MoGe's eval pipeline. | |
| --- | |
| ## What's done | |
| ### 1. EvalMDE env (Python 3.10) — built and verified | |
| `evalmde` conda env has: torch 2.7.0+cu126, opencv, scipy, utils3d, pipeline, evalmde package, | |
| bpy 4.0 (Blender python, for textureless-relighting visualization). | |
| Sample run `python compute_metrics_example.py` outputs `sawa_h=1.268, rel_normal=0.390` ✓. | |
| ### 2. 7 baselines (model wrappers) — copied from MoGe + verified | |
| `/home/ywan0794/EvalMDE/baselines/`: | |
| - `depth_pro.py` → emits `depth_metric` (+ `intrinsics` from FOV head) | |
| - `marigold.py` → emits `depth_affine_invariant` (paper: scale_inv+shift_inv → affine) | |
| - `lotus.py` → emits `disparity_affine_invariant` when `--disparity` set | |
| - `depthmaster.py` → emits `depth_affine_invariant` | |
| - `ppd.py` → emits `depth_affine_invariant` (training quantile normalization) | |
| - `da3_mono.py` → emits `depth_scale_invariant` | |
| - `fe2e.py` → emits `depth_affine_invariant` (Lpred clamped to [0,1]) | |
| `MGEBaselineInterface` copied to `/home/ywan0794/EvalMDE/test/baseline.py`. | |
| ### 3. EvalMDE-native dataloader skeleton — written | |
| `/home/ywan0794/EvalMDE/scripts/dataloader.py` (`EvalMDELoaderPipeline`): | |
| - Reads `<scene>/rgb.png` + `<scene>/gt_depth.npz` (keys: `depth (H,W)`, `intr (4,) [fx,fy,cx,cy]px`, `valid (H,W) bool`) | |
| - Pixel intrinsics → 3×3 normalized matrix `[fx/W, fy/H, cx/W, cy/H]` (MoGe convention) | |
| - Computes 3D pointmap from depth + native pixel intrinsics | |
| - NaN/invalid pixels replaced with `1.0` (matches `evalmde/utils/depth.py:load_data` convention) | |
| - Returns dict with: `image [3,H,W] float [0,1]`, `depth`, `depth_mask`, `intrinsics (3,3)`, | |
| `points (H,W,3)`, `is_metric=True`, `_intr_px (4,)` (for EvalMDE metrics raw npz) | |
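The conversion steps listed above can be sketched in NumPy as follows. This is a minimal illustration, not the contents of `scripts/dataloader.py`: the function names, the `fill` argument, and the half-pixel offset (`pixel_offset=0.5`) are assumptions and should be checked against the MoGe and EvalMDE conventions before reuse.

```python
import numpy as np


def normalized_intrinsics(intr_px, width, height):
    """Build a 3x3 matrix with fx, cx divided by W and fy, cy divided by H."""
    fx, fy, cx, cy = intr_px
    return np.array(
        [[fx / width, 0.0, cx / width],
         [0.0, fy / height, cy / height],
         [0.0, 0.0, 1.0]],
        dtype=np.float32,
    )


def sanitize_depth(depth, valid, fill=1.0):
    """Replace NaN/inf/invalid pixels with `fill`; also return the cleaned mask."""
    ok = valid & np.isfinite(depth)
    return np.where(ok, depth, fill), ok


def unproject(depth, intr_px, pixel_offset=0.5):
    """Lift an (H, W) depth map to an (H, W, 3) camera-space pointmap.

    `pixel_offset` is a hypothetical knob: 0.5 samples pixel centres,
    0.0 samples pixel corners. Which one the real loader uses is not
    stated in this handoff.
    """
    h, w = depth.shape
    fx, fy, cx, cy = intr_px
    u, v = np.meshgrid(np.arange(w) + pixel_offset, np.arange(h) + pixel_offset)
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)
```

The pointmap is computed from the native pixel intrinsics, while the normalized matrix is only what gets handed to the model wrappers.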
| ### 4. Infinigen download — IN PROGRESS (background) | |
| - Source: Princeton GDrive `1amzb6KyF2USFQ5W4CeYKFCh1F-yOQsmp` | |
| - Target: `/home/ywan0794/EvalMDE/data/infinigen/` | |
| - Log: `/tmp/dl_infinigen.log` | |
| - Estimated 50-100 GB | |
| - Check state: `du -sh /home/ywan0794/EvalMDE/data/infinigen/` | |
| ### 5. Production MoGe-protocol eval — independent track, already running | |
| - `sbatch eval_scripts/eval_all_slurm.sh` submitted earlier (job 12110 etc.) | |
| - 5 models pending (Marigold/Lotus/DepthMaster/PPD/FE2E), 2 already done (DA3-Mono/Depth Pro) | |
| - Results in `/home/ywan0794/MoGe/eval_output/<model>_<TS>.json` | |
| - **EvalMDE adaptation is a separate effort, doesn't block production MoGe eval.** | |
| --- | |
| ## TODO (TODO-1 and TODO-2 done; TODO-3 to TODO-5 remain) | |
| ### ✅ TODO-1: Fix baseline imports — SUPERSEDED by sys.path approach in run_inference.py | |
| `EvalMDE/baselines/*.py` still have `from moge.test.baseline import MGEBaselineInterface`. | |
| **Resolved via Option A**: `scripts/run_inference.py` does `sys.path.insert(0, '/home/ywan0794/MoGe')` | |
| so baselines still resolve their interface from MoGe. No sed needed. | |
| ### ✅ TODO-2 (inference driver): `scripts/run_inference.py` — WRITTEN | |
| `/home/ywan0794/EvalMDE/scripts/run_inference.py`: | |
| - Click CLI with `--baseline /path/to/baselines/<m>.py --data-root <infinigen> --output-root <out> --model-name <name>` | |
| - Passes remaining click args through to baseline's `load.main(ctx.args)` | |
| - For each scene with `rgb.png + gt_depth.npz`: loads rgb, builds normalized 3×3 K from GT pixel intr, | |
| calls `baseline.infer_for_evaluation(image, K_norm)`, picks depth in priority order | |
| (`depth_metric > depth_scale_invariant > depth_affine_invariant > 1/disparity_affine_invariant`), | |
| writes `<out>/<model>/<scene>/pred_depth.npz` with EvalMDE keys `{depth, intr (4,) px, valid}` | |
| - For pred intrinsics: uses model-predicted intr if present (Depth Pro), else GT intr | |
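The priority order described above could look roughly like the sketch below. It assumes the wrapper output is a plain dict of NumPy arrays (the real wrappers may return torch tensors), and `pick_depth`, `PRIORITY`, and the `eps` guard are illustrative names rather than what `run_inference.py` actually contains.

```python
import numpy as np

# Depth-valued keys, highest priority first (order taken from the handoff text).
PRIORITY = ("depth_metric", "depth_scale_invariant", "depth_affine_invariant")


def pick_depth(output, eps=1e-6):
    """Return (depth, key) for the highest-priority key present in `output`."""
    for key in PRIORITY:
        if key in output:
            return np.asarray(output[key], dtype=np.float32), key
    if "disparity_affine_invariant" in output:
        disp = np.asarray(output["disparity_affine_invariant"], dtype=np.float32)
        # Naive inversion. The eps floor is a hypothetical guard against
        # division by zero, not a substitute for disparity-space alignment.
        return 1.0 / np.maximum(disp, eps), "disparity_affine_invariant"
    raise KeyError("no recognised depth or disparity key in wrapper output")
```

Returning the chosen key alongside the depth lets the metric stage know which alignment, if any, the prediction still needs.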
| ### ❗ Original TODO-2 (`scripts/eval.py`) was REWORKED into 2 stages: inference + metric. | |
| This is cleaner: inference runs in per-model env, metric runs in evalmde env. | |
| ### TODO-3: Write `scripts/compute_metrics.py` (run in evalmde env) | |
| Reads each model's pred_depth.npz + GT gt_depth.npz, computes EvalMDE metrics + standard MDE metrics. | |
| Pseudocode: | |
| ```python | |
| import json | |
| from pathlib import Path | |
| import click | |
| import numpy as np | |
| from evalmde.utils.depth import load_data | |
| from evalmde.metrics.rel_normal import compute_rel_normal | |
| from evalmde.metrics.sawa_h import compute_sawa_h | |
| METRICS = ['sawa_h', 'rel_normal', 'abs_rel', 'delta1'] | |
| @click.command() | |
| @click.option('--gt-root', required=True, type=click.Path()) # Infinigen root | |
| @click.option('--pred-root', required=True, type=click.Path()) # output of run_inference.py | |
| @click.option('--model-name', required=True, type=str) | |
| @click.option('--output', required=True, type=click.Path()) | |
| def main(gt_root, pred_root, model_name, output): | |
|     gt_root = Path(gt_root) | |
|     pred_root = Path(pred_root) / model_name | |
|     # Only scenes for which inference produced a prediction are evaluated. | |
|     scenes = sorted(d.name for d in pred_root.iterdir() if (d / 'pred_depth.npz').exists()) | |
|     results = [] | |
|     for scene in scenes: | |
|         gt_d, gt_intr, gt_v = load_data(gt_root / scene / 'gt_depth.npz') | |
|         pr_d, pr_intr, pr_v = load_data(pred_root / scene / 'pred_depth.npz') | |
|         # SAWA-H aligns internally (affine via least-squares). RelNormal uses surface normals, | |
|         # which are invariant to scale but NOT to shift. For affine-invariant preds, the | |
|         # shift will skew normals at far depths. Acceptable caveat in Phase 0; document it. | |
|         sawa = compute_sawa_h(pr_d, pr_intr, pr_v, gt_d, gt_intr, gt_v) | |
|         rnorm = compute_rel_normal(pr_d, pr_intr, pr_v, gt_d, gt_intr, gt_v) | |
|         # Standard AbsRel + δ1 after affine alignment. | |
|         mask = gt_v & pr_v | |
|         gtm, prm = gt_d[mask], pr_d[mask] | |
|         # Fit gtm ≈ a * prm + b by least squares. | |
|         A = np.stack([prm, np.ones_like(prm)], axis=-1) | |
|         a, b = np.linalg.lstsq(A, gtm, rcond=None)[0] | |
|         # Floor both sides at a small positive value so the ratios below stay finite | |
|         # even if the affine fit pushes some predictions to zero or below. | |
|         am = np.maximum(a * prm + b, 1e-6) | |
|         gtm = np.maximum(gtm, 1e-6) | |
|         abs_rel = np.mean(np.abs(am - gtm) / gtm) | |
|         delta1 = np.mean(np.maximum(am / gtm, gtm / am) < 1.25) | |
|         results.append({'scene': scene, 'sawa_h': float(sawa), 'rel_normal': float(rnorm), | |
|                         'abs_rel': float(abs_rel), 'delta1': float(delta1)}) | |
|     # Per-scene + aggregate mean | |
|     summary = {'per_scene': results, | |
|                'mean': {k: float(np.mean([r[k] for r in results])) for k in METRICS}} | |
|     with open(output, 'w') as f: | |
|         json.dump(summary, f, indent=2) | |
| if __name__ == '__main__': | |
|     main() | |
| ``` | |
| **Note on alignment**: `compute_sawa_h` aligns internally (via `align_depth_least_square` + `align_affine_lstsq`), | |
| so passing RAW pred (affine-invariant) is correct. `compute_rel_normal` does NOT align — its | |
| inputs should be in a comparable depth scale. For Phase 0 simplicity, pass raw pred; document | |
| the affine-shift caveat in the analysis. For stricter eval, pre-affine-align before RelNormal. | |
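For the stricter variant mentioned in the note, a pre-alignment helper might look like the sketch below. `affine_align` and its `min_depth` floor are hypothetical and are not part of the `evalmde` package; the fit is the same two-parameter least squares used for AbsRel in the pseudocode.

```python
import numpy as np


def affine_align(pred, gt, mask, min_depth=1e-3):
    """Fit gt ≈ a * pred + b over `mask` and apply it to the whole map.

    Returns (aligned, a, b). `min_depth` is an illustrative floor that keeps
    the aligned depth positive before normals are derived from it.
    """
    x = pred[mask].astype(np.float64)
    y = gt[mask].astype(np.float64)
    A = np.stack([x, np.ones_like(x)], axis=-1)
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    aligned = np.maximum(a * pred + b, min_depth)
    return aligned, float(a), float(b)
```

A call such as `aligned, a, b = affine_align(pr_d, gt_d, gt_v & pr_v)` would then feed `aligned` to `compute_rel_normal` in place of `pr_d`. Logging `a` and `b` per scene makes it easy to spot degenerate fits, such as a negative scale.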
| ### TODO-4: Scene list / config | |
| Once Infinigen download succeeds (currently blocked, see issue below), `run_inference.py` | |
| auto-discovers all scene dirs under `--data-root`. If a subset is wanted, write | |
| `scenes.txt` and add filtering in run_inference.py (~3 lines). | |
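One way the optional `scenes.txt` filter could be written is shown below. The file format (one scene name per line, `#` for comments) and the function name `discover_scenes` are assumptions; the handoff only states that a scene directory holds `rgb.png` and `gt_depth.npz`.

```python
from pathlib import Path


def discover_scenes(data_root, scenes_file=None):
    """List scene directory names that contain both rgb.png and gt_depth.npz.

    If `scenes_file` is given, keep only the scenes named in it
    (one name per line; blank lines and lines starting with '#' are ignored).
    """
    root = Path(data_root)
    scenes = sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and (d / "rgb.png").exists() and (d / "gt_depth.npz").exists()
    )
    if scenes_file is not None:
        lines = (ln.strip() for ln in Path(scenes_file).read_text().splitlines())
        wanted = {ln for ln in lines if ln and not ln.startswith("#")}
        scenes = [s for s in scenes if s in wanted]
    return scenes
```

If the downloaded layout turns out to be nested, swapping `root.iterdir()` for a recursive search would be the natural extension.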
| ### TODO-5: sbatch `eval_scripts/eval_evalmde_all_slurm.sh` | |
| Same pattern as MoGe's `sanity_all_slurm.sh`: single sbatch, single H100, serial per-model. | |
| For each of 7 models: `conda activate <env>; python scripts/run_inference.py --baseline baselines/<m>.py ...` | |
| Then after all 7 inferences done: `conda activate evalmde; for m in ...; do python scripts/compute_metrics.py --model-name $m ...; done` | |
| Per-model envs need no extra install: they only run inference, which needs torch + model wrapper deps | |
| but not `evalmde`. Only the metric-aggregation stage does `from evalmde.metrics...`, and it runs in | |
| the evalmde env. | |
| ### TODO-4 (detail): Inspecting the scene layout | |
| Once Infinigen download finishes, inspect actual layout: | |
| ```bash | |
| ls /home/ywan0794/EvalMDE/data/infinigen/ | head -20 | |
| ``` | |
| If scenes are `scene_001/`, `scene_002/`, ...: dataloader auto-discovers them. | |
| If grouped under sub-folders or different naming: may need a manual `scenes.txt` split file. | |
| ### TODO-5 (detail): sbatch structure and env mapping | |
| Mirror MoGe's `sanity_all_slurm.sh` structure: | |
| - Single sbatch, single H100, serial per-model | |
| - For each model: activate the model's conda env, run `python scripts/run_inference.py --baseline baselines/<m>.py --data-root data/infinigen --output-root <out> --model-name <m>` | |
| - After all inference is done, run `scripts/compute_metrics.py` per model in the evalmde env, then optionally aggregate a cross-model summary | |
| Per-model env mapping same as MoGe: | |
| | model | env | | |
| |---|---| | |
| | depth_pro | depth-pro | | |
| | marigold | marigold | | |
| | lotus | lotus | | |
| | depthmaster | depthmaster | | |
| | ppd | ppd | | |
| | da3_mono | da3 | | |
| | fe2e | fe2e | | |
| The per-model envs do not need the `evalmde` package: they only run `scripts/run_inference.py`. | |
| The `from evalmde.metrics...` imports live in `scripts/compute_metrics.py`, which runs in the evalmde env. | |
| --- | |
| ## Paper-canonical inference parameters (locked, confirmed against each repo) | |
| | Model | Args | Source | | |
| |---|---|---| | |
| | Depth Pro | `--precision fp32` | `create_model_and_transforms()` default | | |
| | Marigold | v1-1 + `--denoise_steps 4 --ensemble_size 1` | (user decision: balanced speed) | | |
| | Lotus | g-v2-1-disparity + `--mode generation --disparity --timestep 999 --fp16 --seed 42` | `Lotus/eval.sh` | | |
| | DepthMaster | `--processing_res 768` | `DepthMaster/scripts/infer.sh` | | |
| | PPD | `--semantics_model MoGe2 --semantics_pth checkpoints/moge2.pt --model_pth checkpoints/ppd_moge.pth --sampling_steps 4` | `PPD/ppd/configs/eval.yaml` | | |
| | DA3-Mono | `--hf_id depth-anything/DA3MONO-LARGE` | DA3 README | | |
| | FE2E | `--prompt_type empty --single_denoise --cfg_guidance 6.0 --size_level 768` | `FE2E/README.md` eval block | | |
| --- | |
| ## Key insights to preserve | |
| 1. **EvalMDE protocol uses raw native input, no homography warp.** MoGe's eval pipeline | |
| does aggressive canonical-view warping (`dataloader.py:_process_instance:119-180`). | |
| That is MoGe-paper-specific; EvalMDE explicitly uses raw inputs (see `compute_metrics_example.py`). | |
| 2. **Output key contract** (per MGEBaselineInterface): | |
| - `depth_metric` → metric depth in meters (Depth Pro) | |
| - `depth_scale_invariant` → scale-invariant relative depth (DA3-Mono) | |
| - `depth_affine_invariant` → affine-invariant depth (Marigold/DepthMaster/PPD/FE2E) | |
| - `disparity_affine_invariant` → affine-invariant disparity (Lotus disparity ckpts) | |
| 3. **Pre-alignment for SAWA-H/RelNormal**: SAWA-H itself does affine alignment internally | |
| (`evalmde/metrics/sawa_h.py:compute_sawa_h` uses `align_depth_least_square` + `align_affine_lstsq`), | |
| so you can pass RAW pred depth to SAWA-H. RelNormal works on normals which are | |
| scale-invariant in the limit, but **shift in depth space WILL skew normals at far depths** — | |
| so for affine-invariant pred models, do an affine align before passing to `compute_rel_normal`. | |
| 4. **MoGe's eval can run in parallel with EvalMDE work.** Production `eval_all_slurm.sh` | |
| already running. Don't disturb. | |
| 5. **Lotus disparity ckpt inversion was numerically unstable** (1/disp blows up near | |
| disparity=0). For EvalMDE, only emit `disparity_affine_invariant` from Lotus, then | |
| convert: `aligned_disp = scale*disp + shift` (fit in disp space), `aligned_depth = 1/aligned_disp.clamp(1/gt_depth_max)`. | |
| Reference: `moge/test/metrics.py:202-218` disparity_affine_invariant block. | |
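The disparity handling in insight 5 can be restated in NumPy as the sketch below. It is not the MoGe implementation referenced above: the function name is hypothetical, the fit target is assumed to be `1 / gt_depth` on valid pixels, and `np.maximum` plays the role of `.clamp(min=...)`.

```python
import numpy as np


def disparity_to_aligned_depth(disp, gt_depth, mask):
    """Align affine-invariant disparity to GT in disparity space, then invert.

    Assumes gt_depth is strictly positive wherever `mask` is True.
    """
    gt_disp = 1.0 / gt_depth[mask]
    x = disp[mask]
    # Fit gt_disp ≈ scale * disp + shift by least squares.
    A = np.stack([x, np.ones_like(x)], axis=-1)
    (scale, shift), *_ = np.linalg.lstsq(A, gt_disp, rcond=None)
    aligned_disp = scale * disp + shift
    # Floor at 1 / gt_depth_max so the inversion cannot exceed the farthest
    # valid GT depth, which is what keeps 1/disp from blowing up near zero.
    min_disp = 1.0 / gt_depth[mask].max()
    return 1.0 / np.maximum(aligned_disp, min_disp)
```

Because the floor depends on the ground truth, this conversion belongs in the metric stage rather than in the inference driver.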
| --- | |
| ## Resume instructions | |
| 1. `cd /home/ywan0794/EvalMDE` | |
| 2. Check Infinigen download: `du -sh data/infinigen; tail /tmp/dl_infinigen.log` | |
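| 3. Imports (TODO-1) need no action: `scripts/run_inference.py` puts `/home/ywan0794/MoGe` on `sys.path`. Only if the baselines must import without MoGe, rewrite them with: | |
| ```bash | |
| sed -i 's|from moge.test.baseline|from test.baseline|g' baselines/*.py | |
| ``` | |
| 4. Write `scripts/compute_metrics.py` (TODO-3) using the pseudocode above. | |
| 5. Test on 1 scene with depth_pro. Inference (depth-pro env): `python scripts/run_inference.py --baseline baselines/depth_pro.py --data-root data/infinigen --output-root /tmp/pred --model-name depth_pro --repo /home/ywan0794/EvalMDE/ml-depth-pro --checkpoint /home/ywan0794/EvalMDE/ml-depth-pro/checkpoints/depth_pro.pt`. Metrics (evalmde env): `python scripts/compute_metrics.py --gt-root data/infinigen --pred-root /tmp/pred --model-name depth_pro --output /tmp/test.json` | |
| 6. Inspect `/tmp/test.json`. If sane (rel_normal in [0, 1] rad, sawa_h plausible), | |
| proceed to write sbatch (TODO-5). | |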
| --- | |
| **End of handoff.** | |