nielsr (HF Staff) committed
Commit 1e72100 · verified · 1 Parent(s): eb3d89a

Add pipeline tag and paper link


Hi! I'm Niels from the Hugging Face community science team. I noticed this repository was missing a `pipeline_tag` and a direct link to its corresponding paper. This PR adds the `image-to-3d` tag to improve visibility on the Hub and links the Markdown content to the research paper page. I've also included the citation information from the official repository.

Files changed (1): README.md (+21 −1)
README.md CHANGED
@@ -1,5 +1,6 @@
 ---
 license: apache-2.0
+pipeline_tag: image-to-3d
 tags:
 - novel-view-synthesis
 - multi-view-diffusion
@@ -11,10 +12,14 @@ tags:
 
 **Repurposing Geometric Foundation Models for Multi-view Diffusion**
 
-[[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+[[Paper]](https://huggingface.co/papers/2603.22275) | [[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+
+Geometric Latent Diffusion (GLD) is a framework that repurposes the geometrically consistent feature space of geometric foundation models (such as Depth Anything 3 and VGGT) as the latent space for multi-view diffusion. By operating in this space rather than a view-independent VAE latent space, GLD achieves consistent novel view synthesis (NVS) and 3D reconstruction with significantly faster training convergence.
 
 ## Quick Start
 
+To use these models, follow the setup instructions in the [official GitHub repository](https://github.com/cvlab-kaist/GLD).
+
 ```bash
 git clone https://github.com/cvlab-kaist/GLD.git
 cd GLD
@@ -43,3 +48,18 @@ python -c "from huggingface_hub import snapshot_download; snapshot_download('Seo
 
 Stage-2 and MAE decoder checkpoints contain **EMA weights only**.
 MAE decoder checkpoints contain **decoder weights only** (encoder removed).
+
+## Citation
+
+```bibtex
+@article{jang2026gld,
+  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
+  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
+  journal={arXiv preprint arXiv:2603.22275},
+  year={2026}
+}
+```
+
+## Acknowledgements
+
+Built upon [RAE](https://github.com/nicknign/RAE_release), [Depth Anything 3](https://github.com/DepthAnything/Depth-Anything-3), [VGGT](https://github.com/facebookresearch/vggt), [CUT3R](https://github.com/naver/CUT3R), and [SiT](https://github.com/willisma/SiT).
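As context for the Quick Start hunk above: the checkpoint download command is truncated in this diff (`snapshot_download('Seo…`), but it uses the standard `huggingface_hub` API. A minimal sketch of the same call, with a placeholder repo id since the real one is cut off here:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: the actual GLD checkpoint repo is truncated in the
# diff above ("Seo…"); substitute the full id from the repository README.
REPO_ID = "user/GLD-checkpoints"


def fetch_checkpoints(repo_id: str = REPO_ID) -> str:
    """Download every file in the given Hub repo and return the local snapshot path."""
    return snapshot_download(repo_id)
```

Calling `fetch_checkpoints()` downloads the files into the local Hub cache and returns the snapshot directory, which is what the one-liner in the Quick Start does inline.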