nielsr (HF Staff) committed
Commit 1e72100 · verified · 1 Parent(s): eb3d89a

Add pipeline tag and paper link


Hi! I'm Niels from the Hugging Face community science team. I noticed this repository was missing a `pipeline_tag` and a direct link to its corresponding paper. This PR adds the `image-to-3d` tag to improve visibility on the Hub and links the Markdown content to the research paper page. I've also included the citation information from the official repository.

Files changed (1): README.md (+21 −1)
README.md CHANGED
@@ -1,5 +1,6 @@
 ---
 license: apache-2.0
+pipeline_tag: image-to-3d
 tags:
 - novel-view-synthesis
 - multi-view-diffusion
@@ -11,10 +12,14 @@ tags:
 
 **Repurposing Geometric Foundation Models for Multi-view Diffusion**
 
-[[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+[[Paper]](https://huggingface.co/papers/2603.22275) | [[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+
+Geometric Latent Diffusion (GLD) is a framework that repurposes the geometrically consistent feature space of geometric foundation models (such as Depth Anything 3 and VGGT) as the latent space for multi-view diffusion. By operating in this space rather than a view-independent VAE latent space, GLD achieves consistent novel view synthesis (NVS) and 3D reconstruction with significantly faster training convergence.
 
 ## Quick Start
 
+To use these models, follow the setup instructions in the [official GitHub repository](https://github.com/cvlab-kaist/GLD).
+
 ```bash
 git clone https://github.com/cvlab-kaist/GLD.git
 cd GLD
@@ -43,3 +48,18 @@ python -c "from huggingface_hub import snapshot_download; snapshot_download('Seo
 
 Stage-2 and MAE decoder checkpoints contain **EMA weights only**.
 MAE decoder checkpoints contain **decoder weights only** (encoder removed).
+
+## Citation
+
+```bibtex
+@article{jang2026gld,
+  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
+  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
+  journal={arXiv preprint arXiv:2603.22275},
+  year={2026}
+}
+```
+
+## Acknowledgements
+
+Built upon [RAE](https://github.com/nicknign/RAE_release), [Depth Anything 3](https://github.com/DepthAnything/Depth-Anything-3), [VGGT](https://github.com/facebookresearch/vggt), [CUT3R](https://github.com/naver/CUT3R), and [SiT](https://github.com/willisma/SiT).
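As context for the Quick Start hunk above: the checkpoint download command is truncated in this diff (`snapshot_download('Seo…`), but it uses the standard `huggingface_hub` API. A minimal sketch of the same call, with a placeholder repo id since the real one is cut off here:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: the actual GLD checkpoint repo is truncated in the
# diff above ("Seo…"); substitute the full id from the repository README.
REPO_ID = "user/GLD-checkpoints"


def fetch_checkpoints(repo_id: str = REPO_ID) -> str:
    """Download every file in the given Hub repo and return the local snapshot path."""
    return snapshot_download(repo_id)
```

Calling `fetch_checkpoints()` downloads the files into the local Hub cache and returns the snapshot directory, which is what the one-liner in the Quick Start does inline.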