Add pipeline tag and improve model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +43 -2
README.md CHANGED
@@ -1,3 +1,7 @@
  <div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/69672d93bece445e6907b7a2/Ju4n-ceuPYTYlo__v9b7Q.png" width="50%">
  </div>
@@ -19,8 +23,8 @@
  <br>
  <p align="center"> <a href='https://yutian10.github.io/AnyRecon/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;
  <a href="https://arxiv.org/pdf/2604.19747"><img src="https://img.shields.io/static/v1?label=Arxiv&message=AnyRecon&color=red&logo=arxiv"></a> &nbsp;
  <a href='https://huggingface.co/Yutian10/AnyRecon/tree/main'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow'></a> &nbsp;
- <a href=''><img src='https://img.shields.io/badge/YouTube-Video-FF0000?logo=youtube&logoColor=white'></a> &nbsp;
  </p>
  <p align="center">
  <video
@@ -34,4 +38,41 @@
  </p>

  ## 🌟 Abstract
- Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remain challenging for non-generative reconstruction. Existing diffusion-based approaches mitigates this issues by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric consistency and limits scalability to large or diverse scenes. We propose AnyRecon, a scalable framework for reconstruction from arbitrary and unordered sparse inputs that preserves explicit geometric control while supporting flexible conditioning cardinality. To support long-range conditioning, our method constructs a persistent global scene memory via a prepended capture view cache, and removes temporal compression to maintain frame-level correspondence under large viewpoint changes. Beyond better generative model, we also find that the interplay between generation and reconstruction is crucial for large-scale 3D scenes. Thus, we introduce a geometry-aware conditioning strategy that couples generation and reconstruction through an explicit 3D geometric memory and geometry-driven capture-view retrieval. To ensure efficiency, we combine 4-step diffusion distillation with context-window sparse attention to reduce quadratic complexity. Extensive experiments demonstrate robust and scalable reconstruction across irregular inputs, large viewpoint gaps, and long trajectories.
+ ---
+ pipeline_tag: image-to-3d
+ ---
+
  <div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/69672d93bece445e6907b7a2/Ju4n-ceuPYTYlo__v9b7Q.png" width="50%">
  </div>
 
  <br>
  <p align="center"> <a href='https://yutian10.github.io/AnyRecon/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;
  <a href="https://arxiv.org/pdf/2604.19747"><img src="https://img.shields.io/static/v1?label=Arxiv&message=AnyRecon&color=red&logo=arxiv"></a> &nbsp;
+ <a href='https://github.com/OpenImagingLab/AnyRecon'><img src='https://img.shields.io/badge/Github-Code-blue?logo=github'></a> &nbsp;
  <a href='https://huggingface.co/Yutian10/AnyRecon/tree/main'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow'></a> &nbsp;
  </p>
  <p align="center">
  <video
38
  </p>

  ## 🌟 Abstract
+ Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but it remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric consistency and limits scalability to large or diverse scenes. We propose AnyRecon, a scalable framework for reconstruction from arbitrary and unordered sparse inputs that preserves explicit geometric control while supporting flexible conditioning cardinality. To support long-range conditioning, our method constructs a persistent global scene memory via a prepended capture-view cache, and removes temporal compression to maintain frame-level correspondence under large viewpoint changes.
+
+ ## 🛠️ Environment Setup
+
+ ```bash
+ git clone https://github.com/OpenImagingLab/AnyRecon.git
+ cd AnyRecon
+ conda create -n anyrecon python=3.10 -y
+ conda activate anyrecon
+ pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118
+ pip install -r requirements.txt
+ ```
+
+ ## 🚀 Quick Start
+
+ ### Inference
+ You can run inference with the provided Python script (make sure you have downloaded the required weights and placed them in the `./checkpoints` folder):
+
+ ```bash
+ python run_AnyRecon.py \
+ --root_dir example/valley \
+ --output_dir example/valley \
+ --lora_path full_attention.ckpt
+ ```
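For processing several scenes back to back, the same command line can be scripted. A minimal Python sketch (the `build_cmd` helper is an illustrative assumption, not part of the repository; only the paths and flags above are taken from the card):

```python
import shlex
import subprocess

def build_cmd(scene_dir: str, lora: str = "full_attention.ckpt") -> list:
    """Build the run_AnyRecon.py argument list for one scene directory.

    Hypothetical helper: the script name and flags mirror the README's
    inference command; the default output dir is the input dir, as above.
    """
    return shlex.split(
        f"python run_AnyRecon.py --root_dir {scene_dir} "
        f"--output_dir {scene_dir} --lora_path {lora}"
    )

# Append more scene directories here to process them sequentially.
for scene in ["example/valley"]:
    cmd = build_cmd(scene)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run inference
```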
+
+ ## 🔗 Citation
+ If you find our work helpful, please cite it:
+ ```bibtex
+ @article{chen2026anyrecon,
+ title={AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model},
+ author={Chen, Yutian and Guo, Shi and Jin, Renbiao and Yang, Tianshuo and Cai, Xin and Luo, Yawen and Yang, Mingxin and Yu, Mulin and Xu, Linning and Xue, Tianfan},
+ journal={arXiv preprint arXiv:2604.19747},
+ year={2026}
+ }
+ ```
+
+ ## 💗 Acknowledgments
+ Thanks to these great repositories: [Wan2.1](https://github.com/Wan-Video/Wan2.1) and [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio).