nielsr (HF Staff) committed · Commit 3239e74 · verified · 1 Parent(s): b1a2678

Improve model card for $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning


This PR significantly enhances the model card for the $\pi^3$ model by:

* Adding `license: bsd-2-clause` based on the project's GitHub repository.
* Including the `pipeline_tag: image-to-3d` for better discoverability on the Hugging Face Hub.
* Specifying `library_name: pytorch`, addressing the "[More Information Needed]" note in the existing card and correctly identifying the model's primary framework.
* Providing a comprehensive overview of the model from the paper's abstract and GitHub README.
* Adding direct links to the paper, project page, GitHub repository, and a Hugging Face demo.
* Including detailed usage instructions with code snippets for quick inference.
* Adding sections for acknowledgements, citation, and license information.

These updates make the model card more informative, accessible, and aligned with Hugging Face Hub best practices.

Files changed (1): README.md (+156 −3)
README.md CHANGED
@@ -1,9 +1,162 @@
  ---
+ license: bsd-2-clause
+ pipeline_tag: image-to-3d
+ library_name: pytorch
  tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
  ---

- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Library: [More Information Needed]
- - Docs: [More Information Needed]
+ # 🌌 $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning
+
+ <div align="center">
+ <p>
+ <a href="https://huggingface.co/papers/2507.13347" target="_blank">
+ <img src="https://img.shields.io/badge/Paper-00AEEF?style=plastic&logo=arxiv&logoColor=white" alt="Paper">
+ </a>
+ <a href="https://yyfz.github.io/pi3/" target="_blank">
+ <img src="https://img.shields.io/badge/Project%20Page-F78100?style=plastic&logo=google-chrome&logoColor=white" alt="Project Page">
+ </a>
+ <a href="https://github.com/yyfz/Pi3" target="_blank">
+ <img src="https://img.shields.io/badge/GitHub-Code-blue?logo=github" alt="GitHub">
+ </a>
+ <a href="https://huggingface.co/spaces/yyfz233/Pi3" target="_blank">
+ <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue" alt="Hugging Face Demo">
+ </a>
+ </p>
+ </div>
+
+ <div align="center">
+ <a href="https://yyfz.github.io/pi3/">
+ <img src="https://huggingface.co/yyfz233/Pi3/resolve/main/assets/main.png" width="90%">
+ </a>
+ <p>
+ <i>$\pi^3$ reconstructs visual geometry without a fixed reference view, achieving robust, state-of-the-art performance.</i>
+ </p>
+ </div>
+
+ ## ✨ Overview
+
+ We introduce $\pi^3$ (Pi-Cubed), a novel feed-forward neural network that revolutionizes visual geometry reconstruction by **eliminating the need for a fixed reference view**. Traditional methods, which rely on a designated reference frame, are often prone to instability and failure if the reference is suboptimal.
+
+ In contrast, $\pi^3$ employs a fully **permutation-equivariant** architecture. This allows it to directly predict affine-invariant camera poses and scale-invariant local point maps from an unordered set of images, breaking free from the constraints of a reference frame. This design makes our model inherently **robust to input ordering** and **highly scalable**.
+
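+ A minimal sketch of what this property means in practice, reusing the `Pi3` class and `load_images_as_tensor` helper shown in the Quick Start below. Exact equality can be broken by non-deterministic GPU kernels, so this is an illustration rather than a strict test:
+
+ ```python
+ import torch
+ from pi3.models.pi3 import Pi3
+ from pi3.utils.basic import load_images_as_tensor
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ model = Pi3.from_pretrained("yyfz233/Pi3").to(device).eval()
+ imgs = load_images_as_tensor('examples/skating.mp4', interval=10).to(device)  # (N, 3, H, W)
+
+ perm = torch.randperm(imgs.shape[0], device=device)
+ with torch.no_grad():
+     a = model(imgs[None])        # original ordering
+     b = model(imgs[perm][None])  # shuffled ordering
+
+ # Per-view outputs permute along with the inputs: no view is privileged.
+ print(torch.allclose(a['local_points'][0][perm], b['local_points'][0], atol=1e-2))
+ ```
+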
+ A key emergent property of our simple, bias-free design is the learning of a dense and structured latent representation of the camera pose manifold. Without complex priors or training schemes, $\pi^3$ achieves **state-of-the-art performance** 🏆 on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map estimation.
+
+ ## 🚀 Quick Start
+
+ ### 1. Clone & Install Dependencies
+ First, clone the repository and install the required packages.
+ ```bash
+ git clone https://github.com/yyfz/Pi3.git
+ cd Pi3
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Run Inference from Command Line
+
+ Try our example inference script. You can run it on a directory of images or a video file.
+
+ If the automatic download from Hugging Face is slow, you can download the model checkpoint manually from [here](https://huggingface.co/yyfz233/Pi3/resolve/main/model.safetensors) and specify its local path using the `--ckpt` argument.
+
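+ For instance, a manual download could look like this (the `ckpts/` directory is only an illustration; any local path works):
+
+ ```bash
+ # Download the checkpoint to a local directory of your choice
+ wget -P ckpts https://huggingface.co/yyfz233/Pi3/resolve/main/model.safetensors
+ python example.py --ckpt ckpts/model.safetensors
+ ```
+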
+ ```bash
+ # Run with default example video
+ python example.py
+
+ # Run on your own data (image folder or .mp4 file)
+ python example.py --data_path <path/to/your/images_dir_or_video.mp4>
+ ```
+
+ **Optional Arguments** (a combined example follows the list):
+
+ * `--data_path`: Path to the input image directory or a video file. (Default: `examples/skating.mp4`)
+ * `--save_path`: Path to save the output `.ply` point cloud. (Default: `examples/result.ply`)
+ * `--interval`: Frame sampling interval. (Default: `1` for images, `10` for video)
+ * `--ckpt`: Path to a custom model checkpoint file.
+ * `--device`: Device to run inference on. (Default: `cuda`)
+
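+ For example, to sample every fifth frame of a video and choose where the point cloud is written (the paths below are placeholders):
+
+ ```bash
+ python example.py --data_path /path/to/scene.mp4 --interval 5 --save_path /path/to/scene.ply --device cuda
+ ```
+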
+ ### 3. Run with Gradio Demo
+
+ You can also launch a local Gradio demo for an interactive experience.
+
+ ```bash
+ # Install demo-specific requirements
+ pip install -r requirements_demo.txt
+
+ # Launch the demo
+ python demo_gradio.py
+ ```
+
+ ## 🛠️ Detailed Usage
+
+ ### Model Input & Output
+
+ The model takes a tensor of images and outputs a dictionary containing the reconstructed geometry.
+
+ * **Input**: A `torch.Tensor` of shape $B \times N \times 3 \times H \times W$ with pixel values in the range `[0, 1]`.
+ * **Output**: A `dict` with the following keys:
+   * `points`: Global point cloud, unprojected from `local_points` using `camera_poses` (`torch.Tensor`, $B \times N \times H \times W \times 3$).
+   * `local_points`: Per-view local point maps (`torch.Tensor`, $B \times N \times H \times W \times 3$).
+   * `conf`: Confidence scores for the local points (values in `[0, 1]`, higher is better) (`torch.Tensor`, $B \times N \times H \times W \times 1$).
+   * `camera_poses`: Camera-to-world transformation matrices (`4x4`, OpenCV convention) (`torch.Tensor`, $B \times N \times 4 \times 4$).
+
+ ### Example Code Snippet
+
+ Here is a minimal example of how to run the model on a batch of images.
+
+ ```python
+ import torch
+ from pi3.models.pi3 import Pi3
+ from pi3.utils.basic import load_images_as_tensor  # helper provided in the repository
+
+ # --- Setup ---
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ model = Pi3.from_pretrained("yyfz233/Pi3").to(device).eval()
+ # or download the checkpoint from `https://huggingface.co/yyfz233/Pi3/resolve/main/model.safetensors`
+
+ # --- Load Data ---
+ # Load a sequence of N images into a tensor
+ # imgs shape: (N, 3, H, W), values in [0, 1]
+ imgs = load_images_as_tensor('examples/skating.mp4', interval=10).to(device)
+
+ # --- Inference ---
+ print("Running model inference...")
+ # Use mixed precision for better performance on compatible GPUs
+ dtype = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16
+
+ with torch.no_grad():
+     with torch.amp.autocast('cuda', dtype=dtype):
+         # Add a batch dimension -> (1, N, 3, H, W)
+         results = model(imgs[None])
+
+ print("Reconstruction complete!")
+ # Access outputs: results['points'], results['local_points'], results['camera_poses'] and results['conf'].
+ ```
+
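+ Continuing from the snippet above, here is a sketch of how one might consume the outputs described in the Input & Output section — masking the global points by confidence and reading out camera centers. The `0.5` threshold and the NumPy conversion are illustrative choices, not part of the repository's API:
+
+ ```python
+ # Drop the batch dimension and move to CPU (cast from bf16/fp16 first,
+ # since NumPy has no bfloat16 type)
+ points = results['points'][0].float().cpu().numpy()        # (N, H, W, 3)
+ conf   = results['conf'][0].float().cpu().numpy()          # (N, H, W, 1)
+ poses  = results['camera_poses'][0].float().cpu().numpy()  # (N, 4, 4)
+
+ # Keep only reasonably confident points (0.5 is an arbitrary example threshold)
+ mask = conf[..., 0] > 0.5
+ confident_points = points[mask]            # (M, 3) world-space points
+
+ # Camera centers are the translation column of each camera-to-world matrix
+ centers = poses[:, :3, 3]                  # (N, 3)
+ print(f"{confident_points.shape[0]} confident points, {centers.shape[0]} cameras")
+ ```
+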
+ ## 🙏 Acknowledgements
+
+ Our work builds upon several fantastic open-source projects. We'd like to express our gratitude to the authors of:
+
+ * [DUSt3R](https://github.com/naver/dust3r)
+ * [CUT3R](https://github.com/CUT3R/CUT3R)
+ * [VGGT](https://github.com/facebookresearch/vggt)
+
+ ## 📜 Citation
+
+ If you find our work useful, please consider citing:
+
+ ```bibtex
+ @misc{wang2025pi3,
+   title={$\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning},
+   author={Yifan Wang and Jianjun Zhou and Haoyi Zhu and Wenzheng Chang and Yang Zhou and Zizun Li and Junyi Chen and Jiangmiao Pang and Chunhua Shen and Tong He},
+   year={2025},
+   eprint={2507.13347},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2507.13347},
+ }
+ ```
+
+ ## 📄 License
+ For academic use, this project is licensed under the 2-clause BSD License. See the [LICENSE](https://github.com/yyfz/Pi3/blob/main/LICENSE) file for details. For commercial use, please contact the authors.