Update README.md
README.md
CHANGED
|
@@ -47,6 +47,7 @@
|
|
| 47 |
|
| 48 |
</div>
|
| 49 |
|
|
|
|
| 50 |
## 📢 News
|
| 51 |
|
| 52 |
* **[2026-05-15]** 🚀 Code and checkpoint are open-sourced.
|
|
@@ -181,6 +182,58 @@ def forward(self, aggregated_tokens_list, images, patch_start_idx, query_points=
|
|
| 181 |
return coord_preds, vis_scores, conf_scores
|
| 182 |
```
|
| 183 |
|
| 184 |
---
|
| 185 |
|
| 186 |
### Dataset Preparation
|
|
@@ -420,3 +473,4 @@ This project is built upon several excellent open-source projects:
|
|
| 420 |
* [SAM2](https://github.com/facebookresearch/sam2)
|
| 421 |
|
| 422 |
We thank the authors for releasing their code and models to the community.
|
|
|
|
|
|
| 47 |
|
| 48 |
</div>
|
| 49 |
|
| 50 |
+
|
| 51 |
## 📢 News
|
| 52 |
|
| 53 |
* **[2026-05-15]** 🚀 Code and checkpoint are open-sourced.
|
|
|
|
| 182 |
return coord_preds, vis_scores, conf_scores
|
| 183 |
```
|
| 184 |
|
#### Optional

VGGT recomputes the 2D positional embeddings on every forward pass, even when the input image resolution is unchanged. To speed up both training and inference, we precompute these embeddings offline and load them at runtime, which avoids the redundant computation.

1. Generate the 2D positional embeddings with `gen_pos_embed.py`:

```bash
python gen_pos_embed.py --img_H 518 --img_W 518
```
Precomputed positional embeddings for several commonly used image resolutions are also available on [Hugging Face](https://huggingface.co/zbbhhh/VGGT-S).

2. Modify the VGGT tracking head to use the precomputed embeddings.

Open:

```text
third_party/vggt_main/vggt/heads/track_modules/base_track_predictor.py
```
Add the following line at the end of the `__init__` method in the `BaseTrackerPredictor` class, around [base_track_predictor.py#L81](https://github.com/facebookresearch/vggt/blob/main/vggt/heads/track_modules/base_track_predictor.py#L81):

```python
self.pos_embed = None
```
This attribute is used to cache the loaded positional embeddings.

Then, replace the following line in [base_track_predictor.py#L149](https://github.com/facebookresearch/vggt/blob/main/vggt/heads/track_modules/base_track_predictor.py#L149):

```python
pos_embed = get_2d_sincos_pos_embed(self.transformer_dim, grid_size=(HH, WW)).to(query_points.device)
```
with:

```python
# Requires `import os` at the top of the file if it is not already imported.
if self.pos_embed is None:
    pos_emb_pt = f"vggt_main/pos_embed/pos_embed_{self.transformer_dim}_{HH}_{WW}.pt"
    # Check for the file only on the first call, not on every forward pass.
    assert os.path.exists(pos_emb_pt), (
        f"[BUG] Positional embedding file not found: {pos_emb_pt}"
    )
    # Load once and keep the tensor on the query device for later calls.
    self.pos_embed = torch.load(pos_emb_pt).to(query_points.device)

pos_embed = self.pos_embed
```

Note that this caches a single embedding, so it assumes that `HH` and `WW` stay fixed for the lifetime of the module.
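The single `self.pos_embed` attribute above holds one tensor, so it only stays correct while the grid size does not change between forward passes. As a minimal sketch of one way to relax that, the helper below keeps one entry per `(dim, HH, WW)` key. The names `PosEmbedCache`, `pos_embed_path`, and `torch_loader`, as well as the injectable `loader` argument, are hypothetical and not part of VGGT or this repository; only the directory and the file-name pattern are taken from the snippet above.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

# Directory and file-name pattern copied from the snippet above; every other
# name in this sketch is hypothetical.
POS_EMBED_DIR = Path("vggt_main/pos_embed")


def pos_embed_path(dim: int, hh: int, ww: int, root: Path = POS_EMBED_DIR) -> Path:
    """Return the expected file for a given embedding dim and grid size."""
    return Path(root) / f"pos_embed_{dim}_{hh}_{ww}.pt"


def torch_loader(path: Path) -> Any:
    """Default loader; torch is imported lazily so the cache logic has no hard dependency on it."""
    import torch

    return torch.load(path, map_location="cpu")


class PosEmbedCache:
    """Load each precomputed embedding at most once per (dim, HH, WW)."""

    def __init__(
        self,
        root: Path = POS_EMBED_DIR,
        loader: Callable[[Path], Any] = torch_loader,
    ) -> None:
        self.root = Path(root)
        self.loader = loader
        self._cache: Dict[Tuple[int, int, int], Any] = {}

    def get(self, dim: int, hh: int, ww: int) -> Any:
        key = (dim, hh, ww)
        if key not in self._cache:
            path = pos_embed_path(dim, hh, ww, self.root)
            # Fail with a clear error instead of silently recomputing.
            if not path.is_file():
                raise FileNotFoundError(f"Positional embedding file not found: {path}")
            self._cache[key] = self.loader(path)
        return self._cache[key]
```

Under these assumptions, the tracking head could hold one `PosEmbedCache` instance and call `cache.get(self.transformer_dim, HH, WW)` where the single attribute is used today. The sketch loads tensors to the CPU, so the caller would still need to move the result to `query_points.device`, and would ideally cache the moved tensor as well.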
| 237 |
---
|
| 238 |
|
| 239 |
### Dataset Preparation
|
|
|
|
| 473 |
* [SAM2](https://github.com/facebookresearch/sam2)
|
| 474 |
|
| 475 |
We thank the authors for releasing their code and models to the community.
|
| 476 |
+
|