zbbhhh committed on
Commit
f4dea28
·
verified ·
1 Parent(s): 65af6cc

Update README.md

Files changed (1)
  1. README.md +54 -0
README.md CHANGED
@@ -47,6 +47,7 @@
47
 
48
  </div>
49
 
 
50
  ## 📢 News
51
 
52
  * **[2026-05-15]** 🚀 Code and checkpoint are open-sourced.
@@ -181,6 +182,58 @@ def forward(self, aggregated_tokens_list, images, patch_start_idx, query_points=
181
  return coord_preds, vis_scores, conf_scores
182
  ```
183
 
184
  ---
185
 
186
  ### Dataset Preparation
@@ -420,3 +473,4 @@ This project is built upon several excellent open-source projects:
420
  * [SAM2](https://github.com/facebookresearch/sam2)
421
 
422
  We thank the authors for releasing their code and models to the community.
 
 
47
 
48
  </div>
49
 
50
+
51
  ## 📢 News
52
 
53
  * **[2026-05-15]** 🚀 Code and checkpoint are open-sourced.
 
182
  return coord_preds, vis_scores, conf_scores
183
  ```
184
 
185
+ #### Optional: Precomputed Positional Embeddings
186
+
187
+ VGGT recomputes the 2D positional embeddings on every forward pass, even when the input image resolution is unchanged. To speed up both training and inference, we precompute the 2D positional embeddings offline and load them at runtime, which avoids the redundant computation.
188
+
189
+ 1. Generate the 2D positional embeddings using `gen_pos_embed.py`:
190
+
191
+ ```bash
192
+ python gen_pos_embed.py --img_H 518 --img_W 518
193
+ ```
194
+
195
+ We also provide precomputed positional embeddings for various commonly used image resolutions on [Hugging Face](https://huggingface.co/zbbhhh/VGGT-S).
196
+
197
+ 2. Modify the VGGT tracking head to use the precomputed embeddings.
198
+
199
+ Open:
200
+
201
+ ```text
202
+ third_party/vggt_main/vggt/heads/track_modules/base_track_predictor.py
203
+ ```
204
+
205
+ Add the following line at the end of the `__init__` method in the `BaseTrackerPredictor` class, around [base_track_predictor.py#L81](https://github.com/facebookresearch/vggt/blob/main/vggt/heads/track_modules/base_track_predictor.py#L81):
206
+
207
+ ```python
208
+ self.pos_embed = None
209
+ ```
210
+
211
+ This attribute is used to cache the loaded positional embeddings.
212
+
213
+ Then, replace the following line in [base_track_predictor.py#L149](https://github.com/facebookresearch/vggt/blob/main/vggt/heads/track_modules/base_track_predictor.py#L149):
214
+
215
+ ```python
216
+ pos_embed = get_2d_sincos_pos_embed(self.transformer_dim, grid_size=(HH, WW)).to(query_points.device)
217
+ ```
218
+
219
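+ with the following (the snippet calls `os.path.exists`, so make sure `import os` is present at the top of the file):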
220
+
221
+ ```python
222
+ pos_emb_pt = (
223
+ f"vggt_main/pos_embed/"
224
+ f"pos_embed_{self.transformer_dim}_{HH}_{WW}.pt"
225
+ )
226
+
227
+ assert os.path.exists(pos_emb_pt), (
228
+ f"[BUG] Positional embedding file not found: {pos_emb_pt}"
229
+ )
230
+
231
+ if self.pos_embed is None:
232
+ self.pos_embed = torch.load(pos_emb_pt).to(query_points.device)
233
+
234
+ pos_embed = self.pos_embed
235
+ ```
236
+
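The replacement snippet above stores a single tensor in `self.pos_embed`, so once a file has been loaded it is reused even if a later call arrives with a different `HH` or `WW`. If mixed resolutions are expected, one option is to key the cache by resolution. The following is a minimal sketch of that idea, not part of VGGT or of this repository: the `PosEmbedCache` class and its `loader` argument are hypothetical names introduced for illustration, and only the file-name pattern is taken from the snippet above.

```python
import os
from typing import Any, Callable, Dict, Tuple


class PosEmbedCache:
    """Hypothetical helper: keeps one loaded embedding per (dim, HH, WW)."""

    def __init__(self, root: str, loader: Callable[[str], Any]):
        # `root` is the directory holding the precomputed files.
        # `loader` is any callable that turns a file path into an embedding.
        self.root = root
        self.loader = loader
        self._cache: Dict[Tuple[int, int, int], Any] = {}

    def path_for(self, dim: int, hh: int, ww: int) -> str:
        # Same naming pattern as the README snippet: pos_embed_{dim}_{HH}_{WW}.pt
        return os.path.join(self.root, f"pos_embed_{dim}_{hh}_{ww}.pt")

    def get(self, dim: int, hh: int, ww: int) -> Any:
        key = (dim, hh, ww)
        if key not in self._cache:
            path = self.path_for(dim, hh, ww)
            # Touch the filesystem only on a cache miss.
            if not os.path.exists(path):
                raise FileNotFoundError(
                    f"Positional embedding file not found: {path}"
                )
            self._cache[key] = self.loader(path)
        return self._cache[key]
```

In the tracking head, `loader` could be something like `lambda p: torch.load(p, map_location=query_points.device)`. This pairing is an assumption and has not been tested against the VGGT code.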
237
  ---
238
 
239
  ### Dataset Preparation
 
473
  * [SAM2](https://github.com/facebookresearch/sam2)
474
 
475
  We thank the authors for releasing their code and models to the community.
476
+