nielsr (HF Staff) committed · Commit 2d2e0c8 · verified · 1 Parent(s): b7d1deb

Improve model card and add robotics metadata

Hi! I'm Niels, part of the community science team at Hugging Face. I've opened this PR to improve the model card for FrameSkip.

The improvements include:
- Adding the `pipeline_tag: robotics` to the metadata to make the model more discoverable.
- Including links to the official paper and GitHub repository.
- Adding a summary of the framework and its key highlights.
- Providing usage instructions for loading the checkpoints via the starVLA stack.
- Adding a BibTeX citation for researchers.

Files changed (1)
  1. README.md +34 -1
README.md CHANGED
@@ -1,5 +1,38 @@
  ---
  license: mit
+ pipeline_tag: robotics
  ---
 
- paper link: [arXiv:2605.13757](arxiv.org/abs/2605.13757)
+ # FrameSkip: Learning from Fewer but More Informative Frames in VLA Training
+
+ [**Paper**](https://huggingface.co/papers/2605.13757) | [**Code**](https://github.com/ZGC-EmbodyAI/FrameSkip) | [**Collection**](https://huggingface.co/collections/VLyb/frameskip)
+
+ **FrameSkip** is a training-time frame selection framework for Vision-Language-Action (VLA) models. Instead of treating every frame in a dense robot demonstration trajectory as equally useful supervision, FrameSkip scores trajectory frames with lightweight cues and trains primarily from fewer but more informative frames.
+
+ FrameSkip is designed as a data-layer intervention: it changes which frames are exposed during training while leaving the VLA architecture, action head, training objective, and inference procedure unchanged.
+
+ ## Highlights
+
+ - **Frame-level supervision allocation:** Addresses the temporal supervision imbalance where low-change segments often dominate training trajectories.
+ - **Architecture-agnostic:** Operates entirely in the dataloader, requiring no changes to the model architecture or inference.
+ - **Importance-guided retention:** Scores frames using action variation, visual-action coherence, task-progress priors, and gripper-transition preservation.
+ - **Improved Efficiency:** Achieves significantly higher success rates across benchmarks (RoboCasa-GR1, SimplerEnv, and LIBERO) while using as little as 20% of unique frames.
+
+ ## Usage
+
+ FrameSkip is built on the [starVLA](https://github.com/starVLA/starVLA) training and evaluation stack. The released checkpoints follow the standard starVLA checkpoint format and can be loaded in the same way as starVLA VLA policies.
+
+ For simulation evaluation, please refer to the model loading and evaluation workflow of the QwenGR00T architecture in starVLA, and replace the checkpoint path with the downloaded FrameSkip checkpoint.
+
+ ## Citation
+
+ If you find FrameSkip useful, please cite the paper:
+
+ ```bibtex
+ @article{yu2024frameskip,
+ title={FrameSkip: Learning from Fewer but More Informative Frames in VLA Training},
+ author={Bin Yu and Shijie Lian and Xiaopeng Lin and Zhaolong Shen and Yuliang Wei and Changti Wu and Hang Yuan and Haishan Liu and Bailing Wang and Cong Huang and Kai Chen},
+ journal={arXiv preprint arXiv:2605.13757},
+ year={2024}
+ }
+ ```
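
For readers of this diff, the "importance-guided retention" highlight in the new README can be illustrated with a small Python sketch. This is not FrameSkip's implementation: the function names (`action_variation_scores`, `gripper_transitions`, `select_frames`), the L2 action-difference score, the binary gripper signal, and the `keep_ratio` budget are all assumptions made for illustration. Only two of the four cues named in the README (action variation and gripper-transition preservation) appear here, in simplified form; visual-action coherence and task-progress priors are omitted.

```python
import math
from typing import List, Sequence


def action_variation_scores(actions: Sequence[Sequence[float]]) -> List[float]:
    """Score each frame by the L2 change in action relative to the previous frame.

    Hypothetical cue: the first frame has no predecessor, so it scores 0.0.
    """
    if not actions:
        return []
    scores = [0.0]
    for prev, curr in zip(actions, actions[1:]):
        scores.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, curr))))
    return scores


def gripper_transitions(gripper: Sequence[int]) -> List[int]:
    """Return indices where the gripper state differs from the previous frame."""
    return [i for i in range(1, len(gripper)) if gripper[i] != gripper[i - 1]]


def select_frames(
    actions: Sequence[Sequence[float]],
    gripper: Sequence[int],
    keep_ratio: float = 0.2,
) -> List[int]:
    """Keep roughly keep_ratio of the frames, always retaining gripper transitions.

    Assumption: gripper-transition frames count toward the budget, and the rest
    of the budget goes to the frames with the largest action variation.
    """
    n = len(actions)
    if n == 0:
        return []
    budget = max(1, int(round(keep_ratio * n)))
    scores = action_variation_scores(actions)
    forced = set(gripper_transitions(gripper))
    # Rank the remaining frames by descending score; ties go to earlier frames.
    ranked = sorted(
        (i for i in range(n) if i not in forced),
        key=lambda i: (-scores[i], i),
    )
    extra = ranked[: max(0, budget - len(forced))]
    return sorted(forced | set(extra))


# Toy trajectory: the arm moves once (frame 3) and the gripper closes once (frame 6).
toy_actions = [[0.0], [0.0], [0.0], [0.5], [0.5], [0.5], [0.5], [0.5], [0.5], [0.5]]
toy_gripper = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
print(select_frames(toy_actions, toy_gripper, keep_ratio=0.2))  # [3, 6]
```

On the toy trajectory, the 20% budget keeps two of ten frames: the gripper transition and the single frame where the action changes. Consistent with the README's data-layer framing, a selection like this would sit in the dataloader and leave the model untouched.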