Improve model card and add robotics metadata
Hi! I'm Niels, part of the community science team at Hugging Face. I've opened this PR to improve the model card for FrameSkip.
The improvements include:
- Adding the `pipeline_tag: robotics` to the metadata to make the model more discoverable.
- Including links to the official paper and GitHub repository.
- Adding a summary of the framework and its key highlights.
- Providing usage instructions for loading the checkpoints via the starVLA stack.
- Adding a BibTeX citation for researchers.

README.md CHANGED

---
license: mit
pipeline_tag: robotics
---

# FrameSkip: Learning from Fewer but More Informative Frames in VLA Training

[**Paper**](https://huggingface.co/papers/2605.13757) | [**Code**](https://github.com/ZGC-EmbodyAI/FrameSkip) | [**Collection**](https://huggingface.co/collections/VLyb/frameskip)

**FrameSkip** is a training-time frame selection framework for Vision-Language-Action (VLA) models. Instead of treating every frame in a dense robot demonstration trajectory as equally useful supervision, FrameSkip scores trajectory frames with lightweight cues and trains primarily from fewer but more informative frames.

FrameSkip is designed as a data-layer intervention: it changes which frames are exposed during training while leaving the VLA architecture, action head, training objective, and inference procedure unchanged.
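
To make the data-layer idea concrete, here is a minimal sketch of a dataset wrapper that exposes only a trajectory's highest-scoring frames. The class name, the `keep_ratio` default, and the assumption that scores are precomputed are illustrative, not the released implementation:

```python
import numpy as np
from torch.utils.data import Dataset


class FrameSkipDataset(Dataset):
    """Expose only the top-`keep_ratio` most informative frames (sketch)."""

    def __init__(self, base: Dataset, scores: np.ndarray, keep_ratio: float = 0.2):
        # `scores[i]` is a precomputed importance score for frame i of `base`.
        assert len(scores) == len(base)
        k = max(1, int(keep_ratio * len(base)))
        # Keep the k highest-scoring frames, restored to temporal order.
        self.indices = np.sort(np.argsort(scores)[-k:])
        self.base = base

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # The samples themselves (image, language, action) are unchanged;
        # only which frames training sees changes.
        return self.base[int(self.indices[i])]
```

Because the selection happens purely at indexing time, the policy, loss, and inference loop that consume the data need no modification.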

## Highlights

- **Frame-level supervision allocation:** Addresses the temporal supervision imbalance where low-change segments often dominate training trajectories.
- **Architecture-agnostic:** Operates entirely in the dataloader, requiring no changes to the model architecture or inference.
- **Importance-guided retention:** Scores frames using action variation, visual-action coherence, task-progress priors, and gripper-transition preservation (see the scoring sketch after this list).
- **Improved efficiency:** Achieves significantly higher success rates across benchmarks (RoboCasa-GR1, SimplerEnv, and LIBERO) while using as little as 20% of the unique frames.
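
As a rough illustration of how the four cues above could combine, the toy scorer below mixes three normalized cues and force-keeps gripper transitions. The weighting, normalization, and input shapes are assumptions for exposition, not the paper's exact formulas:

```python
import numpy as np


def frame_importance(actions: np.ndarray,         # (T, D) actions
                     gripper: np.ndarray,         # (T,) gripper state
                     progress_prior: np.ndarray,  # (T,) task-progress prior
                     coherence: np.ndarray,       # (T,) visual-action coherence
                     w=(1.0, 1.0, 1.0)) -> np.ndarray:
    """Toy per-frame importance score (illustrative only)."""
    T = len(actions)

    def unit(x):
        # Rescale a cue to [0, 1] so the cues can be mixed.
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    # Action variation: how much the action changes between frames.
    action_var = np.zeros(T)
    action_var[1:] = np.linalg.norm(np.diff(actions, axis=0), axis=1)

    score = (w[0] * unit(action_var)
             + w[1] * unit(coherence)
             + w[2] * unit(progress_prior))

    # Gripper-transition preservation: frames where the gripper state
    # flips always outrank all other frames.
    transitions = np.flatnonzero(np.diff(gripper) != 0) + 1
    score[transitions] = score.max() + 1.0
    return score
```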

## Usage

FrameSkip is built on the [starVLA](https://github.com/starVLA/starVLA) training and evaluation stack. The released checkpoints follow the standard starVLA checkpoint format and can be loaded in the same way as other starVLA VLA policies.

For simulation evaluation, follow the model loading and evaluation workflow of the QwenGR00T architecture in starVLA, replacing the checkpoint path with the downloaded FrameSkip checkpoint.
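
As a minimal sketch, the snippet below fetches a checkpoint with `huggingface_hub`; the `repo_id` is a placeholder, so substitute the repository of the specific FrameSkip checkpoint you want:

```python
from huggingface_hub import snapshot_download

# Placeholder repo_id: replace with the actual FrameSkip checkpoint repo.
ckpt_dir = snapshot_download(repo_id="VLyb/FrameSkip")

# Point the starVLA QwenGR00T evaluation workflow at this directory in
# place of its default checkpoint path (see the starVLA repo for the
# concrete entry points).
print(f"Checkpoint downloaded to: {ckpt_dir}")
```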

## Citation

If you find FrameSkip useful, please cite the paper:

```bibtex
@article{yu2026frameskip,
  title={FrameSkip: Learning from Fewer but More Informative Frames in VLA Training},
  author={Bin Yu and Shijie Lian and Xiaopeng Lin and Zhaolong Shen and Yuliang Wei and Changti Wu and Hang Yuan and Haishan Liu and Bailing Wang and Cong Huang and Kai Chen},
  journal={arXiv preprint arXiv:2605.13757},
  year={2026}
}
```