This model accompanies our paper, "Developing Vision-Language-Action Model from Egocentric Videos."

🌐 Project page: https://biscue5.github.io/egovla-project-page/
📄 Paper: Developing Vision-Language-Action Model from Egocentric Videos (arXiv:2509.21986)
🧰 Format: LeRobot v2.0
🪪 License: Apache-2.0

Citation

If you use this model, please cite:

@article{yoshida2025developing,
  title   = {Developing Vision-Language-Action Model from Egocentric Videos},
  author  = {Yoshida, Tomoya and Kurita, Shuhei and Nishimura, Taichi and Mori, Shinsuke},
  journal = {arXiv preprint arXiv:2509.21986},
  year    = {2025}
}


Model details

Format: Safetensors
Model size: 4B params
Tensor type: F32, BF16
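The metadata lists about 4B parameters stored as F32 and BF16 tensors. As a rough guide to how much memory the weights alone need, the sketch below multiplies the parameter count by bytes per element. This is an assumption-laden estimate: "4B" is a rounded figure, the card does not say how the checkpoint is split between F32 and BF16, and activations and other runtime buffers are ignored. The helper `weight_memory_gb` is hypothetical and not part of any library.

```python
# Rough weight-memory estimate (hypothetical helper, not from any library).
# Assumption: "4B params" is rounded, so 4e9 is only approximate.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("F32", "BF16"):
    print(f"{dtype}: ~{weight_memory_gb(4e9, dtype):.0f} GB")
```

If every tensor were F32, the weights would take roughly 16 GB; if every tensor were BF16, roughly 8 GB. A mixed checkpoint falls somewhere in between.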


Paper for Biscue5/pi0-egoscaler-v2: Developing Vision-Language-Action Model from Egocentric Videos (arXiv:2509.21986, published Sep 26, 2025)