
CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

Paper Page | Hugging Face Collection

CapVector is a training recipe for vision-language-action (VLA) models. It extracts a transferable capability vector from the parameter difference between a checkpoint fine-tuned with auxiliary-objective SFT and one fine-tuned with standard SFT. This vector is merged into a pretrained VLA to form a stronger initialization, and downstream adaptation uses standard SFT with a lightweight orthogonal regularization loss to preserve the injected capability.
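For illustration, here is a minimal sketch of the extract-and-merge idea, assuming the capability vector is the element-wise difference between the two SFT checkpoints and is added to the pretrained weights with a scaling coefficient. The function names, file paths, and the `alpha` coefficient are placeholders, not the repository's actual API:

```python
import torch

def extract_capability_vector(aux_sft_state, std_sft_state):
    """Capability vector = parameter difference between the auxiliary-objective
    SFT checkpoint and the standard SFT checkpoint (shared, shape-matching keys)."""
    return {
        k: aux_sft_state[k] - std_sft_state[k]
        for k in aux_sft_state
        if k in std_sft_state and aux_sft_state[k].shape == std_sft_state[k].shape
    }

def merge_capability_vector(pretrained_state, cap_vector, alpha=1.0):
    """Add the (scaled) capability vector to the pretrained VLA weights to form
    a stronger initialization for downstream adaptation."""
    merged = {k: v.clone() for k, v in pretrained_state.items()}
    for k, delta in cap_vector.items():
        if k in merged:
            merged[k] = merged[k] + alpha * delta
    return merged

# Usage (paths are placeholders):
# aux  = torch.load("aux_sft.pt", map_location="cpu")
# std  = torch.load("std_sft.pt", map_location="cpu")
# base = torch.load("pretrained_vla.pt", map_location="cpu")
# cap  = extract_capability_vector(aux, std)
# torch.save(merge_capability_vector(base, cap, alpha=1.0), "merged_init.pt")
```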

🌟 Key Features

  • Efficient downstream adaptation: CapVector recovers much of the benefit of auxiliary-objective SFT methods while keeping the downstream training overhead close to that of standard SFT.
  • Versatility: CapVector supports OpenVLA-based, OpenPi-based, and StarVLA-based backbones.
  • Generalization: CapVector is designed to transfer across tasks, environments, and robot embodiments.

πŸš€ Get Started

This repository provides two implementation paths, organized as subdirectories. Choose the subdirectory that matches your base model and training stack, then follow its README for environment setup, data preparation, training, and inference.

capvector-pi05/ provides the capability vector extraction and merging scripts.
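As a complement to the extraction and merging scripts, the sketch below shows one plausible way to realize the orthogonal regularization mentioned above during downstream SFT: penalize the cosine alignment between the fine-tuning update (current weights minus the merged initialization) and the capability vector, so the update tends to stay orthogonal to the injected capability direction. The function name, the `weight` coefficient, and this exact formulation are assumptions for illustration, not necessarily the loss used by CapVector:

```python
import torch

def orthogonal_regularization(model, init_state, cap_vector, weight=1e-3):
    """Penalize squared cosine similarity between the fine-tuning update
    (theta - theta_init) and the capability vector, per parameter tensor."""
    reg = 0.0
    for name, param in model.named_parameters():
        if name not in cap_vector:
            continue
        delta = (param - init_state[name].to(param.device)).flatten()
        cap = cap_vector[name].to(param.device).flatten()
        cos = torch.dot(delta, cap) / (delta.norm() * cap.norm() + 1e-8)
        reg = reg + cos.pow(2)
    return weight * reg

# During downstream SFT (sketch):
# total_loss = sft_loss + orthogonal_regularization(model, init_state, cap_vector)
```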

🌏 Contact

For further discussion and collaboration, please feel free to contact us via Email and WeChat:

| Author | Email | WeChat |
|--------|-------|--------|
| Wenxuan Song | songwenxuan0115@gmail.com | swx0757 |

❀️ Acknowledgments

CapVector builds on and interfaces with several excellent open-source projects.

πŸ–Š Citation

If you find this work useful, please cite:

@article{song2026capvector,
  title   = {CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models},
  author  = {Song, Wenxuan and Zhao, Han and Li, Fuhao and Zhou, Ziyang and Wang, Xi and Lyu, Jing and Ding, Pengxiang and Wang, Yan and Wang, Donglin and Li, Haoang},
  journal = {Preprint},
  year    = {2026}
}