# CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

<div align="center">

[Paper](http://arxiv.org/abs/) [Project Page](https://capvector.github.io/) [Models](https://huggingface.co/haofuly/capvector_models_collection)

</div>

CapVector is a training recipe for vision-language-action (VLA) models. It extracts a transferable capability vector from the parameter difference between a model trained with an auxiliary-objective supervised fine-tuning (SFT) method and one trained with standard SFT. This vector is merged into a pretrained VLA to form a stronger initialization. Downstream adaptation then uses standard SFT with a lightweight orthogonal regularization loss that preserves the injected capability.
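
The recipe above can be sketched on toy weights. This is only an illustration, not the repository's implementation: the function names, the `scale` factor, the use of plain addition for merging, and the exact form of the orthogonal penalty (squared cosine between the downstream update and the capability vector) are all assumptions made for this example. Checkpoints are modeled as plain dictionaries of flat weight lists.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def extract_capability_vector(aux_sft: Params, std_sft: Params) -> Params:
    """Parameter difference: auxiliary-objective SFT weights minus standard SFT weights."""
    return {
        name: [a - s for a, s in zip(aux_sft[name], std_sft[name])]
        for name in aux_sft
    }


def merge_capability_vector(pretrained: Params, cap_vec: Params, scale: float = 1.0) -> Params:
    """Add the scaled capability vector to the pretrained weights.

    Plain addition and the `scale` factor are assumptions of this sketch.
    """
    return {
        name: [p + scale * v for p, v in zip(pretrained[name], cap_vec[name])]
        for name in pretrained
    }


def orthogonal_penalty(current: Params, merged_init: Params, cap_vec: Params) -> float:
    """Assumed regularizer: squared cosine between the downstream update
    (current - merged_init) and the capability vector.

    It is 0.0 when the update is orthogonal to the capability vector.
    """
    dot = upd_sq = vec_sq = 0.0
    for name, vec in cap_vec.items():
        for c, m, v in zip(current[name], merged_init[name], vec):
            u = c - m
            dot += u * v
            upd_sq += u * u
            vec_sq += v * v
    if upd_sq == 0.0 or vec_sq == 0.0:
        return 0.0  # no update yet, or an empty capability vector
    return (dot * dot) / (upd_sq * vec_sq)


# Toy usage with hand-picked numbers.
pretrained = {"w": [1.0, 2.0]}
std_sft = {"w": [1.5, 2.0]}
aux_sft = {"w": [2.0, 2.0]}

cap_vec = extract_capability_vector(aux_sft, std_sft)   # {"w": [0.5, 0.0]}
init = merge_capability_vector(pretrained, cap_vec)     # {"w": [1.5, 2.0]}
```

In this toy setup, a downstream update that moves only the second weight leaves the penalty at zero, while an update along the first weight (the direction of the capability vector) is penalized. In a real training loop such a term would be added to the standard SFT loss with a small weight.
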
## Key Features

- **Efficient downstream adaptation**: CapVector recovers much of the benefit of auxiliary-objective SFT methods while keeping the downstream overhead close to that of standard SFT.
- **Versatility**: CapVector works with OpenVLA-based, OpenPI-based, and StarVLA-based backbones.
- **Generalization**: CapVector is designed to transfer across tasks, environments, and robot embodiments.

## Get Started

This repository provides two implementation paths:

- [`capvector-oft/`](./capvector-oft): the OpenVLA-OFT-based implementation
- [`capvector-pi05/`](./capvector-pi05): the OpenPI-based implementation

Choose the subdirectory that matches your base model and training stack, then follow that subproject's README for environment setup, data preparation, training, and inference.

The capability vector extraction and merging scripts are provided in [`capvector-pi05/`](./capvector-pi05).

## Contact

For further discussion or collaboration, feel free to contact us by email or WeChat:

| Author | Email | WeChat |
|:---:|:---:|:---:|
| Wenxuan Song | songwenxuan0115@gmail.com | swx0757 |

## ❤️ Acknowledgments

CapVector builds on and interfaces with several excellent open-source projects, including:

- [OpenVLA-OFT](https://github.com/moojink/openvla-oft)
- [OpenPI](https://github.com/Physical-Intelligence/openpi)

## Citation

If you find this work useful, please cite:

```bibtex
@article{song2026capvector,
  title   = {CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models},
  author  = {Song, Wenxuan and Zhao, Han and Li, Fuhao and Zhou, Ziyang and Wang, Xi and Lyu, Jing and Ding, Pengxiang and Wang, Yan and Wang, Donglin and Li, Haoang},
  journal = {Preprint},
  year    = {2026}
}
```