	# CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

	<div align="center">

	[![Paper](https://img.shields.io/badge/Paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](http://arxiv.org/abs/) [![Page](https://img.shields.io/badge/Project--Page-blue?style=for-the-badge&logo=homepage&logoColor=white)](https://capvector.github.io/) [![Hugging Face Collection](https://img.shields.io/badge/Models-fcd022?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/haofuly/capvector_models_collection)

	</div>

	CapVector is a training recipe for vision-language-action (VLA) models. It extracts a transferable capability vector from the parameter difference between a model fine-tuned with an auxiliary-objective SFT method and one fine-tuned with standard SFT. This vector is merged into a pretrained VLA to form a stronger initialization; downstream adaptation then uses standard SFT with a lightweight orthogonal regularization loss to preserve the injected capability.
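The recipe above can be illustrated with a minimal Python sketch on toy weights. This is not the repository's implementation: the function names, the scaling factor `alpha`, and the flat-list parameter layout are hypothetical. The exact form of the orthogonal regularizer is not given here, so the squared dot product used below is only an assumed stand-in.

```python
# Illustrative sketch only. Parameters are flat lists of floats keyed by name.
# All function names and `alpha` are hypothetical, not the repository's API.

def extract_capability_vector(aux_sft, std_sft):
    """Per-parameter difference: auxiliary-objective SFT minus standard SFT."""
    return {
        name: [a - s for a, s in zip(aux_sft[name], std_sft[name])]
        for name in aux_sft
    }

def merge_capability_vector(pretrained, cap_vector, alpha=1.0):
    """Add the scaled capability vector to the pretrained weights.

    Assumes both dicts share the same parameter names and shapes.
    """
    return {
        name: [p + alpha * v for p, v in zip(pretrained[name], cap_vector[name])]
        for name in pretrained
    }

def orthogonal_penalty(current, init, cap_vector):
    """ASSUMED regularizer form: squared dot product between the downstream
    update (current - init) and the capability vector. It is zero when the
    update is orthogonal to the injected direction."""
    dot = 0.0
    for name in cap_vector:
        for c, i, v in zip(current[name], init[name], cap_vector[name]):
            dot += (c - i) * v
    return dot ** 2

# Toy weights for a single two-element parameter.
pretrained = {"w": [1.0, 2.0]}
std_sft = {"w": [1.5, 2.0]}
aux_sft = {"w": [1.5, 3.0]}

cap = extract_capability_vector(aux_sft, std_sft)
init = merge_capability_vector(pretrained, cap)

# An update along the first coordinate is orthogonal to `cap`, so no penalty;
# an update along the second coordinate overlaps with it and is penalized.
penalty_orthogonal = orthogonal_penalty({"w": [2.0, 3.0]}, init, cap)
penalty_aligned = orthogonal_penalty({"w": [1.0, 3.5]}, init, cap)
```

In a real training loop, a penalty of this kind would be added to the standard SFT loss with a small weight, so that downstream updates are discouraged from overwriting the merged direction.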


	## 🌟 Key Features
	- Efficient downstream adaptation: CapVector recovers much of the benefit of auxiliary-objective SFT methods while keeping the downstream overhead close to that of standard SFT.
	- Versatility: CapVector works with OpenVLA-based, OpenPi-based, and StarVLA-based backbones.
	- Generalization: CapVector is designed to transfer across tasks, environments, and robot embodiments.


	## 🚀 Get Started

	This repository provides two implementations:
	- [`capvector-oft/`](./capvector-oft): the OpenVLA-OFT-based implementation
	- [`capvector-pi05/`](./capvector-pi05): the OpenPI-based implementation

	Choose the subdirectory that matches your base model and training stack. Follow the subproject README for environment setup, data preparation, training, and inference.

	[`capvector-pi05/`](./capvector-pi05) provides the capability vector extraction and merging scripts.


	## 🌏 Contact
	For further discussion and collaboration, feel free to contact us by email or WeChat:

	| Author | Email | WeChat |
	|:---:|:---:|:---:|
	| Wenxuan Song | songwenxuan0115@gmail.com | swx0757 |


	## ❤️ Acknowledgments

	CapVector builds on and interfaces with several excellent open-source projects, including:

	- [OpenVLA-OFT](https://github.com/moojink/openvla-oft)
	- [OpenPI](https://github.com/Physical-Intelligence/openpi)


	## 🖊 Citation

	If you find this work useful, please cite:

	```bibtex
	@article{song2026capvector,
	title = {CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models},
	author = {Song, Wenxuan and Zhao, Han and Li, Fuhao and Zhou, Ziyang and Wang, Xi and Lyu, Jing and Ding, Pengxiang and Wang, Yan and Wang, Donglin and Li, Haoang},
	journal = {Preprint},
	year = {2026}
	}
	```