---
license: mit
datasets:
- Sylvest/LIBERO-plus
---
## 🌌 CorridorVLA
This repository provides the official implementation of **CorridorVLA**.
> **Direct spatial constraints for Vision-Language-Action models via sparse physical anchors**
[![arXiv](https://img.shields.io/badge/arXiv-2604.21241-b31b1b.svg)](https://arxiv.org/abs/2604.21241)
[![Code](https://img.shields.io/badge/Code-GitHub-black)](https://github.com/lidc54/corridorVLA)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](#)
---
## 🔍 TL;DR
* Explore an alternative to common visual-style spatial guidance (e.g., predicting future images/videos) using **text-style physical anchors**
* Predict sparse **end-effector Δ-positions**
* Use them to impose an **explicit corridor constraint** on action generation
* Achieve an **83.21% average success rate on LIBERO-Plus**
---
## 🧠 Motivation
<p align="center">
<img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/motive.png" width="40%">
</p>
### Existing VLA paradigm
* Spatial guidance is encoded as visual-style tokens or latent features
* Action generation is influenced indirectly through the backbone features
### CorridorVLA
* Predict **compact physical quantities** (spatial anchors)
* Apply them as **direct constraints in the loss**
* No need for heavy visual intermediate representations
---
## 🏗️ Method Overview
<p align="center">
<img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/framework.png" width="50%">
</p>
### Key components
**(1) Sparse Anchor Prediction**
* Predict $K$ future **Δ-position anchors**
* Represent trajectory structure in a compact form
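
As a rough illustration of what sparse Δ-position anchors could look like, the sketch below picks $K$ evenly spaced future waypoints from an end-effector trajectory and expresses each as an offset from the current position. The function name `sparse_delta_anchors`, the even spacing, and the choice to measure offsets from the current position (rather than step to step) are all assumptions made for this example, not details confirmed by the repository.

```python
import numpy as np

def sparse_delta_anchors(positions: np.ndarray, k: int) -> np.ndarray:
    """Return k delta-position anchors from a (T+1, 3) end-effector trajectory.

    positions[0] is treated as the current position. Each anchor is the
    offset from positions[0] to one of k evenly spaced future waypoints.
    (Hypothetical helper: spacing and reference frame are assumptions.)
    """
    horizon = len(positions) - 1  # number of future steps available
    if not 1 <= k <= horizon:
        raise ValueError("k must be between 1 and the number of future steps")
    # Evenly spaced future indices; the last anchor sits on the final step.
    idx = np.linspace(horizon / k, horizon, k).round().astype(int)
    return positions[idx] - positions[0]

# Toy trajectory: 9 time steps moving along x at 1 cm per step.
traj = np.zeros((9, 3))
traj[:, 0] = np.arange(9) * 0.01
anchors = sparse_delta_anchors(traj, k=4)  # shape (4, 3)
```

With this toy trajectory the four anchors fall on steps 2, 4, 6, and 8, so a 9-step path is summarized by 12 numbers.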
**(2) Action Augmentation**
* Concatenate state-related physical quantities (e.g., Δ-positions) to the action vector
* Enable joint prediction of state and action, providing implicit alignment between state space and action space
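
A minimal sketch of the concatenation step, assuming a chunk of `T` actions and the `T + 1` end-effector positions that bracket them. The helper name `augment_actions`, the 7-dimensional action, and the use of per-step position differences are illustrative assumptions; the actual quantities and layout used in the repository may differ.

```python
import numpy as np

def augment_actions(actions: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Append per-step end-effector delta-positions to each action vector.

    actions:   (T, A) action chunk
    positions: (T + 1, 3) end-effector positions before and after each action
    returns:   (T, A + 3) augmented targets for the action head
    (Hypothetical helper: the augmentation layout is an assumption.)
    """
    delta_pos = np.diff(positions, axis=0)  # (T, 3) displacement per step
    if len(delta_pos) != len(actions):
        raise ValueError("positions must contain exactly one more row than actions")
    return np.concatenate([actions, delta_pos], axis=-1)

# Toy example: 8 actions of dimension 7, end effector moving 1 cm per step in x.
actions = np.zeros((8, 7))
positions = np.zeros((9, 3))
positions[:, 0] = np.arange(9) * 0.01
augmented = augment_actions(actions, positions)  # shape (8, 10)
```

Training the head on the augmented vector means one output carries both the command and the displacement it is expected to produce, which is where the implicit state-action alignment would come from.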
**(3) Corridor Loss**
* Defines a tolerance region over the predicted trajectory
* Penalizes deviations outside the region while allowing smooth convergence within it
👉 Behaves like a **structured smooth-L1 in trajectory space**
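
One possible reading of the smooth-L1 analogy is a Huber-style penalty on the Euclidean distance between the positions implied by the generated actions and the predicted anchors, with the corridor radius acting as the transition point. The sketch below follows that reading. The function `corridor_loss`, the `radius` parameter, and the exact functional form are assumptions for illustration and should not be taken as the paper's formulation. NumPy is used for readability; a training version would use the same arithmetic in an autodiff framework.

```python
import numpy as np

def corridor_loss(pred_pos: np.ndarray, anchor_pos: np.ndarray, radius: float) -> float:
    """Huber-style penalty on distance to the anchors (illustrative only).

    pred_pos:   (K, 3) positions implied by the generated actions
    anchor_pos: (K, 3) predicted anchor positions
    radius:     corridor half-width (> 0), in the same units as the positions
    """
    dist = np.linalg.norm(pred_pos - anchor_pos, axis=-1)  # (K,)
    # Inside the corridor: quadratic, so the gradient shrinks smoothly to zero.
    inside = 0.5 * dist**2 / radius
    # Outside the corridor: linear, so large deviations are penalized steadily.
    outside = dist - 0.5 * radius
    # Both branches equal 0.5 * radius at dist == radius, so the loss is continuous.
    return float(np.where(dist <= radius, inside, outside).mean())

anchors = np.zeros((2, 3))
pred = np.array([[0.00, 0.0, 0.0],   # on the anchor
                 [0.04, 0.0, 0.0]])  # 4 cm away, outside a 2 cm corridor
loss = corridor_loss(pred, anchors, radius=0.02)
```

In this toy case the first point contributes nothing and the second contributes `0.04 - 0.01 = 0.03`, giving a mean of `0.015`.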
---
## πŸ“Š Results
### LIBERO-Plus (GR00T-based)
| Variant | Description | Avg. success rate (%) |
|--------|----------------------------------|------|
| base | | 75.23 |
| c1 | query=3 | 77.25 |
| c2 | + extra data | 77.25 |
| c3 | + Δpos anchors | 79.21 |
| **c4** | + corridor loss (**CorridorVLA**) | **83.21** |
📈 Improvement:
* +7.98 percentage points over the base variant (from 75.23 to 83.21)
* Largest single-step gain (+4.00 points, from c3 to c4) comes from the **explicit spatial constraint**
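
The per-step gains behind these bullets can be recomputed directly from the table. The snippet below only uses the averages reported above; the variable names are arbitrary.

```python
# Average success rates (%) copied from the LIBERO-Plus table above.
avg = {"base": 75.23, "c1": 77.25, "c2": 77.25, "c3": 79.21, "c4": 83.21}

names = list(avg)
# Gain of each variant over the previous one, in percentage points.
step_gain = {
    f"{prev}->{cur}": round(avg[cur] - avg[prev], 2)
    for prev, cur in zip(names, names[1:])
}
# Overall gain of the full model (c4) over the base variant.
total_gain = round(avg["c4"] - avg["base"], 2)
# The step with the largest gain.
largest_step = max(step_gain, key=step_gain.get)
```

The differences are percentage points (absolute), not relative improvements.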
---
## ⚙️ Implementation
* Built on **[StarVLA](https://github.com/starVLA/starVLA/commit/e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc)**
* Minimal changes:
  * a few prediction slots
  * additional loss terms
* No heavy architecture redesign
---
## 📌 Key Insights
* Spatial guidance can be:
  * **explicit (loss-level)** instead of implicit (feature-level)
* Physical quantities are:
  * more **action-aligned**
  * more **interpretable**
* Simple constraints can:
  * significantly improve **stability**
  * reduce **unstructured exploration**
---
## πŸ“– Citation
If you find this work useful, please cite:
```bibtex
@article{corridorvla2025,
title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
year={2026},
eprint={2604.21241},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2604.21241}
}
```