| --- |
| license: mit |
| datasets: |
| - Sylvest/LIBERO-plus |
| --- |
| |
| ## CorridorVLA
|
|
| This repository provides the official implementation of **CorridorVLA**. |
|
|
| > **Direct spatial constraints for Vision-Language-Action models via sparse physical anchors** |
|
|
| [arXiv:2604.21241](https://arxiv.org/abs/2604.21241)
| [GitHub: lidc54/corridorVLA](https://github.com/lidc54/corridorVLA)
|
|
| --- |
|
|
| ## TL;DR
|
|
| * Explores **text-style physical anchors** as an alternative to the common visual-style spatial guidance (e.g., predicting future images or videos)
| * Predicts sparse **end-effector Δ-positions**
| * Uses them to impose an **explicit corridor constraint** on action generation
| * Achieves an **83.21% success rate on LIBERO-Plus**
|
|
| --- |
|
|
| ## Motivation
|
|
|
|
| <p align="center"> |
| <img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/motive.png" width="40%"> |
| </p> |
|
|
| ### Existing VLA paradigm |
|
|
| * Spatial guidance is encoded as visual-style tokens or latent features |
| * Action generation is influenced indirectly through the backbone features |
|
|
| ### CorridorVLA |
|
|
| * Predict **compact physical quantities** (spatial anchors) |
| * Apply them as **direct constraints in the loss** |
| * No need for heavy visual intermediate representations |
|
|
| --- |
|
|
| ## Method Overview
|
|
| <p align="center"> |
| <img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/framework.png" width="50%"> |
| </p> |
|
|
| ### Key components |
|
|
| **(1) Sparse Anchor Prediction** |
|
|
| * Predict $K$ future **Δ-position anchors**
| * Represent trajectory structure in a compact form |
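As a rough illustration of what sparse anchors could look like, the sketch below picks $K$ evenly spaced future steps from a dense end-effector trajectory and expresses each one as a displacement from the current position. The function name `sparse_delta_anchors`, the even spacing, and measuring each Δ relative to the current position are assumptions made for this example, not details taken from the CorridorVLA code.

```python
def sparse_delta_anchors(trajectory, k):
    """Return k sparse Δ-position anchors from a dense trajectory.

    trajectory: list of (x, y, z) end-effector positions; index 0 is the
                current position, later indices are future steps.
    k:          number of anchors (assumed 1 <= k <= number of future steps).
    """
    horizon = len(trajectory) - 1
    if not 1 <= k <= horizon:
        raise ValueError("k must be between 1 and the number of future steps")

    origin = trajectory[0]
    anchors = []
    for i in range(1, k + 1):
        # Evenly spaced future index; the last anchor is the final step.
        step = round(i * horizon / k)
        point = trajectory[step]
        # Δ-position: displacement from the current position (assumption).
        anchors.append(tuple(c - o for c, o in zip(point, origin)))
    return anchors


# Hypothetical 8-step straight-line motion along x.
dense = [(float(t), 0.0, 0.0) for t in range(9)]
anchors = sparse_delta_anchors(dense, k=4)
```

With $K$ much smaller than the action horizon, the anchors summarize where the trajectory is heading without requiring the model to predict every intermediate pose.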
|
|
| **(2) Action Augmentation** |
|
|
| * Concatenate state-related physical quantities (e.g., Δ-positions) to the action vector
| * Enable joint prediction of state and action, providing implicit alignment between state space and action space |
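The concatenation step can be pictured with a minimal sketch. It assumes a 7-D action per step and a 3-D Δ-position per step; both dimensions and the helper names `augment_actions` and `split_augmented` are hypothetical and chosen only for illustration.

```python
def augment_actions(actions, delta_positions):
    """Append a per-step Δ-position to each action vector."""
    if len(actions) != len(delta_positions):
        raise ValueError("actions and delta_positions must have equal length")
    return [list(a) + list(d) for a, d in zip(actions, delta_positions)]


def split_augmented(augmented, action_dim):
    """Recover the (action, Δ-position) pair from each augmented vector."""
    return [(vec[:action_dim], vec[action_dim:]) for vec in augmented]


# Two time steps of a hypothetical 7-D action and 3-D Δ-position.
actions = [[0.1] * 7, [0.2] * 7]
deltas = [(0.01, 0.0, 0.0), (0.02, 0.0, 0.0)]

augmented = augment_actions(actions, deltas)      # each vector is now 10-D
recovered = split_augmented(augmented, action_dim=7)
```

Because the action head regresses the augmented vector as a whole, the state-related part and the action part are trained by the same prediction target, which is the implicit alignment described above.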
|
|
| **(3) Corridor Loss** |
|
|
| * Defines a tolerance region over the predicted trajectory |
| * Penalizes deviations outside the region while allowing smooth convergence within it |
|
|
| In effect, the loss behaves like a **structured smooth-L1 in trajectory space**.
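One way to read the smooth-L1 analogy is sketched below: the penalty is quadratic while a predicted position stays within a tolerance radius of its anchor and grows linearly once it leaves. This is an assumed form for illustration only. The name `corridor_loss`, the `half_width` parameter, the Euclidean distance, and the exact piecewise shape are not taken from the paper or the repository.

```python
import math


def corridor_loss(pred_positions, anchor_positions, half_width=0.05):
    """Smooth-L1-style penalty on distance to the corridor centre.

    pred_positions:   positions implied by the generated actions.
    anchor_positions: corridor centre at the same time steps.
    half_width:       tolerance radius of the corridor (hypothetical value).
    """
    if len(pred_positions) != len(anchor_positions):
        raise ValueError("inputs must have equal length")

    total = 0.0
    for pred, anchor in zip(pred_positions, anchor_positions):
        d = math.dist(pred, anchor)
        if d <= half_width:
            # Inside the corridor: small quadratic term, smooth convergence.
            total += 0.5 * d * d / half_width
        else:
            # Outside the corridor: linear penalty, continuous at the border.
            total += d - 0.5 * half_width
    return total / len(pred_positions)


inside = corridor_loss([(0.0, 0.0, 0.0)], [(0.03, 0.0, 0.0)])
outside = corridor_loss([(0.0, 0.0, 0.0)], [(0.2, 0.0, 0.0)])
```

The two branches meet at `d == half_width`, so the penalty has no jump at the corridor boundary; only its growth rate changes.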
|
|
| --- |
|
|
| ## Results
|
|
| ### LIBERO-Plus (GR00T-based) |
|
|
| | Variant | Description | Avg. success rate (%) |
| |---------|-------------|-----------------------|
| | base | | 75.23 |
| | c1 | query=3 | 77.25 |
| | c2 | + extra data | 77.25 |
| | c3 | + Δpos anchors | 79.21 |
| | **c4** | + corridor loss (**CorridorVLA**) | **83.21** |
|
|
| Improvement:
|
|
| * +7.98 percentage points over the `base` variant (75.23 → 83.21)
| * Largest single gain (+4.00 points, c3 → c4) comes from the **explicit spatial constraint**
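The gains can be re-derived from the table with a few lines of arithmetic. The numbers below are copied from the table; the variable names are only for this example.

```python
# Average success rates (%) from the LIBERO-Plus table.
avg = {"base": 75.23, "c1": 77.25, "c2": 77.25, "c3": 79.21, "c4": 83.21}

names = list(avg)
# Gain of each variant over the one before it, in percentage points.
step_gain = {
    cur: round(avg[cur] - avg[prev], 2)
    for prev, cur in zip(names, names[1:])
}
total_gain = round(avg["c4"] - avg["base"], 2)
largest_step = max(step_gain, key=step_gain.get)

print(step_gain)      # {'c1': 2.02, 'c2': 0.0, 'c3': 1.96, 'c4': 4.0}
print(total_gain)     # 7.98
print(largest_step)   # c4
```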
|
|
| --- |
|
|
| ## Implementation
|
|
| * Built on **[StarVLA](https://github.com/starVLA/starVLA/commit/e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc)** (commit `e1e6457`)
| * Minimal changes:
|   * a few extra prediction slots
|   * additional loss terms
| * No heavy architecture redesign
|
|
|
|
| --- |
|
|
| ## Key Insights
|
|
| * Spatial guidance can be **explicit (loss-level)** instead of implicit (feature-level)
| * Physical quantities are:
|   * more **action-aligned**
|   * more **interpretable**
| * Simple constraints can:
|   * significantly improve **stability**
|   * reduce **unstructured exploration**
|
|
|
|
| --- |
|
|
|
|
|
|
| ## Citation
|
|
| If you find this work useful, please cite: |
|
|
| ```bibtex |
| @article{corridorvla2025, |
| title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors}, |
| author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li}, |
| year={2026}, |
| eprint={2604.21241}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.RO}, |
| url={https://arxiv.org/abs/2604.21241} |
| }
| ```