---
license: mit
datasets:
- Sylvest/LIBERO-plus
---
## CorridorVLA
This repository provides the official implementation of **CorridorVLA**.
> **Direct spatial constraints for Vision-Language-Action models via sparse physical anchors**
[arXiv](https://arxiv.org/abs/2604.21241)
[GitHub](https://github.com/lidc54/corridorVLA)
---
## TL;DR
* Explore an alternative to common visual-style spatial guidance (e.g., predicting future images/videos) using **text-style physical anchors**
* Predict sparse **end-effector Δ-positions**
* Use them to impose an **explicit corridor constraint** on action generation
* Achieves **83.21% success rate on LIBERO-Plus**
---
## Motivation
<p align="center">
<img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/motive.png" width="40%">
</p>
### Existing VLA paradigm
* Spatial guidance is encoded as visual-style tokens or latent features
* Action generation is influenced indirectly through the backbone features
### CorridorVLA
* Predict **compact physical quantities** (spatial anchors)
* Apply them as **direct constraints in the loss**
* No need for heavy visual intermediate representations
---
## Method Overview
<p align="center">
<img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/framework.png" width="50%">
</p>
### Key components
**(1) Sparse Anchor Prediction**
* Predict $K$ future **Δ-position anchors**
* Represent trajectory structure in a compact form
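One way to picture the anchors is as displacements of the end-effector from its current position, sampled at a few future steps. The sketch below is only an illustration under that assumption: the function name `delta_position_anchors`, the even spacing of the $K$ anchors, and the `(x, y, z)` tuple layout are hypothetical and not taken from the CorridorVLA code.

```python
def delta_position_anchors(positions, k):
    """Return k Δ-position anchors from an end-effector trajectory.

    positions: list of (x, y, z) tuples; positions[0] is the current position
               and the remaining entries are future positions.
    k:         number of sparse anchors to extract.

    Assumption: anchors are displacements from positions[0], sampled at
    k evenly spaced future steps, with the last anchor on the final step.
    """
    horizon = len(positions) - 1
    if k < 1 or horizon < k:
        raise ValueError("need 1 <= k <= number of future steps")

    origin = positions[0]
    anchors = []
    for i in range(1, k + 1):
        step = (i * horizon) // k  # evenly spaced index into the future steps
        anchors.append(tuple(p - o for p, o in zip(positions[step], origin)))
    return anchors


trajectory = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (2.0, 0.0, 1.0),
    (3.0, 0.0, 1.0),
    (4.0, 2.0, 1.0),
]
anchors = delta_position_anchors(trajectory, k=2)
```

With a horizon of four future steps and `k=2`, the anchors are taken at steps 2 and 4, so a long trajectory is summarized by two 3-D displacements.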
**(2) Action Augmentation**
* Concatenate state-related physical quantities (e.g., Δ-positions) to the action vector
* Enable joint prediction of state and action, providing implicit alignment between state space and action space
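The augmentation step can be sketched as a plain concatenation per timestep. This is a minimal illustration, assuming a 7-dimensional action and a 3-dimensional Δ-position; the helper names `augment_actions` and `split_augmented` and both dimensions are hypothetical, not the repository's API.

```python
def augment_actions(actions, delta_positions):
    """Append the Δ-position of each timestep to that timestep's action vector."""
    if len(actions) != len(delta_positions):
        raise ValueError("actions and delta_positions must have the same length")
    return [list(a) + list(d) for a, d in zip(actions, delta_positions)]


def split_augmented(vector, action_dim):
    """Split one augmented vector back into (action, state quantities)."""
    return vector[:action_dim], vector[action_dim:]


# Two timesteps of a hypothetical 7-D action and a 3-D Δ-position.
actions = [[0.1] * 7, [0.2] * 7]
delta_positions = [(0.01, 0.0, 0.02), (0.03, 0.0, 0.01)]

augmented = augment_actions(actions, delta_positions)
action, state = split_augmented(augmented[0], action_dim=7)
```

Because the action head predicts the whole augmented vector, the state quantities and the action share one output, which is where the implicit alignment comes from.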
**(3) Corridor Loss**
* Defines a tolerance region over the predicted trajectory
* Penalizes deviations outside the region while allowing smooth convergence within it
Behaves like a **structured smooth-L1 in trajectory space**
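The smooth-L1 analogy can be made concrete with a small sketch. It is one possible reading of the description, not the paper's exact formulation: the penalty is quadratic while a point stays within a corridor of radius `half_width` around the corridor center, and linear once it leaves. The function name `corridor_loss`, the per-step corridor centers, and the single scalar `half_width` are assumptions made for illustration.

```python
import math


def corridor_loss(predicted, centers, half_width):
    """Mean Huber-style penalty on distance to the corridor center.

    predicted:  list of (x, y, z) points implied by the generated actions.
    centers:    list of (x, y, z) corridor centers, one per predicted point
                (assumed here to be derived from the predicted anchors).
    half_width: corridor radius; also the quadratic-to-linear switch point.
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    if not predicted or len(predicted) != len(centers):
        raise ValueError("predicted and centers must be non-empty and equal length")

    total = 0.0
    for point, center in zip(predicted, centers):
        distance = math.dist(point, center)
        if distance <= half_width:
            # Inside the corridor: gentle quadratic pull toward the center.
            total += 0.5 * distance * distance / half_width
        else:
            # Outside the corridor: linear penalty, continuous at the boundary.
            total += distance - 0.5 * half_width
    return total / len(predicted)


inside = corridor_loss([(0.5, 0.0, 0.0)], [(0.0, 0.0, 0.0)], half_width=1.0)
outside = corridor_loss([(3.0, 4.0, 0.0)], [(0.0, 0.0, 0.0)], half_width=1.0)
```

The two branches meet at `distance == half_width`, so the loss and its gradient stay continuous across the corridor boundary, as with a standard smooth-L1 whose transition point equals the corridor radius.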
---
## Results
### LIBERO-Plus (GR00T-based)
| Variant | Description | AVG |
|--------|----------------------------------|------|
| base | | 75.23 |
| c1 | query=3 | 77.25 |
| c2 | + extra data | 77.25 |
| c3 | + Δpos anchors | 79.21 |
| **c4** | + corridor loss (**CorridorVLA**) | **83.21** |
Improvement:
* +7.98 percentage points over the base variant (75.23 → 83.21)
* Largest single step (+4.00 points, c3 → c4) comes from the **explicit spatial constraint**
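The per-variant contributions follow directly from the AVG column of the table. The short script below only recomputes those differences; the variable and function names are illustrative.

```python
# AVG success rates copied from the results table above.
results = [
    ("base", 75.23),
    ("c1", 77.25),
    ("c2", 77.25),
    ("c3", 79.21),
    ("c4", 83.21),
]


def stepwise_gains(rows):
    """Gain of each variant over the previous row, in percentage points."""
    return [
        (name, round(score - prev_score, 2))
        for (_, prev_score), (name, score) in zip(rows, rows[1:])
    ]


gains = stepwise_gains(results)
total_gain = round(results[-1][1] - results[0][1], 2)
largest_step = max(gains, key=lambda item: item[1])
```

Here `total_gain` is the overall gain of c4 over base, and `largest_step` identifies the variant that adds the most over its predecessor.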
---
## Implementation
* Built on **[StarVLA](https://github.com/starVLA/starVLA/commit/e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc)**
* Minimal changes:
  * a few prediction slots
  * additional loss terms
* No heavy architecture redesign
---
## Key Insights
* Spatial guidance can be:
  * **explicit (loss-level)** instead of implicit (feature-level)
* Physical quantities are:
  * more **action-aligned**
  * more **interpretable**
* Simple constraints can:
  * significantly improve **stability**
  * reduce **unstructured exploration**
---
## Citation
If you find this work useful, please cite:
```bibtex
@article{corridorvla2025,
title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
year={2026},
eprint={2604.21241},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2604.21241}
}
```