---
license: mit
datasets:
- Sylvest/LIBERO-plus
---

## 🌌 CorridorVLA

This repository provides the official implementation of **CorridorVLA**.

> **Direct spatial constraints for Vision-Language-Action models via sparse physical anchors**

[![arXiv](https://img.shields.io/badge/arXiv-2604.21241-b31b1b.svg)](https://arxiv.org/abs/2604.21241)  
[![Code](https://img.shields.io/badge/Code-GitHub-black)](https://github.com/lidc54/corridorVLA)  
[![License](https://img.shields.io/badge/License-MIT-green.svg)](#)

---

## πŸ” TL;DR

* Explore an alternative to common visual-style spatial guidance (e.g., predicting future images/videos) using **text-style physical anchors**
* Predict sparse **end-effector Ξ”-positions**
* Use them to impose an **explicit corridor constraint** on action generation
* Achieves an **83.21% success rate on LIBERO-Plus**

---

## 🧠 Motivation


<p align="center">
  <img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/motive.png" width="40%">
</p>

### Existing VLA paradigm

* Spatial guidance is encoded as visual-style tokens or latent features  
* Action generation is influenced indirectly through the backbone features  

### CorridorVLA

* Predict **compact physical quantities** (spatial anchors)  
* Apply them as **direct constraints in the loss**  
* No need for heavy visual intermediate representations  

---

## πŸ—οΈ Method Overview

<p align="center">
  <img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/framework.png" width="50%">
</p>

### Key components

**(1) Sparse Anchor Prediction**

* Predict $K$ future **Ξ”-position anchors**  
* Represent trajectory structure in a compact form  
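As a rough sketch of what sparse Δ-anchor targets could look like (the uniform subsampling rule is an assumption for illustration, not necessarily the paper's exact scheme; `K=3` mirrors the `query=3` ablation below):

```python
import numpy as np

def make_delta_anchors(traj, K=3):
    """Subsample K future waypoints from an end-effector trajectory and
    express them as delta-positions relative to the current pose.

    traj: (T, 3) absolute end-effector positions; traj[0] is the current pose.
    Returns a (K, 3) array of sparse anchor deltas.
    """
    T = len(traj)
    idx = np.linspace(1, T - 1, K).round().astype(int)  # K future indices
    return traj[idx] - traj[0]                          # (K, 3) deltas
```

A handful of such deltas is enough to pin down the coarse shape of the upcoming motion without predicting any dense visual intermediate.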

**(2) Action Augmentation**

* Concatenate state-related physical quantities (e.g., Ξ”-positions) to the action vector  
* Enable joint prediction of state and action, providing implicit alignment between state space and action space  
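A minimal sketch of the augmentation step (function names and dimensions are illustrative, not from the released code):

```python
import numpy as np

def augment_action(action, delta_pos):
    """Append flattened anchor delta-positions to the raw action vector,
    so a single head regresses state and action jointly."""
    return np.concatenate([action, delta_pos.ravel()])

def split_augmented(aug, action_dim):
    """Recover the action and the (K, 3) delta block from a joint prediction."""
    return aug[:action_dim], aug[action_dim:].reshape(-1, 3)
```

Because both quantities come out of the same prediction, the head is implicitly encouraged to keep its actions consistent with its own state estimates.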

**(3) Corridor Loss**

* Defines a tolerance region over the predicted trajectory  
* Penalizes deviations outside the region while allowing smooth convergence within it  

πŸ‘‰ Behaves like a **structured smooth-L1 in trajectory space**

---

## πŸ“Š Results

### LIBERO-Plus (GR00T-based)

| Variant | Description                       | AVG   |
|--------|----------------------------------|------|
| base   | GR00T baseline                   | 75.23 |
| c1     | query=3                          | 77.25 |
| c2     | + extra data                     | 77.25 |
| c3     | + Ξ”pos anchors                   | 79.21 |
| **c4** | + corridor loss (**CorridorVLA**) | **83.21** |

πŸ“ˆ Improvement:

* +7.98 points over the base model (75.23 → 83.21)  
* Largest single gain comes from the **explicit spatial constraint** (corridor loss)  

---

## βš™οΈ Implementation

* Built on **[StarVLA](https://github.com/starVLA/starVLA/commit/e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc)**  
* Minimal changes:

  * a few extra prediction slots  
  * additional loss terms  
* No heavy architecture redesign  


---

## πŸ“Œ Key Insights

* Spatial guidance can be:

  * **explicit (loss-level)** instead of implicit (feature-level)

* Physical quantities are:

  * more **action-aligned**  
  * more **interpretable**

* Simple constraints can:

  * significantly improve **stability**  
  * reduce **unstructured exploration**


---



## πŸ“– Citation

If you find this work useful, please cite:

```bibtex
@article{corridorvla2026,
  title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
  author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
  year={2026},
  eprint={2604.21241},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2604.21241}
}
```