---
license: mit
datasets:
- Sylvest/LIBERO-plus
---

## 🌌 CorridorVLA

This repository provides the official implementation of **CorridorVLA**.

> **Direct spatial constraints for Vision-Language-Action models via sparse physical anchors**

[![arXiv](https://img.shields.io/badge/arXiv-2604.21241-b31b1b.svg)](https://arxiv.org/abs/2604.21241)  
[![Code](https://img.shields.io/badge/Code-GitHub-black)](https://github.com/lidc54/corridorVLA)  
[![License](https://img.shields.io/badge/License-MIT-green.svg)](#)

---

## πŸ” TL;DR

* Explore an alternative to common visual-style spatial guidance (e.g., predicting future images/videos) using **text-style physical anchors**
* Predict sparse **end-effector Ξ”-positions**
* Use them to impose an **explicit corridor constraint** on action generation
* Achieves an **83.21% success rate on LIBERO-Plus**

---

## 🧠 Motivation


<p align="center">
  <img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/motive.png" width="40%">
</p>

### Existing VLA paradigm

* Spatial guidance is encoded as visual-style tokens or latent features  
* Action generation is influenced indirectly through the backbone features  

### CorridorVLA

* Predict **compact physical quantities** (spatial anchors)  
* Apply them as **direct constraints in the loss**  
* No need for heavy visual intermediate representations  

---

## πŸ—οΈ Method Overview

<p align="center">
  <img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/framework.png" width="50%">
</p>

### Key components

**(1) Sparse Anchor Prediction**

* Predict $K$ future **Ξ”-position anchors**  
* Represent trajectory structure in a compact form  
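As a rough sketch of what sparse Δ-anchor targets could look like (the uniform subsampling rule is an assumption for illustration, not necessarily the paper's exact scheme; `K=3` mirrors the `query=3` ablation below):

```python
import numpy as np

def make_delta_anchors(traj, K=3):
    """Subsample K future waypoints from an end-effector trajectory and
    express them as delta-positions relative to the current pose.

    traj: (T, 3) absolute end-effector positions; traj[0] is the current pose.
    Returns a (K, 3) array of sparse anchor deltas.
    """
    T = len(traj)
    idx = np.linspace(1, T - 1, K).round().astype(int)  # K future indices
    return traj[idx] - traj[0]                          # (K, 3) deltas
```

A handful of such deltas is enough to pin down the coarse shape of the upcoming motion without predicting any dense visual intermediate.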

**(2) Action Augmentation**

* Concatenate state-related physical quantities (e.g., Ξ”-positions) to the action vector  
* Enable joint prediction of state and action, providing implicit alignment between state space and action space  
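A minimal sketch of the augmentation step (function names and dimensions are illustrative, not from the released code):

```python
import numpy as np

def augment_action(action, delta_pos):
    """Append flattened anchor delta-positions to the raw action vector,
    so a single head regresses state and action jointly."""
    return np.concatenate([action, delta_pos.ravel()])

def split_augmented(aug, action_dim):
    """Recover the action and the (K, 3) delta block from a joint prediction."""
    return aug[:action_dim], aug[action_dim:].reshape(-1, 3)
```

Because both quantities come out of the same prediction, the head is implicitly encouraged to keep its actions consistent with its own state estimates.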

**(3) Corridor Loss**

* Defines a tolerance region over the predicted trajectory  
* Penalizes deviations outside the region while allowing smooth convergence within it  

πŸ‘‰ Behaves like a **structured smooth-L1 in trajectory space**

---

## πŸ“Š Results

### LIBERO-Plus (GR00T-based)

| Variant | Description                       | AVG   |
|--------|----------------------------------|------|
| base   | GR00T baseline                   | 75.23 |
| c1     | query=3                          | 77.25 |
| c2     | + extra data                     | 77.25 |
| c3     | + Ξ”pos anchors                   | 79.21 |
| **c4** | + corridor loss (**CorridorVLA**) | **83.21** |

πŸ“ˆ Improvement:

* +7.98 points over the base model (75.23 → 83.21)  
* Largest single gain comes from the **explicit spatial constraint** (corridor loss)  

---

## βš™οΈ Implementation

* Built on **[StarVLA](https://github.com/starVLA/starVLA/commit/e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc)**  
* Minimal changes:

  * a few extra prediction slots  
  * additional loss terms  
* No heavy architecture redesign  


---

## πŸ“Œ Key Insights

* Spatial guidance can be:

  * **explicit (loss-level)** instead of implicit (feature-level)

* Physical quantities are:

  * more **action-aligned**  
  * more **interpretable**

* Simple constraints can:

  * significantly improve **stability**  
  * reduce **unstructured exploration**


---



## πŸ“– Citation

If you find this work useful, please cite:

```bibtex
@article{corridorvla2026,
  title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
  author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
  year={2026},
  eprint={2604.21241},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2604.21241}
}
```