lidc committed on
Commit
687d534
·
verified ·
1 Parent(s): 48f9dae

CorridorVLA

Files changed (1)
  1. README.md +140 -3
README.md CHANGED
@@ -1,3 +1,140 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - Sylvest/LIBERO-plus
+ ---
+
+ ## 🌌 CorridorVLA
+
+ This repository provides the official implementation of **CorridorVLA**.
+
+ > **Direct spatial constraints for Vision-Language-Action models via sparse physical anchors**
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2604.21241-b31b1b.svg)](https://arxiv.org/abs/2604.21241)
+ [![Code](https://img.shields.io/badge/Code-GitHub-black)](https://github.com/lidc54/corridorVLA)
+ [![License](https://img.shields.io/badge/License-MIT-green.svg)](#)
+
+ ---
+
+ ## 🔍 TL;DR
+
+ * Explores an alternative to common visual-style spatial guidance (e.g., predicting future images or videos) using **text-style physical anchors**
+ * Predicts sparse **end-effector Δ-positions**
+ * Uses them to impose an **explicit corridor constraint** on action generation
+ * Achieves an **83.21% success rate on LIBERO-Plus**
+
+ ---
+
+ ## 🧠 Motivation
+
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/motive.png" width="40%">
+ </p>
+
+ ### Existing VLA paradigm
+
+ * Spatial guidance is encoded as visual-style tokens or latent features
+ * Action generation is influenced only indirectly, through the backbone features
+
+ ### CorridorVLA
+
+ * Predicts **compact physical quantities** (spatial anchors)
+ * Applies them as **direct constraints in the loss**
+ * No heavy intermediate visual representations required
+
+ ---
+
+ ## 🏗️ Method Overview
+
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/framework.png" width="50%">
+ </p>
+
+ ### Key components
+
+ **(1) Sparse Anchor Prediction**
+
+ * Predict $K$ future **Δ-position anchors**
+ * Represent trajectory structure in a compact form (see the sketch below)
+
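+ As a hedged illustration of what these anchor targets might look like, here is a minimal sketch that subsamples $K$ Δ-position anchors from a dense end-effector trajectory. The function name and the uniform subsampling scheme are assumptions for illustration, not the repository's exact implementation:
+
+ ```python
+ import torch
+
+ def extract_delta_anchors(ee_positions: torch.Tensor, k: int = 3) -> torch.Tensor:
+     """Subsample K future waypoints from a dense end-effector trajectory
+     and express them as Δ-positions relative to the current pose.
+
+     ee_positions: (T, 3) absolute end-effector xyz over the horizon.
+     Returns: (K, 3) sparse Δ-position anchors.
+     """
+     t = ee_positions.shape[0]
+     # Pick K roughly evenly spaced future timesteps (excluding t=0 itself).
+     idx = torch.linspace(1, t - 1, k).round().long()
+     # Anchors are offsets from the current position, not absolute poses.
+     return ee_positions[idx] - ee_positions[0]
+ ```
+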
+ **(2) Action Augmentation**
+
+ * Concatenate state-related physical quantities (e.g., Δ-positions) to the action vector
+ * Enable joint prediction of state and action, implicitly aligning the state and action spaces (see the sketch below)
+
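+ One possible realization of this augmentation, assuming chunked action targets of shape `(B, H, A)` and anchors of shape `(B, K, 3)`; broadcasting the anchors across the action horizon is an illustrative choice, not necessarily the repository's:
+
+ ```python
+ import torch
+
+ def augment_action_targets(actions: torch.Tensor, anchors: torch.Tensor) -> torch.Tensor:
+     """Append flattened Δ-position anchors to each action step so the
+     generative head predicts state and action jointly.
+
+     actions: (B, H, A) action chunks; anchors: (B, K, 3) Δ-position anchors.
+     Returns: (B, H, A + K*3) augmented targets.
+     """
+     b, h, _ = actions.shape
+     flat = anchors.flatten(1)                  # (B, K*3)
+     flat = flat.unsqueeze(1).expand(b, h, -1)  # repeat anchors along the horizon
+     return torch.cat([actions, flat], dim=-1)
+ ```
+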
+ **(3) Corridor Loss**
+
+ * Defines a tolerance region around the predicted trajectory
+ * Penalizes deviations outside the region while allowing smooth convergence within it
+
+ 👉 Behaves like a **structured smooth-L1 in trajectory space**, as sketched below
+
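+ A minimal PyTorch sketch of a loss with this shape, assuming a fixed corridor half-width `eps` (a hypothetical hyperparameter): the penalty is quadratic inside the corridor and linear outside, with a continuous transition at the boundary, which is exactly the smooth-L1 profile applied to the distance from the target trajectory. This illustrates the described behaviour rather than reproducing the repository's exact formulation:
+
+ ```python
+ import torch
+
+ def corridor_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 0.05) -> torch.Tensor:
+     """Tolerance-region loss over predicted anchors.
+
+     pred, target: (B, K, 3) predicted vs. ground-truth Δ-position anchors.
+     eps: corridor half-width (same units as the positions).
+     """
+     dist = (pred - target).norm(dim=-1)  # (B, K) distance to the corridor center
+     inside = dist <= eps
+     # Quadratic inside the corridor (gentle convergence), linear outside
+     # (strong penalty); the two branches meet smoothly at dist == eps.
+     quad = 0.5 * dist ** 2 / eps
+     lin = dist - 0.5 * eps
+     return torch.where(inside, quad, lin).mean()
+ ```
+
+ In this reading, `eps` trades precision inside the corridor against tolerance to small deviations; the value above is a placeholder, as the README does not specify one.
+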
+ ---
+
+ ## 📊 Results
+
+ ### LIBERO-Plus (GR00T-based)
+
+ | Variant | Description | AVG (%) |
+ |---------|-----------------------------------|-----------|
+ | base    |                                   | 75.23     |
+ | c1      | query=3                           | 77.25     |
+ | c2      | + extra data                      | 77.25     |
+ | c3      | + Δpos anchors                    | 79.21     |
+ | **c4**  | + corridor loss (**CorridorVLA**) | **83.21** |
+
+ 📈 Improvement:
+
+ * +7.98 points over the base variant (75.23 → 83.21)
+ * The largest gain comes from the **explicit spatial constraint**
+
+ ---
+
+ ## ⚙️ Implementation
+
+ * Built on **[StarVLA](https://github.com/starVLA/starVLA/commit/e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc)**
+ * Minimal changes:
+   * a few extra prediction slots
+   * additional loss terms
+ * No heavy architecture redesign
+
+ ---
+
+ ## 📌 Key Insights
+
+ * Spatial guidance can be **explicit (loss-level)** instead of implicit (feature-level)
+ * Physical quantities are more **action-aligned** and more **interpretable**
+ * Simple constraints can significantly improve **stability** and reduce **unstructured exploration**
+
+ ---
+
+ ## 📖 Citation
+
+ If you find this work useful, please cite:
+
+ ```bibtex
+ @article{corridorvla2025,
+   title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
+   author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
+   year={2026},
+   eprint={2604.21241},
+   archivePrefix={arXiv},
+   primaryClass={cs.RO},
+   url={https://arxiv.org/abs/2604.21241}
+ }
+ ```