---
license: mit
datasets:
- Sylvest/LIBERO-plus
---

## CorridorVLA

This repository provides the official implementation of **CorridorVLA**.

> **Direct spatial constraints for Vision-Language-Action models via sparse physical anchors**

[arXiv](https://arxiv.org/abs/2604.21241)
[Code](https://github.com/lidc54/corridorVLA)

---

## TL;DR

* Explores an alternative to the common visual-style spatial guidance (e.g., predicting future images or videos): **text-style physical anchors**
* Predicts sparse **end-effector Δ-positions**
* Uses them to impose an **explicit corridor constraint** on action generation
* Achieves an **83.21% success rate on LIBERO-Plus**
---

## Motivation

<p align="center">
<img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/motive.png" width="40%">
</p>

### Existing VLA paradigm

* Spatial guidance is encoded as visual-style tokens or latent features
* Action generation is influenced only indirectly, through the backbone features

### CorridorVLA

* Predicts **compact physical quantities** (spatial anchors)
* Applies them as **direct constraints in the loss**
* Needs no heavy visual intermediate representations

---

## Method Overview

<p align="center">
<img src="https://raw.githubusercontent.com/lidc54/corridorVLA/main/assets/framework.png" width="50%">
</p>

### Key components

**(1) Sparse Anchor Prediction**

* Predict $K$ future **Δ-position anchors**
* Represent the trajectory structure in a compact form
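As a rough illustration only (the function name and the uniform-subsampling rule are assumptions of this sketch, not the paper's exact recipe), $K$ sparse Δ-position anchors can be derived from a dense end-effector trajectory like this:

```python
import numpy as np

def delta_position_anchors(traj: np.ndarray, k: int) -> np.ndarray:
    """Subsample k future waypoints from a dense end-effector trajectory
    of shape (T, 3) and express them as offsets (Δ-positions) from the
    current position traj[0]. Hypothetical helper, not the official code."""
    t = len(traj)
    # Pick k indices spread uniformly over the future part of the trajectory.
    idx = np.linspace(1, t - 1, k).round().astype(int)
    return traj[idx] - traj[0]  # (k, 3) Δ-position anchors

# Example: a straight-line trajectory along x.
traj = np.stack([np.linspace(0.0, 1.0, 11), np.zeros(11), np.zeros(11)], axis=1)
anchors = delta_position_anchors(traj, k=3)
print(anchors.shape)  # (3, 3)
```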
**(2) Action Augmentation**

* Concatenate state-related physical quantities (e.g., Δ-positions) to the action vector
* Enable joint prediction of state and action, providing implicit alignment between state space and action space
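The augmentation step amounts to widening the action target so the head regresses state and action jointly. A minimal sketch, assuming a 7-DoF action and $K=3$ xyz anchors (these dimensions are illustrative, not from the paper):

```python
import numpy as np

def augment_action(action: np.ndarray, delta_pos: np.ndarray) -> np.ndarray:
    """Append flattened Δ-position anchors to an action vector so a single
    head predicts both; hypothetical layout, not the official code."""
    return np.concatenate([action, delta_pos.reshape(-1)])

action = np.zeros(7)          # e.g., a 7-DoF action
delta_pos = np.ones((3, 3))   # K=3 anchors in xyz
aug = augment_action(action, delta_pos)
print(aug.shape)  # (16,)
```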
**(3) Corridor Loss**

* Defines a tolerance region around the predicted trajectory
* Penalizes deviations outside the region while allowing smooth convergence within it

Behaves like a **structured smooth-L1 in trajectory space**.
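The loss itself is not spelled out in this card. One plausible reading of "tolerance region + smooth-L1-like behavior" (the corridor radius `eps` and all names here are assumptions of this sketch) is a loss that is zero inside a tube around the reference trajectory and grows smoothly outside it:

```python
import numpy as np

def corridor_loss(pred: np.ndarray, ref: np.ndarray, eps: float = 0.05) -> float:
    """Penalize predicted waypoints (N, 3) only when they leave a corridor
    of radius eps around the reference trajectory: zero inside the corridor,
    quadratic just past the boundary, linear farther out (smooth-L1 style).
    Hypothetical formulation, not the official implementation."""
    # Distance of each predicted waypoint to its reference waypoint.
    d = np.linalg.norm(pred - ref, axis=-1)
    # Overshoot beyond the corridor boundary (zero inside the corridor).
    over = np.maximum(d - eps, 0.0)
    # Smooth-L1 on the overshoot: quadratic for small values, linear for large.
    beta = eps
    loss = np.where(over < beta, 0.5 * over**2 / beta, over - 0.5 * beta)
    return float(loss.mean())

ref = np.zeros((4, 3))
inside = np.full((4, 3), 0.01)   # well inside the corridor
outside = np.full((4, 3), 0.2)   # outside the corridor
print(corridor_loss(inside, ref))   # → 0.0
print(corridor_loss(outside, ref))  # positive penalty
```

Inside the corridor the gradient vanishes, so the model converges smoothly; outside, the linear tail keeps gradients bounded, matching the "structured smooth-L1" description above.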
---

## Results

### LIBERO-Plus (GR00T-based)

| Variant | Description | AVG |
|---------|-------------|-----|
| base | | 75.23 |
| c1 | query=3 | 77.25 |
| c2 | + extra data | 77.25 |
| c3 | + Δpos anchors | 79.21 |
| **c4** | + corridor loss (**CorridorVLA**) | **83.21** |

Improvement:

* +7.98 points over the base model
* The largest gain comes from the **explicit spatial constraint**
---

## Implementation

* Built on **[StarVLA](https://github.com/starVLA/starVLA/commit/e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc)**
* Minimal changes:
  * a few extra prediction slots
  * additional loss terms
* No heavy architecture redesign

---

## Key Insights

* Spatial guidance can be **explicit (loss-level)** instead of implicit (feature-level)
* Physical quantities are more **action-aligned** and more **interpretable**
* Simple constraints can significantly improve **stability** and reduce **unstructured exploration**

---

## Citation

If you find this work useful, please cite:

```bibtex
@article{corridorvla2025,
  title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
  author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
  year={2026},
  eprint={2604.21241},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2604.21241}
}
```