---
license: mit
datasets:
- Sylvest/LIBERO-plus
---

## 🌌 CorridorVLA

This repository provides the official implementation of **CorridorVLA**.

> **Direct spatial constraints for Vision-Language-Action models via sparse physical anchors**

[![arXiv](https://img.shields.io/badge/arXiv-2604.21241-b31b1b.svg)](https://arxiv.org/abs/2604.21241)
[![Code](https://img.shields.io/badge/Code-GitHub-black)](https://github.com/lidc54/corridorVLA)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](#)

---

## 🔍 TL;DR

* Explores an alternative to common visual-style spatial guidance (e.g., predicting future images or videos), using **text-style physical anchors** instead
* Predicts sparse **end-effector Δ-positions**
* Uses them to impose an **explicit corridor constraint** on action generation
* Achieves an **83.21% success rate on LIBERO-Plus**

---

## 🧠 Motivation

### Existing VLA paradigm

* Spatial guidance is encoded as visual-style tokens or latent features
* Action generation is influenced only indirectly, through the backbone features

### CorridorVLA

* Predicts **compact physical quantities** (spatial anchors)
* Applies them as **direct constraints in the loss**
* Requires no heavy visual intermediate representations

---

## 🏗️ Method Overview
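As a rough illustration of the corridor-loss idea described under the key components below, here is a minimal pure-Python sketch (an assumption for exposition, not the official implementation): deviations of predicted waypoints from reference anchors cost nothing inside a tolerance radius (`corridor`, a hypothetical parameter name) and are penalized smooth-L1 style outside it.

```python
import math

def corridor_loss(pred, target, corridor=0.05, beta=1.0):
    """Illustrative corridor-style penalty (not the official code).

    pred, target : lists of (x, y, z) waypoints
    corridor     : tolerance radius; deviations inside it are free
    beta         : smooth-L1 transition point on the excess deviation
    """
    total = 0.0
    for p, t in zip(pred, target):
        dist = math.dist(p, t)              # Euclidean deviation from the anchor
        excess = max(dist - corridor, 0.0)  # zero inside the corridor
        if excess < beta:                   # quadratic near the boundary
            total += 0.5 * excess * excess / beta
        else:                               # linear far from it
            total += excess - 0.5 * beta
    return total / max(len(pred), 1)
```

Inside the corridor the penalty (and hence its gradient) vanishes, so the action head is free to converge smoothly within the tolerance region while being pushed back when it strays outside, matching the "structured smooth-L1 in trajectory space" behavior described below.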

### Key components

**(1) Sparse Anchor Prediction**

* Predicts $K$ future **Δ-position anchors**
* Represents trajectory structure in a compact form

**(2) Action Augmentation**

* Concatenates state-related physical quantities (e.g., Δ-positions) to the action vector
* Enables joint prediction of state and action, implicitly aligning the state and action spaces

**(3) Corridor Loss**

* Defines a tolerance region around the predicted trajectory
* Penalizes deviations outside the region while allowing smooth convergence within it

👉 Behaves like a **structured smooth-L1 in trajectory space**

---

## 📊 Results

### LIBERO-Plus (GR00T-based)

| Variant | Description | AVG |
|---------|-------------|-----|
| base | | 75.23 |
| c1 | query=3 | 77.25 |
| c2 | + extra data | 77.25 |
| c3 | + Δpos anchors | 79.21 |
| **c4** | + corridor loss (**CorridorVLA**) | **83.21** |

📈 Improvements:

* +7.98% over the base model
* Largest gain comes from the **explicit spatial constraint** (corridor loss)

---

## ⚙️ Implementation

* Built on **[StarVLA](https://github.com/starVLA/starVLA/commit/e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc)**
* Minimal changes:
  * a few extra prediction slots
  * additional loss terms
* No heavy architecture redesign

---

## 📌 Key Insights

* Spatial guidance can be **explicit (loss-level)** instead of implicit (feature-level)
* Physical quantities are more **action-aligned** and more **interpretable**
* Simple constraints can significantly improve **stability** and reduce **unstructured exploration**

---

## 📖 Citation

If you find this work useful, please cite:

```bibtex
@article{corridorvla2025,
  title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
  author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
  year={2026},
  eprint={2604.21241},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2604.21241}
}
```