Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach (TACO)
This repository contains the pi05 model finetuned on a mixture of LIBERO-Spatial, LIBERO-Goal, LIBERO-Object and LIBERO-Long, together with our trained Coin Flipping Network (CFN), as described in the paper Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach.
TACO: Test-time Anti-exploration via pseudo-COunts
Overview
TACO (Test-time Anti-exploration via pseudo-COunts) is a test-time scaling framework for VLAs. It retains the strong generalization capabilities of pretrained VLAs while constraining their outputs to the success modes of a specific downstream task, which applies the Anti-Exploration principle from offline RL at inference time. By leveraging a lightweight Coin Flipping Network (CFN), TACO measures distributional shift accurately with minimal computational overhead, significantly improving performance on out-of-distribution test cases.
Key Features
- Principled Anti-Exploration: Mitigates inference-time fragility by constraining generated actions to the "success modes" of the downstream task, effectively handling distribution shifts.
- Universal Compatibility: Seamlessly integrates with Flow-Matching (e.g., $\pi_0$, $\pi_{0.5}$), Diffusion (e.g., RDT), and Autoregressive (e.g., OpenVLA) architectures.
- Gradient-Free Steering: Performs Test-Time Scaling (TTS) via a generate-then-verify pipeline without modifying the heavy VLA backbone parameters.
- Efficient Inference: Implements KV Cache Optimization to reuse visual-language representations across parallel samples, reducing inference latency by ~73% compared with the original sampling procedure.
- High-Fidelity Verification: Utilizes a lightweight Coin Flipping Network (CFN) trained on internal representations with High-Fidelity Feature Search to accurately estimate action reliability.
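The generate-then-verify pipeline and the reuse of visual-language representations described in the list above can be illustrated with a toy sketch. Everything here is hypothetical: `ToyPolicy`, `toy_verifier`, and `generate_then_verify` are illustrative names rather than part of the TACO codebase, the "prefix" stands in for the cached visual-language representation, and the verifier is a placeholder for a CFN pseudo-count score.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = Tuple[float, ...]


class ToyPolicy:
    """Hypothetical stand-in for a VLA: an expensive prefix encoder plus a cheap sampler."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.prefix_calls = 0  # how many times the expensive encoder has run

    def encode_prefix(self, observation: Sequence[float]) -> Tuple[float, ...]:
        # Stands in for the vision-language forward pass whose result is cached.
        self.prefix_calls += 1
        return tuple(observation)

    def sample_action(self, prefix: Sequence[float]) -> Action:
        # Stands in for one sampling run conditioned on the cached prefix.
        return tuple(x + self.rng.gauss(0.0, 1.0) for x in prefix)


def toy_verifier(action: Action) -> float:
    # Placeholder score (assumption): higher means "better supported by the data".
    # In TACO this role is played by the CFN pseudo-count, not by this formula.
    return -sum(a * a for a in action)


def generate_then_verify(
    policy: ToyPolicy,
    observation: Sequence[float],
    verifier: Callable[[Action], float],
    num_candidates: int = 8,
) -> Tuple[Action, List[Action]]:
    # Encode the shared prefix once and reuse it for every candidate.
    prefix = policy.encode_prefix(observation)
    candidates = [policy.sample_action(prefix) for _ in range(num_candidates)]
    # Keep the candidate the verifier scores highest; no policy weights are updated.
    best = max(candidates, key=verifier)
    return best, candidates


if __name__ == "__main__":
    policy = ToyPolicy(seed=0)
    best, candidates = generate_then_verify(policy, (0.0, 0.0), toy_verifier)
    print(f"encoder calls: {policy.prefix_calls}, candidates: {len(candidates)}")
    print(f"selected action: {best}")
```

The point of the sketch is the structure: the encoder runs once regardless of `num_candidates`, and selection happens purely by scoring, which is why the approach is gradient-free with respect to the backbone.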
Main Contributions
- New Perspective on VLA Instability: We diagnose the inference fragility of generative VLAs as an out-of-support problem and propose TACO, the first framework to address this via the Anti-Exploration principle from Offline RL using Test-Time Scaling.
- Coupled Pseudo-Count Estimator: We introduce an efficient internal representation mechanism coupled with a High-Fidelity Feature Search strategy. This allows the CFN to accurately verify action chunks for denoising-based policies (Flow/Diffusion) that never see clean actions during training.
- SOTA Performance: Extensive experiments across simulation benchmarks (RoboTwin, LIBERO, SimplerEnv) and real-world dual-arm manipulation demonstrate that TACO significantly boosts success rates (e.g., +16% in real-world tasks) over strong baselines like $\pi_0$.
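To give some intuition for how coin flipping yields pseudo-counts, here is a minimal tabular sketch of the general idea behind Coin Flipping Networks. It is an assumption-laden illustration, not the CFN shipped in this repository: the class name `TabularCoinFlipCounter` is hypothetical, keys are plain hashable values instead of internal VLA representations, and the running mean is computed explicitly where a trained network would approximate it by regressing onto the random targets.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


class TabularCoinFlipCounter:
    """Tabular stand-in for a CFN: averages random +/-1 vectors per key.

    Each observation of a key draws a fresh vector of fair coin flips in
    {-1, +1}. The mean of n such vectors has expected squared entries of
    1/n, so the inverse of the mean squared entry estimates the count n.
    """

    def __init__(self, dim: int = 2000, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.sums: Dict[Hashable, List[int]] = defaultdict(lambda: [0] * dim)
        # Used only to form the mean that a regression network would learn.
        self.visits: Dict[Hashable, int] = defaultdict(int)

    def observe(self, key: Hashable) -> None:
        sums = self.sums[key]
        for i in range(self.dim):
            sums[i] += self.rng.choice((-1, 1))
        self.visits[key] += 1

    def pseudo_count(self, key: Hashable) -> float:
        if key not in self.visits:
            return 0.0  # never observed: treated as out of support
        n = self.visits[key]
        mean_sq = sum((s / n) ** 2 for s in self.sums[key]) / self.dim
        return 1.0 / mean_sq if mean_sq > 0 else float("inf")


if __name__ == "__main__":
    counter = TabularCoinFlipCounter()
    counter.observe("rare")
    for _ in range(50):
        counter.observe("common")
    for key in ("unseen", "rare", "common"):
        print(key, round(counter.pseudo_count(key), 2))
```

A key seen once recovers a count of exactly 1, while frequently seen keys recover their count only approximately, with the estimate tightening as `dim` grows. A low pseudo-count flags an input as poorly covered by the training data, which is the signal a verifier can use to reject candidate actions.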
Installation
Create a conda env:
conda create -n taco python=3.10 -y
conda activate taco
Install torch (choose the version that suits your environment):
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0
Install CFN:
cd cfn/
pip install -e .
cd ..
Install Pi0.5 (lerobot):
cd third_party/lerobot
pip install -e .
cd src/transformers
pip install -e .
cd ../../../..
Quick Start
We provide pre-trained base policies and CFN checkpoints on Hugging Face, allowing you to directly evaluate TACO without training.
Download Pre-trained Models
# Download base policy checkpoints & CFN checkpoints
# The CFN state dict is saved in the `cfns` sub-directory of the repo
hf download rhodes-team-teleai/pi05_TACO_libero_finetuned --local-dir /path/to/your/dir --max-workers 16
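If you prefer to fetch the checkpoints from Python, the following sketch mirrors the command above using `snapshot_download` from the `huggingface_hub` package. The helper names `download_checkpoints` and `list_cfn_files` are hypothetical, and the sketch makes no assumption about the file names inside `cfns`; it simply lists whatever is there.

```python
from pathlib import Path
from typing import List

REPO_ID = "rhodes-team-teleai/pi05_TACO_libero_finetuned"


def download_checkpoints(local_dir: str, max_workers: int = 16) -> Path:
    """Download the full model repo into local_dir and return its path."""
    # Imported lazily so list_cfn_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        max_workers=max_workers,
    )
    return Path(path)


def list_cfn_files(local_dir: str) -> List[Path]:
    """List files under the `cfns` sub-directory, where the CFN state dict is saved."""
    cfn_dir = Path(local_dir) / "cfns"
    if not cfn_dir.is_dir():
        return []
    return sorted(p for p in cfn_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    target = "/path/to/your/dir"  # replace with a real directory
    download_checkpoints(target)
    for path in list_cfn_files(target):
        print(path)
```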
Usage (Pi0.5 in LIBERO)
For full evaluation and training scripts, please refer to the official GitHub repository.
- Collect internal representation:
bash ./scripts/collect_inernal_representation/pi05_libero/collect.sh
- Train CFN:
bash ./scripts/train_cfn/train_cfn_example.sh
- Eval TACO:
bash ./scripts/eval/eval_libero_pi05_taco.sh
License
This project is licensed under the Apache License 2.0.
Acknowledgments
- LeRobot for the Pi0.5 implementation
- OpenVLA for the OpenVLA base model
- RoboTwin for the simulation environment
- LIBERO for the benchmark tasks
Citation
If you find TACO useful for your research, please cite our paper:
@article{yang2025taco,
title={Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach},
author={Siyuan Yang and Yang Zhang and Haoran He and Ling Pan and Xiu Li and Chenjia Bai and Xuelong Li},
journal={arXiv preprint arXiv:2512.02834},
year={2025}
}