Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach (TACO)

This repository contains the pi05 model fine-tuned on a mixture of the LIBERO-Spatial, LIBERO-Goal, LIBERO-Object, and LIBERO-Long datasets, together with our trained Coin Flipping Network (CFN), as described in the paper Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach.

TACO: Test-time Anti-exploration via pseudo-COunts

Overview

TACO (Test-time Anti-exploration via pseudo-COunts) is a test-time scaling framework for vision-language-action models (VLAs). It retains the strong generalization of pretrained VLAs while constraining their outputs to the success modes of a specific downstream task, applying the anti-exploration principle from offline RL at inference time. Using a lightweight Coin Flipping Network (CFN), TACO measures distributional shift accurately with minimal computational overhead and significantly improves performance on out-of-distribution test cases.

🎯 Key Features

Principled Anti-Exploration: Mitigates inference-time fragility by constraining generated actions to the "success modes" of the downstream task, effectively handling distribution shifts.
Universal Compatibility: Seamlessly integrates with Flow-Matching (e.g., $\pi_0$, $\pi_{0.5}$), Diffusion (e.g., RDT), and Autoregressive (e.g., OpenVLA) architectures.
Gradient-Free Steering: Performs Test-Time Scaling (TTS) via a generate-then-verify pipeline without modifying the heavy VLA backbone parameters.
Efficient Inference: Implements KV Cache Optimization to reuse visual-language representations across candidates, reducing inference latency during parallel sampling by ~73% compared to the original implementation.
High-Fidelity Verification: Utilizes a lightweight Coin Flipping Network (CFN) trained on internal representations with High-Fidelity Feature Search to accurately estimate action reliability.
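
The generate-then-verify idea from the feature list can be illustrated with a minimal Python sketch. It is not the repository's implementation: `select_most_familiar`, `toy_policy_sample`, and `toy_pseudo_count` are hypothetical names, the policy is replaced by a Gaussian sampler, and the CFN is replaced by a toy score that grows as a candidate gets closer to an assumed data mode. The sketch also assumes that the verifier keeps the candidate with the highest pseudo-count.

```python
import random
from typing import Callable, List, Sequence

ActionChunk = List[float]


def select_most_familiar(
    candidates: Sequence[ActionChunk],
    pseudo_count: Callable[[ActionChunk], float],
) -> int:
    """Return the index of the candidate with the highest pseudo-count."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = [pseudo_count(c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy stand-ins (hypothetical): a real setup would sample from the frozen VLA
# and score each candidate with the trained CFN.
rng = random.Random(0)
DATA_MODE = [0.5, -0.2]  # assumed "success mode" of the downstream data


def toy_policy_sample() -> ActionChunk:
    """Sample a noisy action around the data mode."""
    return [m + rng.gauss(0.0, 0.3) for m in DATA_MODE]


def toy_pseudo_count(action: ActionChunk) -> float:
    """Higher score for actions closer to the data mode."""
    dist_sq = sum((a - m) ** 2 for a, m in zip(action, DATA_MODE))
    return 1.0 / (dist_sq + 1e-6)


# Generate N candidates, then verify: no gradient step touches the policy.
candidates = [toy_policy_sample() for _ in range(8)]
best = select_most_familiar(candidates, toy_pseudo_count)
chosen = candidates[best]
```

Because selection only ranks already-generated candidates, the backbone parameters stay untouched, which is what makes the steering gradient-free.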

🏆 Main Contributions

New Perspective on VLA Instability: We diagnose the inference fragility of generative VLAs as an out-of-support problem and propose TACO, the first framework to address this via the Anti-Exploration principle from Offline RL using Test-Time Scaling.
Coupled Pseudo-Count Estimator: We introduce an efficient internal representation mechanism coupled with a High-Fidelity Feature Search strategy. This allows the CFN to accurately verify action chunks for denoising-based policies (Flow/Diffusion) that never see clean actions during training.
SOTA Performance: Extensive experiments across simulation benchmarks (RoboTwin, LIBERO, SimplerEnv) and real-world dual-arm manipulation show that TACO significantly boosts success rates (e.g., +16% in real-world tasks) over strong baselines such as $\pi_0$.
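
For intuition on how a coin-flipping estimator yields a pseudo-count, the sketch below simulates the general statistical idea rather than the trained network in this repository. It assumes that every visit to an input is paired with a fresh random vector of ±1 entries and that a perfect regressor outputs the mean of those vectors. Under that assumption the mean squared output has expected value 1/n for an input seen n times, so its inverse estimates the count. The names `coin_flip_target` and `pseudo_count` are hypothetical. TACO's CFN is trained on internal VLA representations, which this toy does not model.

```python
import random
from typing import List


def coin_flip_target(n_visits: int, dim: int, rng: random.Random) -> List[float]:
    """Mean of n_visits random +/-1 vectors: what an ideal regressor would output."""
    sums = [0.0] * dim
    for _ in range(n_visits):
        for j in range(dim):
            sums[j] += rng.choice((-1.0, 1.0))
    return [s / n_visits for s in sums]


def pseudo_count(output: List[float]) -> float:
    """Invert the mean squared output: E[mean(f^2)] = 1/n, so n is about 1 / mean(f^2)."""
    mean_sq = sum(x * x for x in output) / len(output)
    return 1.0 / max(mean_sq, 1e-12)


rng = random.Random(0)
rare = pseudo_count(coin_flip_target(n_visits=1, dim=16, rng=rng))
common = pseudo_count(coin_flip_target(n_visits=25, dim=4000, rng=rng))
print(f"seen once: {rare:.1f}, seen 25 times: about {common:.1f}")
```

Inputs that were rarely seen produce large-magnitude outputs and therefore low pseudo-counts, which is the signal an anti-exploration verifier can use to reject unfamiliar action chunks.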

🛠️ Installation

Create a conda env:

conda create -n taco python=3.10 -y
conda activate taco

Install torch (choose the version that suits your environment):

pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0

Install CFN:

cd cfn/
pip install -e .
cd ..

Install Pi0.5 (lerobot):

cd third_party/lerobot
pip install -e .

cd src/transformers
pip install -e .

cd ../../../..

🚀 Quick Start

We provide pre-trained base policies and CFN checkpoints on 🤗 Hugging Face, allowing you to directly evaluate TACO without training.

Download Pre-trained Models

# Download the base policy and CFN checkpoints
# The CFN state dict is saved in the `cfns` sub-directory of the repo
hf download rhodes-team-teleai/pi05_TACO_libero_finetuned --local-dir /path/to/your/dir --max-workers 16

📊 Usage (Pi0.5 on LIBERO)

For full evaluation and training scripts, please refer to the official GitHub repository.

Collect internal representations:

bash ./scripts/collect_inernal_representation/pi05_libero/collect.sh

Train CFN:

bash ./scripts/train_cfn/train_cfn_example.sh

Eval TACO:

bash ./scripts/eval/eval_libero_pi05_taco.sh

📄 License

This project is licensed under the Apache License 2.0.

🙏 Acknowledgments

LeRobot for the Pi0.5 implementation
OpenVLA for the OpenVLA base model
RoboTwin for the simulation environment
LIBERO for the benchmark tasks

🎓 Citation

If you find TACO useful for your research, please cite our paper:

@article{yang2025taco,
  title={Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach},
  author={Siyuan Yang and Yang Zhang and Haoran He and Ling Pan and Xiu Li and Chenjia Bai and Xuelong Li},
  journal={arXiv preprint arXiv:2512.02834},
  year={2025}
}