
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach (TACO)

Paper | Project Page | GitHub

This repository contains the pi05 model fine-tuned on a mixture of the LIBERO-Spatial, LIBERO-Goal, LIBERO-Object, and LIBERO-Long datasets, together with our trained Coin Flipping Network (CFN), as described in the paper Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach.

[Figure: TACO architecture]
TACO: Test-time Anti-exploration via pseudo-COunts

Overview

TACO (Test-time Anti-exploration via pseudo-COunts) is a test-time scaling framework for vision-language-action models (VLAs). It retains the strong generalization of pretrained VLAs while constraining their outputs to the success modes of a specific downstream task, following the anti-exploration principle from offline RL. Using a lightweight Coin Flipping Network (CFN), TACO measures distributional shift accurately with minimal computational overhead, significantly improving performance on out-of-distribution test cases.

🎯 Key Features

  • Principled Anti-Exploration: Mitigates inference-time fragility by constraining generated actions to the "success modes" of the downstream task, effectively handling distribution shifts.
  • Universal Compatibility: Seamlessly integrates with Flow-Matching (e.g., $\pi_0$, $\pi_{0.5}$), Diffusion (e.g., RDT), and Autoregressive (e.g., OpenVLA) architectures.
  • Gradient-Free Steering: Performs Test-Time Scaling (TTS) via a generate-then-verify pipeline without modifying the heavy VLA backbone parameters.
  • Efficient Inference: Implements KV Cache Optimization to reuse visual-language representations during parallel sampling, reducing inference latency by ~73% compared with the original implementation.
  • High-Fidelity Verification: Utilizes a lightweight Coin Flipping Network (CFN) trained on internal representations with High-Fidelity Feature Search to accurately estimate action reliability.
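The generate-then-verify pipeline described in the features above can be sketched as a best-of-N selection. This is a minimal illustration, not the repository's implementation: `encode_fn`, `sample_fn`, and `pseudo_count_fn` are hypothetical stand-ins for the VLA's vision-language encoder, its action head, and the CFN verifier. The context is encoded once and reused for every candidate, which mirrors the idea behind the KV cache reuse.

```python
from typing import Any, Callable, List, Tuple

ActionChunk = List[float]


def select_action_chunk(
    observation: Any,
    encode_fn: Callable[[Any], Any],
    sample_fn: Callable[[Any], ActionChunk],
    pseudo_count_fn: Callable[[Any, ActionChunk], float],
    num_candidates: int = 8,
) -> Tuple[ActionChunk, float]:
    """Sample several action chunks and keep the one the verifier scores highest.

    All three callables are hypothetical placeholders (assumptions), not
    functions from the TACO codebase.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")

    # Encode the observation once; every candidate reuses this context.
    context = encode_fn(observation)

    # Generate: draw N candidates from the frozen policy (no gradient updates).
    candidates = [sample_fn(context) for _ in range(num_candidates)]

    # Verify: a higher pseudo-count means the chunk looks more in-distribution.
    scores = [pseudo_count_fn(context, chunk) for chunk in candidates]

    best_index = max(range(num_candidates), key=lambda i: scores[i])
    return candidates[best_index], scores[best_index]
```

Because the policy weights are never touched, the only added cost in this sketch is the extra sampling and one verifier call per candidate.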

πŸ† Main Contributions

  1. New Perspective on VLA Instability: We diagnose the inference fragility of generative VLAs as an out-of-support problem and propose TACO, the first framework to address this via the Anti-Exploration principle from Offline RL using Test-Time Scaling.
  2. Coupled Pseudo-Count Estimator: We introduce an efficient internal representation mechanism coupled with a High-Fidelity Feature Search strategy. This allows the CFN to accurately verify action chunks for denoising-based policies (Flow/Diffusion) that never see clean actions during training.
  3. SOTA Performance: Extensive experiments across simulation benchmarks (RoboTwin, LIBERO, SimplerEnv) and real-world dual-arm manipulation show that TACO significantly boosts success rates (e.g., +16% in real-world tasks) over strong baselines like $\pi_0$.
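To give some intuition for how coin flips can yield a pseudo-count, here is a toy simulation of the general coin-flipping idea. It is a sketch under stated assumptions, not the TACO training code: we assume each visit to an input is paired with a fresh vector of random ±1 labels, and that an ideal regressor outputs the per-coin average of the labels it has seen. The mean squared output is then close to 1/n for an input seen n times, so its inverse recovers the count.

```python
import random


def coin_flip_pseudo_count(num_visits: int, num_coins: int = 2048, seed: int = 0) -> float:
    """Estimate a visit count from averaged random +/-1 coin flips.

    Assumption: an ideal network regressing onto the flips would output,
    for each coin, the mean of the labels observed over `num_visits` visits.
    """
    if num_visits < 1:
        raise ValueError("num_visits must be at least 1")

    rng = random.Random(seed)
    squared_means = []
    for _ in range(num_coins):
        # One coin: average of `num_visits` independent +/-1 labels.
        total = sum(rng.choice((-1, 1)) for _ in range(num_visits))
        mean = total / num_visits
        squared_means.append(mean * mean)

    # E[mean^2] = 1 / num_visits, so the inverse second moment is the count.
    second_moment = sum(squared_means) / num_coins
    return 1.0 / second_moment


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, round(coin_flip_pseudo_count(n), 2))
```

Rarely seen inputs therefore produce large-magnitude outputs and small pseudo-counts, which is what makes the estimate usable as a signal for distributional shift.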

πŸ› οΈ Installation

Create a conda env:

conda create -n taco python=3.10 -y
conda activate taco

Install torch (choose the version that suits your environment):

pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0

Install CFN:

cd cfn/
pip install -e .
cd ..

Install Pi0.5 (lerobot):

cd third_party/lerobot
pip install -e .

cd src/transformers
pip install -e .

cd ../../../..
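After installing, a quick import check can confirm that the editable installs are visible to the active environment. The sketch below uses only the standard library. The module names `torch`, `lerobot`, and `transformers` are the usual import names for those packages; `cfn` is an assumption about the import name of the package installed from `cfn/` and may differ.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each top-level module can be found without importing it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "cfn" is an assumed import name; adjust it to match the installed package.
    for module, found in check_modules(["torch", "lerobot", "transformers", "cfn"]).items():
        print(f"{module}: {'ok' if found else 'missing'}")
```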

🚀 Quick Start

We provide pre-trained base policies and CFN checkpoints on 🤗 Hugging Face, allowing you to directly evaluate TACO without training.

Download Pre-trained Models

# Download the base policy and CFN checkpoints
# The CFN state dict is saved in the `cfns` sub-directory of the repo
hf download rhodes-team-teleai/pi05_TACO_libero_finetuned --local-dir /path/to/your/dir --max-workers 16
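Once the download finishes, it can help to confirm that the CFN files actually landed in the `cfns` sub-directory. The helper below is a small sketch using only the standard library. The function name `list_cfn_files` is hypothetical, and it makes no assumption about file names or extensions inside `cfns`.

```python
from pathlib import Path
from typing import List


def list_cfn_files(local_dir: str, subdir: str = "cfns") -> List[Path]:
    """Return all files found under `<local_dir>/<subdir>`, sorted by path."""
    cfn_dir = Path(local_dir) / subdir
    if not cfn_dir.is_dir():
        raise FileNotFoundError(f"expected a '{subdir}' directory under {local_dir}")
    return sorted(p for p in cfn_dir.rglob("*") if p.is_file())
```

Pass the same path that was given to `--local-dir`, for example `list_cfn_files("/path/to/your/dir")`.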

📊 Usage (Pi0.5 in LIBERO)

For full evaluation and training scripts, please refer to the official GitHub repository.

  1. Collect internal representation:
bash ./scripts/collect_inernal_representation/pi05_libero/collect.sh
  2. Train CFN:
bash ./scripts/train_cfn/train_cfn_example.sh
  3. Eval TACO:
bash ./scripts/eval/eval_libero_pi05_taco.sh
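The three steps above must run in order, since CFN training consumes the collected representations and evaluation consumes the trained CFN. A small driver like the following can chain them and stop at the first failure. This is a convenience sketch, not part of the repository: the script paths are copied from the steps above, while `run_pipeline` and its `dry_run` flag are hypothetical names. It assumes it is launched from the repository root.

```python
import subprocess
from typing import List, Sequence

# Script paths as listed in the usage steps (relative to the repository root).
TACO_STEPS = (
    "./scripts/collect_inernal_representation/pi05_libero/collect.sh",
    "./scripts/train_cfn/train_cfn_example.sh",
    "./scripts/eval/eval_libero_pi05_taco.sh",
)


def run_pipeline(steps: Sequence[str] = TACO_STEPS, dry_run: bool = True) -> List[List[str]]:
    """Run each script with bash, in order; with dry_run=True only print the commands."""
    commands = [["bash", step] for step in steps]
    for command in commands:
        if dry_run:
            print("would run:", " ".join(command))
        else:
            # check=True raises CalledProcessError, so later steps never run
            # on top of a failed earlier step.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_pipeline(dry_run=False)` executes the scripts; the default only prints what would be run.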

📄 License

This project is licensed under the Apache License 2.0.

πŸ™ Acknowledgments

  • LeRobot for the Pi0.5 implementation
  • OpenVLA for the OpenVLA base model
  • RoboTwin for the simulation environment
  • LIBERO for the benchmark tasks

🎓 Citation

If you find TACO useful for your research, please cite our paper:

@article{yang2025taco,
  title={Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach},
  author={Siyuan Yang and Yang Zhang and Haoran He and Ling Pan and Xiu Li and Chenjia Bai and Xuelong Li},
  journal={arXiv preprint arXiv:2512.02834},
  year={2025}
}

Model size: 4B params (Safetensors). Tensor types: F32, BF16.