pi05-build-block-tower-rlt-6mix

RL Token (RLT) encoder-decoder trained on the 6-dataset build-block-tower mixture, on top of the published pi05-build-block-tower-6mix VLA baseline.

What is this?

This model is a lightweight transformer encoder-decoder which takes inputs from a frozen Pi-05 VLA backbone. The encoder compresses the VLA's final-layer prefix embeddings into a single RL token via a learned query. The decoder autoregressively reconstructs the original embeddings from only this token, forcing it to act as an information bottleneck. See Xu et al. (2026), Precise Manipulation with Efficient Online RL for the method.
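The bottleneck idea can be sketched in a few lines. This is an illustrative single-head cross-attention in NumPy, not the openpi implementation: all names, the single head, and the 256-dim demo size (the real model uses 2048-dim embeddings) are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_rl_token(prefix_emb, query, w_k, w_v):
    """Compress (seq_len, d) prefix embeddings into a single (d,) RL token
    by letting one learned query cross-attend over the frozen VLA outputs."""
    k = prefix_emb @ w_k                                # (seq_len, d)
    v = prefix_emb @ w_v                                # (seq_len, d)
    attn = softmax(query @ k.T / np.sqrt(k.shape[-1]))  # (seq_len,) weights
    return attn @ v                                     # (d,) bottleneck token

rng = np.random.default_rng(0)
d, seq_len = 256, 16   # demo sizes; the card's model uses 2048-dim embeddings
prefix = rng.standard_normal((seq_len, d))
query = rng.standard_normal(d)
w_k = rng.standard_normal((d, d)) / np.sqrt(d)
w_v = rng.standard_normal((d, d)) / np.sqrt(d)

rl_token = encode_rl_token(prefix, query, w_k, w_v)
print(rl_token.shape)  # (256,)
```

In the full model, a 2-layer decoder then reconstructs all seq_len prefix embeddings autoregressively from this one token; the sketch only conveys the many-to-one compression that makes the token an information bottleneck.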

Training

  • Config: pi05_rlt_build_block_tower_6mix
  • VLA backbone: pravsels/pi05-build-block-tower-6mix step 49999 (frozen, rl_vla_loss_weight=0.0)
  • Encoder-decoder: 2-layer transformer, 8 heads, 8192 MLP dim, 2048 embedding dim
  • Dataset: 6 LeRobot v2.1 datasets (build_block_tower + dAgger 1.0.0–1.4.0)
  • Batch size: 36
  • LR: 5e-5 cosine (1k warmup)
  • Steps: 20,000
  • Runtime: 7h19m on 4x GH200 (Isambard)
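For reference, the stated schedule (linear warmup to 5e-5 over 1k steps, cosine decay over 20k total steps) can be written as a small function. Only the peak LR, warmup length, and step count come from the config above; the exact decay formula and a final LR of 0 are assumptions.

```python
import math

PEAK_LR, WARMUP, TOTAL = 5e-5, 1_000, 20_000

def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to 0 at TOTAL steps."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * progress))

print(lr_at(500))    # mid-warmup: 2.5e-05
print(lr_at(1_000))  # peak: 5e-05
```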

Loss progression

Step      Train Loss   Val Loss
0         9262.3       -
1,000     715.6        1669.9
5,000     424.0        485.0
10,000    297.2        352.4
15,000    240.8        313.7
19,900    208.4        302.1

Loss decreased steadily throughout training with no instability. Validation loss tracked training loss closely, ending roughly 1.45x above it (302.1 vs 208.4).

Checkpoints

Step    Train Loss   Val Loss   Params SHA256
5000    424.0        485.0      3eb482d0a6b9ccd97e2926eb93baed79f3e5bb27c4fefbab1097504b43f3154a
10000   297.2        352.4      1f0ccb0b412a6b2a79b706256d7e07cbd7c0418f1c5d494b8188f83a8e2a00bc
15000   240.8        313.7      359c987e049518df918009f55a7dc853ce05897480edd67ec646d266c5311d3b
19999   208.4        302.1      34ae52b04c900399836a94bdeeea576fb151395aa5fc82a5f356f8a708c3e73e

Verifying checkpoint hashes

cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum
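A Python equivalent of the pipeline above, for environments without coreutils. Matching the shell digest byte-for-byte depends on reproducing sha256sum's output format (`<hex>  <path>\n`, two spaces) and the same file ordering (`sort` should be run with `LC_ALL=C` for this to line up); treat those details as assumptions.

```python
import hashlib
import pathlib

def checkpoint_digest(step_dir):
    """SHA-256 of the concatenated 'sha256sum' lines for every file under
    <step_dir>/params, in sorted path order -- mirroring the shell pipeline."""
    step_dir = pathlib.Path(step_dir)
    lines = []
    for f in sorted((step_dir / "params").rglob("*")):
        if f.is_file():
            file_hash = hashlib.sha256(f.read_bytes()).hexdigest()
            rel = f.relative_to(step_dir).as_posix()      # e.g. "params/..."
            lines.append(f"{file_hash}  {rel}\n")         # two spaces, as sha256sum prints
    return hashlib.sha256("".join(lines).encode()).hexdigest()
```

Compare the returned hex string against the Params SHA256 column for the matching step.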

Repo layout

assets/                      # Norm stats, valid indices
checkpoints/5000/params/     # Step 5000 model weights
checkpoints/10000/params/    # Step 10000 model weights
checkpoints/15000/params/    # Step 15000 model weights
checkpoints/19999/params/    # Step 19999 model weights (final)
TRAINING_LOG.md              # Training log

W&B

Training curves: https://wandb.ai/pravsels/openpi-rlt-block-tower/runs/xanf5muf

Usage

import numpy as np

import openpi.models.model as _model
import openpi.training.config as _config

config = _config.get_config("pi05_rlt_build_block_tower_6mix")
params = _model.restore_params("checkpoints/19999/params", restore_type=np.ndarray)
model = config.model.load(params)