# pi05-build-block-tower-rlt-6mix
RL Token (RLT) encoder-decoder trained on the 6-dataset build-block-tower mixture, on top of the published pi05-build-block-tower-6mix VLA baseline.
## What is this?
This model is a lightweight transformer encoder-decoder which takes inputs from a frozen Pi-05 VLA backbone. The encoder compresses the VLA's final-layer prefix embeddings into a single RL token via a learned query. The decoder autoregressively reconstructs the original embeddings from only this token, forcing it to act as an information bottleneck. See Xu et al. (2026), *Precise Manipulation with Efficient Online RL*, for the method.
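The learned-query compression step can be sketched as single-head cross-attention pooling: one trainable query attends over all prefix embeddings and the attention-weighted sum becomes the RL token. This is a minimal illustrative sketch, not the exact architecture (the real encoder is a 2-layer, 8-head transformer); all names and weight shapes here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool_to_rl_token(prefix_emb, query, w_k, w_v):
    """Compress a (T, d) sequence of prefix embeddings into one (1, d) token.

    prefix_emb: (T, d) frozen VLA final-layer embeddings
    query:      (1, d) learned query vector (the "RL token" slot)
    w_k, w_v:   (d, d) key/value projections
    """
    k = prefix_emb @ w_k                                 # (T, d)
    v = prefix_emb @ w_v                                 # (T, d)
    attn = softmax(query @ k.T / np.sqrt(k.shape[-1]))   # (1, T) weights over prefix
    return attn @ v                                      # (1, d) -- the RL token

rng = np.random.default_rng(0)
d, T = 2048, 10  # d matches the 2048 embedding dim above; T is arbitrary
emb = rng.normal(size=(T, d))
rl_token = pool_to_rl_token(
    emb,
    rng.normal(size=(1, d)),
    rng.normal(size=(d, d)) / np.sqrt(d),
    rng.normal(size=(d, d)) / np.sqrt(d),
)
print(rl_token.shape)  # (1, 2048)
```

The decoder then has to reconstruct all T prefix embeddings from this single vector, which is what makes the token a bottleneck.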
## Training
- Config: `pi05_rlt_build_block_tower_6mix`
- VLA backbone: `pravsels/pi05-build-block-tower-6mix` step 49999 (frozen, `rl_vla_loss_weight=0.0`)
- Encoder-decoder: 2-layer transformer, 8 heads, 8192 MLP dim, 2048 embedding dim
- Dataset: 6 LeRobot v2.1 datasets (`build_block_tower` + DAgger 1.0.0–1.4.0)
- Batch size: 36
- LR: 5e-5 cosine (1k warmup)
- Steps: 20,000
- Runtime: 7h19m on 4x GH200 (Isambard)
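The learning-rate schedule above (peak 5e-5, cosine decay, 1k-step linear warmup) can be reproduced with a few lines; decaying to exactly zero at step 20,000 is an assumption here, as the config may use a small floor instead.

```python
import math

def lr_at(step, peak_lr=5e-5, warmup=1_000, total=20_000):
    """Linear warmup to peak_lr over `warmup` steps, then cosine decay
    to zero at `total` steps (zero floor is an assumption)."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(500))     # mid-warmup: 2.5e-05
print(lr_at(1_000))   # peak: 5e-05
print(lr_at(20_000))  # end of decay: 0.0
```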
## Loss progression
| Step | Train Loss | Val Loss |
|---|---|---|
| 0 | — | 9262.3 |
| 1,000 | 715.6 | 1669.9 |
| 5,000 | 424.0 | 485.0 |
| 10,000 | 297.2 | 352.4 |
| 15,000 | 240.8 | 313.7 |
| 19,900 | 208.4 | 302.1 |
Loss decreased steadily throughout training with no instability. Val loss tracked train loss with a healthy gap.
## Checkpoints
| Step | Train Loss | Val Loss | Params SHA256 |
|---|---|---|---|
| 5000 | 424.0 | 485.0 | 3eb482d0a6b9ccd97e2926eb93baed79f3e5bb27c4fefbab1097504b43f3154a |
| 10000 | 297.2 | 352.4 | 1f0ccb0b412a6b2a79b706256d7e07cbd7c0418f1c5d494b8188f83a8e2a00bc |
| 15000 | 240.8 | 313.7 | 359c987e049518df918009f55a7dc853ce05897480edd67ec646d266c5311d3b |
| 19999 | 208.4 | 302.1 | 34ae52b04c900399836a94bdeeea576fb151395aa5fc82a5f356f8a708c3e73e |
### Verifying checkpoint hashes

```bash
cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum
```
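The same digest can be computed without shelling out. This is a sketch that assumes GNU coreutils `sha256sum` output formatting (`<hash>`, two spaces, `<path>`, newline) and a C-locale `sort`; the function name is illustrative.

```python
import hashlib
from pathlib import Path

def checkpoint_digest(params_dir: str = "params") -> str:
    """Hash every file under params/, emit `<hash>  <path>` lines in
    sorted path order, then hash that listing itself -- mirroring
    `find params -type f | sort | xargs sha256sum | sha256sum`."""
    files = sorted((p for p in Path(params_dir).rglob("*") if p.is_file()), key=str)
    listing = "".join(
        f"{hashlib.sha256(p.read_bytes()).hexdigest()}  {p}\n"  # two spaces, as sha256sum prints
        for p in files
    )
    return hashlib.sha256(listing.encode()).hexdigest()
```

Run it from inside `checkpoints/<step>` and compare the result against the Params SHA256 column above.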
## Repo layout

```
assets/                     # Norm stats, valid indices
checkpoints/5000/params/    # Step 5000 model weights
checkpoints/10000/params/   # Step 10000 model weights
checkpoints/15000/params/   # Step 15000 model weights
checkpoints/19999/params/   # Step 19999 model weights (final)
TRAINING_LOG.md             # Training log
```
## W&B
Training curves: https://wandb.ai/pravsels/openpi-rlt-block-tower/runs/xanf5muf
## Usage

```python
import numpy as np

import openpi.models.model as _model
import openpi.training.config as _config

config = _config.get_config("pi05_rlt_build_block_tower_6mix")
params = _model.restore_params("checkpoints/19999/params", restore_type=np.ndarray)
model = config.model.load(params)
```