# pi05-build-block-tower-rlt-6mix
RL Token (RLT) encoder-decoder trained on the 6-dataset build-block-tower mixture, on top of the published pi05-build-block-tower-6mix VLA baseline.
## What is this?
This model is a lightweight transformer encoder-decoder which takes inputs from a frozen Pi-05 VLA backbone. The encoder compresses the VLA's final-layer prefix embeddings into a single RL token via a learned query. The decoder autoregressively reconstructs the original embeddings from only this token, forcing it to act as an information bottleneck. See Xu et al. (2026), *Precise Manipulation with Efficient Online RL*, for the method.
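The learned-query compression step can be sketched as single-head cross-attention pooling: one trainable query attends over all prefix embeddings and the attention-weighted sum becomes the RL token. This is a minimal illustrative sketch, not the exact architecture (the real encoder is a 2-layer, 8-head transformer); all names and weight shapes here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool_to_rl_token(prefix_emb, query, w_k, w_v):
    """Compress a (T, d) sequence of prefix embeddings into one (1, d) token.

    prefix_emb: (T, d) frozen VLA final-layer embeddings
    query:      (1, d) learned query vector (the "RL token" slot)
    w_k, w_v:   (d, d) key/value projections
    """
    k = prefix_emb @ w_k                                 # (T, d)
    v = prefix_emb @ w_v                                 # (T, d)
    attn = softmax(query @ k.T / np.sqrt(k.shape[-1]))   # (1, T) weights over prefix
    return attn @ v                                      # (1, d) -- the RL token

rng = np.random.default_rng(0)
d, T = 2048, 10  # d matches the 2048 embedding dim above; T is arbitrary
emb = rng.normal(size=(T, d))
rl_token = pool_to_rl_token(
    emb,
    rng.normal(size=(1, d)),
    rng.normal(size=(d, d)) / np.sqrt(d),
    rng.normal(size=(d, d)) / np.sqrt(d),
)
print(rl_token.shape)  # (1, 2048)
```

The decoder then has to reconstruct all T prefix embeddings from this single vector, which is what makes the token a bottleneck.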
## Training
- Config: `pi05_rlt_build_block_tower_6mix`
- VLA backbone: `pravsels/pi05-build-block-tower-6mix` step 49999 (frozen, `rl_vla_loss_weight=0.0`)
- Encoder-decoder: 2-layer transformer, 8 heads, 8192 MLP dim, 2048 embedding dim
- Dataset: 6 LeRobot v2.1 datasets (`build_block_tower` + DAgger 1.0.0–1.4.0)
- Batch size: 36
- LR: 5e-5 cosine (1k warmup)
- Steps: 20,000
- Runtime: 7h19m on 4x GH200 (Isambard)
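The learning-rate schedule above (peak 5e-5, cosine decay, 1k-step linear warmup) can be reproduced with a few lines; decaying to exactly zero at step 20,000 is an assumption here, as the config may use a small floor instead.

```python
import math

def lr_at(step, peak_lr=5e-5, warmup=1_000, total=20_000):
    """Linear warmup to peak_lr over `warmup` steps, then cosine decay
    to zero at `total` steps (zero floor is an assumption)."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(500))     # mid-warmup: 2.5e-05
print(lr_at(1_000))   # peak: 5e-05
print(lr_at(20_000))  # end of decay: 0.0
```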
## Loss progression
| Step | Train Loss | Val Loss |
|---|---|---|
| 0 | — | 9262.3 |
| 1,000 | 715.6 | 1669.9 |
| 5,000 | 424.0 | 485.0 |
| 10,000 | 297.2 | 352.4 |
| 15,000 | 240.8 | 313.7 |
| 19,900 | 208.4 | 302.1 |
Loss decreased steadily throughout training with no instability. Val loss tracked train loss with a healthy gap.
## Checkpoints
| Step | Train Loss | Val Loss | Params SHA256 |
|---|---|---|---|
| 5000 | 424.0 | 485.0 | 3eb482d0a6b9ccd97e2926eb93baed79f3e5bb27c4fefbab1097504b43f3154a |
| 10000 | 297.2 | 352.4 | 1f0ccb0b412a6b2a79b706256d7e07cbd7c0418f1c5d494b8188f83a8e2a00bc |
| 15000 | 240.8 | 313.7 | 359c987e049518df918009f55a7dc853ce05897480edd67ec646d266c5311d3b |
| 19999 | 208.4 | 302.1 | 34ae52b04c900399836a94bdeeea576fb151395aa5fc82a5f356f8a708c3e73e |
### Verifying checkpoint hashes

```bash
cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum
```
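The same digest can be computed without shelling out. This is a sketch that assumes GNU coreutils `sha256sum` output formatting (`<hash>`, two spaces, `<path>`, newline) and a C-locale `sort`; the function name is illustrative.

```python
import hashlib
from pathlib import Path

def checkpoint_digest(params_dir: str = "params") -> str:
    """Hash every file under params/, emit `<hash>  <path>` lines in
    sorted path order, then hash that listing itself -- mirroring
    `find params -type f | sort | xargs sha256sum | sha256sum`."""
    files = sorted((p for p in Path(params_dir).rglob("*") if p.is_file()), key=str)
    listing = "".join(
        f"{hashlib.sha256(p.read_bytes()).hexdigest()}  {p}\n"  # two spaces, as sha256sum prints
        for p in files
    )
    return hashlib.sha256(listing.encode()).hexdigest()
```

Run it from inside `checkpoints/<step>` and compare the result against the Params SHA256 column above.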
## Repo layout

```
assets/                     # Norm stats, valid indices
checkpoints/5000/params/    # Step 5000 model weights
checkpoints/10000/params/   # Step 10000 model weights
checkpoints/15000/params/   # Step 15000 model weights
checkpoints/19999/params/   # Step 19999 model weights (final)
TRAINING_LOG.md             # Training log
```
## W&B
Training curves: https://wandb.ai/pravsels/openpi-rlt-block-tower/runs/xanf5muf
## Usage

```python
import numpy as np

import openpi.models.model as _model
import openpi.training.config as _config

config = _config.get_config("pi05_rlt_build_block_tower_6mix")
params = _model.restore_params("checkpoints/19999/params", restore_type=np.ndarray)
model = config.model.load(params)
```