pi0.5 Bin Pack - Reward Recap Fix (Mixed)

Fine-tuned pi0.5 checkpoint for coffee capsule bin packing, rerun after fixing two reward-recap issues: advantage-token placement and valid-indices persistence.

Experiment

Objective: rerun the mixed-advantage reward-recap experiment with the recap fixes applied, shortening training to 50k steps.
Advantage mode: mixed
Config name: pi05_bin_pack_coffee_capsules_recap_mixed
Target steps: 50,000

Dataset

9 LeRobot datasets (1 base + 8 DAgger rounds): the base bin_pick_pack_coffee_capsules dataset plus DAgger rounds 1.0.0, 1.1.0, 1.2.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, and 1.7.0.

Published Checkpoints

checkpoints/40000/params
checkpoints/49999/params

Checkpoint Hashes

Verify a checkpoint by running `cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum` and comparing the printed digest against the table below.

| Step | SHA-256 |
| --- | --- |
| 40000 | `97b775a6f01c85b889d4da3ae5932eda16ef68e0c27d83fb582d90203195bb2c` |
| 49999 | `d2a71ebfac9957e5455537a4fc74ad4ccda66ee34747c11f1ea82f650c6e2096` |
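Where GNU coreutils is not available, the same digest can plausibly be reproduced in Python. This is a minimal sketch, not part of this repo: `file_sha256` and `checkpoint_digest` are hypothetical helper names. It assumes POSIX paths, file names without backslashes or newlines (which `sha256sum` would escape), and that `sort` ordered names bytewise as under `LC_ALL=C`. Under other locales `sort` may order file names differently, which would change the final digest.

```python
import hashlib
import os


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of one file, read in chunks so large weight shards are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checkpoint_digest(step_dir: str, subdir: str = "params") -> str:
    """Hypothetical helper approximating
    `find params -type f | sort | xargs sha256sum | sha256sum`
    run from inside step_dir (e.g. "checkpoints/40000")."""
    rel_paths = []
    for root, _dirs, files in os.walk(os.path.join(step_dir, subdir)):
        for name in files:
            full = os.path.join(root, name)
            # `find -type f` lists regular files only, so skip symlinks.
            if os.path.isfile(full) and not os.path.islink(full):
                # Paths are relative to step_dir, i.e. "params/...", as find prints them.
                rel_paths.append(os.path.relpath(full, step_dir))
    # Code-point order, which matches `sort` under LC_ALL=C (assumption).
    rel_paths.sort()
    # Each sha256sum line is "<hex digest>  <path>\n" (two spaces in text mode).
    listing = b"".join(
        file_sha256(os.path.join(step_dir, p)).encode()
        + b"  "
        + os.fsencode(p)
        + b"\n"
        for p in rel_paths
    )
    # The final `sha256sum` hashes that listing as read from stdin.
    return hashlib.sha256(listing).hexdigest()
```

For example, `checkpoint_digest("checkpoints/40000")` should return the 40000 value from the table when the assumptions above hold. Hashing a sorted listing of per-file digests, rather than the files concatenated, means a renamed or missing shard also changes the result.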

Repo Structure

assets/                      # Norm stats and related inference assets
checkpoints/<step>/params/   # Model weights (params only)
README.md                    # This file
TRAINING_LOG.md              # Sanitized training log

W&B

Training dashboard
