pi0.5 Bin Pack - Reward Recap Fix (Mixed)
Fine-tuned pi0.5 checkpoint for coffee capsule bin packing, rerun after fixing reward-recap advantage-token placement and valid-indices persistence issues.
Experiment
- Objective: rerun mixed reward recap after the recap fix and shorten training to 50k steps.
- Advantage mode: mixed
- Config name: pi05_bin_pack_coffee_capsules_recap_mixed
- Target steps: 50,000
Dataset
9 LeRobot datasets (1 base + 8 dAgger rounds): bin_pick_pack_coffee_capsules plus dAgger rounds 1.0.0, 1.1.0, 1.2.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, and 1.7.0.
Published Checkpoints
- checkpoints/40000/params
- checkpoints/49999/params
Checkpoint Hashes
Verify with `cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum`.
| Step | SHA-256 |
|---|---|
| 40000 | 97b775a6f01c85b889d4da3ae5932eda16ef68e0c27d83fb582d90203195bb2c |
| 49999 | d2a71ebfac9957e5455537a4fc74ad4ccda66ee34747c11f1ea82f650c6e2096 |
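
Where the shell pipeline is not available, the same digest can be approximated in Python. The sketch below is a minimal reimplementation under stated assumptions, not part of this repo: the helper names `file_sha256` and `params_digest` are hypothetical, it assumes GNU `sha256sum` line formatting (`<hex>`, two spaces, path, newline), byte-order sorting as with `LC_ALL=C sort`, and file names without spaces, backslashes, or newlines. If the local `sort` uses a different locale, the resulting digest may differ from the table above.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of one file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def params_digest(step_dir):
    """Hash the sorted per-file checksum listing of <step_dir>/params.

    Hypothetical helper mirroring:
        cd <step_dir> && find params -type f | sort | xargs sha256sum | sha256sum
    """
    step_dir = Path(step_dir)
    # Paths relative to the step directory, as `find params -type f` prints them.
    rel_paths = sorted(
        p.relative_to(step_dir).as_posix()
        for p in (step_dir / "params").rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    outer = hashlib.sha256()
    for rel in rel_paths:
        # Assumed GNU sha256sum output format: "<hex>  <path>\n".
        line = f"{file_sha256(step_dir / rel)}  {rel}\n"
        outer.update(line.encode("utf-8"))
    return outer.hexdigest()
```

For example, `params_digest("checkpoints/40000")` can be compared against the step 40000 row of the table; a mismatch may indicate either a corrupted download or one of the assumptions above not holding.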
Repo Structure
assets/ # Norm stats and related inference assets
checkpoints/<step>/params/ # Model weights (params only)
README.md # This file
TRAINING_LOG.md # Sanitized training log