pi0.5 Bin Pack - Reward Recap Fix (Mixed)

Fine-tuned pi0.5 checkpoints for coffee-capsule bin packing, retrained after fixing two reward-recap bugs: advantage-token placement and persistence of valid indices.

Experiment

  • Objective: rerun mixed-mode reward recap training with the recap fixes applied, with training shortened to 50k steps.
  • Advantage mode: mixed
  • Config name: pi05_bin_pack_coffee_capsules_recap_mixed
  • Target steps: 50,000

Dataset

9 LeRobot datasets: the base dataset bin_pick_pack_coffee_capsules plus 8 DAgger rounds (1.0.0, 1.1.0, 1.2.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, and 1.7.0).

Published Checkpoints

  • checkpoints/40000/params
  • checkpoints/49999/params (final checkpoint of the 50,000-step run)

Checkpoint Hashes

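To verify a checkpoint, run the following from the repository root, replacing <step> with 40000 or 49999, and compare the printed digest (the first field of the output) with the table below:

cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum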

Step    SHA-256
40000   97b775a6f01c85b889d4da3ae5932eda16ef68e0c27d83fb582d90203195bb2c
49999   d2a71ebfac9957e5455537a4fc74ad4ccda66ee34747c11f1ea82f650c6e2096

Repo Structure

assets/                      # Norm stats and related inference assets
checkpoints/<step>/params/   # Model weights (params only)
README.md                    # This file
TRAINING_LOG.md              # Sanitized training log
