pi0.5 Build Block Tower Subtask Reward Recap - Positive Only

Fine-tuned pi0.5 checkpoint for block tower building using positive-only reward recap semantics with advantage conditioning, dropout, and hierarchical subtask loss.

Experiment

  • Config name: pi05_build_block_tower_subtask_recap_positive_only
  • Run type: experiment
  • Objective: train the block tower recap policy with positive-only advantage conditioning and hierarchical subtask prompting (subtask_loss_weight=1.0), for comparison against the flat-prompt and mixed variants

Dataset

  • 6 HuggingFace datasets: villekuosmanen/build_block_tower plus 5 DAgger rounds (1.0.0 through 1.4.0)

Uploaded Checkpoints

  • 22000: final checkpoint (training stopped at 22k/50k steps), SHA-256 0680c2a5db6bac2771b4bf39cd2c9769aa5575905b45c775f88db77083120a10

Checkpoints are stored as params-only artifacts under checkpoints/<step>/params/.

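To verify integrity after download, run the command below from the repo root and compare its output with the SHA-256 listed for that checkpoint: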

cd checkpoints/22000 && find params -type f | sort | xargs sha256sum | sha256sum
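
As a minimal sketch, the comparison can be wrapped in a small shell function so the digest is checked automatically rather than by eye. The name verify_checkpoint is hypothetical and not part of this repo. The sketch assumes that the SHA-256 listed under Uploaded Checkpoints is the output of the command above, and that sha256sum, find, sort, xargs, and awk are available on the machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repo):
#   verify_checkpoint <checkpoint_dir> <expected_sha256>
# Returns 0 on match, 1 on mismatch, 2 if the directory cannot be entered.
verify_checkpoint() {
  local ckpt_dir="$1" expected="$2" actual
  # Same pipeline as the command above: hash every file under params/,
  # then hash the sorted list of per-file digests. awk keeps only the hex digest.
  actual=$(cd "$ckpt_dir" && find params -type f | sort | xargs sha256sum | sha256sum | awk '{print $1}') || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $ckpt_dir"
  else
    echo "MISMATCH: $ckpt_dir (got $actual)" >&2
    return 1
  fi
}

# Example call, using the digest listed for step 22000:
# verify_checkpoint checkpoints/22000 0680c2a5db6bac2771b4bf39cd2c9769aa5575905b45c775f88db77083120a10
```

Because the final digest covers the relative file paths as well as the file contents, the function changes into the checkpoint directory first, just as the original command does. The result can also depend on the locale used by sort, so a mismatch on an otherwise complete download is worth rechecking under the same locale settings.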

Assets

  • assets/ contains normalization stats and dataset metadata used by this run.

W&B

Repo Structure

checkpoints/22000/params/
checkpoints/22000/assets/
README.md
TRAINING_LOG.md