PI0 Hanoi Policy Gradient Checkpoint (30k steps)
This is a checkpoint for the PI0 (Physical Intelligence 0) model trained on the Hanoi task using a subtask-based approach with policy gradient methods.
This model was presented in the paper "The Price Is Not Right: Neuro-Symbolic Methods Outperform VLAs on Structured Long-Horizon Manipulation Tasks with Significantly Lower Energy Consumption".
Model Details
- Task: Hanoi Tower puzzle (subtask decomposition)
- Training Steps: 30,000
- Model Type: Policy gradient with subtask learning
- Framework: JAX/Flax
- Dataset: hanoi_300_lerobot
- Architecture: Vision-Language-Action model with subtask decomposition (subtask masking disabled; see below)
Key Features
- Subtask Learning: Decomposes Hanoi puzzle into manageable subtasks
- No Masking: Trained without subtask masking for better generalization across subtask boundaries
- Policy Gradient: Uses direct policy gradient optimization
- End-to-End: Learns from visual observations to actions
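The subtask decomposition can be illustrated with the standard recursive Hanoi solver, in which each disk move becomes one pick-and-place subtask. This is a minimal sketch for intuition only; the function name and the exact subtask phrasing are illustrative assumptions, not the decomposition used in training:

```python
def hanoi_subtasks(n, src="A", dst="C", aux="B"):
    """Decompose an n-disk Hanoi puzzle into an ordered list of
    pick-and-place subtasks, one per disk move (illustrative only)."""
    if n == 0:
        return []
    return (hanoi_subtasks(n - 1, src, aux, dst)      # clear the top n-1 disks
            + [f"move disk {n} from {src} to {dst}"]  # move the largest disk
            + hanoi_subtasks(n - 1, aux, dst, src))   # restack on top of it

# A 3-disk puzzle yields the optimal 2**3 - 1 = 7 subtasks.
subtasks = hanoi_subtasks(3)
```

Each such subtask corresponds to one pick-and-place segment of the demonstration trajectories the policy is trained on.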
Checkpoint Structure
- params/: Model parameters
- train_state/: Training state
- assets/: Additional assets, including normalization statistics
- _CHECKPOINT_METADATA: Checkpoint metadata
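A quick way to sanity-check a downloaded checkpoint is to verify this directory layout before handing it to the loading code. The helper below is a hypothetical stdlib-only sketch based on the structure listed above (actual loading is done through the openpi framework):

```python
from pathlib import Path

# Expected top-level entries, taken from the checkpoint structure above.
EXPECTED = ["params", "train_state", "assets", "_CHECKPOINT_METADATA"]

def missing_checkpoint_entries(ckpt_dir):
    """Return the expected checkpoint entries absent from ckpt_dir.

    An empty list means the layout matches; a non-empty list usually
    indicates an incomplete download or a wrong directory.
    """
    root = Path(ckpt_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```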
Usage
This checkpoint can be loaded and evaluated using the openpi framework. Detailed instructions for running experiments in Robosuite via Docker can be found in the official GitHub repository.
Training Configuration
- Dataset: hanoi_300_lerobot
- Training approach: Subtask-based policy gradient
- Subtask masking: Disabled (no_masking)
- Total training steps: 30,000
- Model: PI0 with subtask decomposition
Technical Details
- Subtask Approach: Decomposes Hanoi puzzle into logical subtasks
- No Masking: Allows the model to learn transitions across subtask boundaries
- Policy Gradient: Direct policy optimization without value function
- Vision Input: Processes visual observations from robot cameras (agentview and eye-in-hand)
- Action Output: Generates robot manipulation actions
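The difference between masked and unmasked (no_masking) training can be sketched with a per-timestep loss: with a subtask mask, timesteps outside the current subtask are dropped from the loss, whereas with masking disabled every timestep contributes, so the model also sees the transitions between subtasks. This is an illustrative NumPy sketch with an MSE loss, not the exact training objective:

```python
import numpy as np

def per_step_loss(pred, target, subtask_mask=None):
    """Mean-squared action loss over a trajectory of shape (T, action_dim).

    subtask_mask: optional (T,) 0/1 array selecting timesteps inside the
    current subtask. With subtask_mask=None (the no_masking setting used
    for this checkpoint), all timesteps contribute, including those at
    subtask boundaries. (Illustrative sketch only.)
    """
    per_step = np.mean((pred - target) ** 2, axis=-1)  # (T,)
    if subtask_mask is None:
        return per_step.mean()
    m = subtask_mask.astype(per_step.dtype)
    return (per_step * m).sum() / np.maximum(m.sum(), 1.0)
```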