PI0 Hanoi Policy Gradient Checkpoint (30k steps)

This is a checkpoint of the PI0 (π0) model from Physical Intelligence, trained on the Tower of Hanoi task using a subtask-based approach with policy gradient methods.

This model was presented in the paper "The Price Is Not Right: Neuro-Symbolic Methods Outperform VLAs on Structured Long-Horizon Manipulation Tasks with Significantly Lower Energy Consumption".

Model Details

  • Task: Tower of Hanoi puzzle (subtask decomposition)
  • Training Steps: 30,000
  • Model Type: Policy gradient with subtask learning
  • Framework: JAX/Flax
  • Dataset: hanoi_300_lerobot
  • Architecture: Vision-Language-Action model, trained without subtask masking

Key Features

  • Subtask Learning: Decomposes the Tower of Hanoi puzzle into manageable subtasks
  • No Masking: Trained without subtask masking for better generalization across subtask boundaries
  • Policy Gradient: Uses policy gradient optimization
  • End-to-End: Learns from visual observations to actions
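
As an illustration of what a subtask decomposition of the puzzle can look like, the sketch below enumerates the optimal move sequence recursively and expands each move into a pick step and a place step. The function names, the peg labels, and the pick/place split are assumptions made for this example; the subtask definitions actually used for this checkpoint are not specified here.

```python
def hanoi_moves(n, src="A", dst="C", aux="B"):
    """Return the optimal list of (disk, from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, src, aux, dst)    # clear the n-1 smaller disks out of the way
        + [(n, src, dst)]                    # move the largest disk to its target peg
        + hanoi_moves(n - 1, aux, dst, src)  # restack the smaller disks on top of it
    )


def to_subtasks(moves):
    """Expand each move into two language-style subtasks (hypothetical wording)."""
    subtasks = []
    for disk, src, dst in moves:
        subtasks.append(f"pick disk {disk} from peg {src}")
        subtasks.append(f"place disk {disk} on peg {dst}")
    return subtasks


moves = hanoi_moves(3)
subtasks = to_subtasks(moves)
for step in subtasks:
    print(step)
```

For three disks this yields 7 moves, so 14 pick/place subtasks under this particular split.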

Checkpoint Structure

  • params/: Model parameters
  • train_state/: Training state
  • assets/: Additional assets including normalization statistics
  • _CHECKPOINT_METADATA: Checkpoint metadata
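
Before pointing an evaluation script at a downloaded copy, it can help to confirm that the layout above is intact. The helper below is a small sketch using only the Python standard library; `missing_checkpoint_entries` is a hypothetical name, not part of openpi, and the check assumes the three listed directories and the metadata entry sit directly under the checkpoint root.

```python
from pathlib import Path

# Entries listed in the checkpoint structure above.
EXPECTED_DIRS = ("params", "train_state", "assets")
EXPECTED_METADATA = "_CHECKPOINT_METADATA"


def missing_checkpoint_entries(checkpoint_dir):
    """Return the names of expected entries that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    missing = [f"{name}/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    if not (root / EXPECTED_METADATA).exists():
        missing.append(EXPECTED_METADATA)
    return missing
```

An empty list means every expected entry was found; otherwise the returned names show what is missing, for example after an interrupted download.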

Usage

This checkpoint can be loaded and evaluated using the openpi framework. Detailed instructions for running experiments in Robosuite via Docker can be found in the official GitHub repository.

Training Configuration

  • Dataset: hanoi_300_lerobot
  • Training approach: Subtask-based policy gradient
  • Subtask masking: Disabled (no_masking)
  • Total training steps: 30,000
  • Model: PI0 with subtask decomposition

Technical Details

  • Subtask Approach: Decomposes the Tower of Hanoi puzzle into logical subtasks
  • No Masking: Allows the model to learn across subtask boundaries
  • Policy Gradient: Direct policy optimization without a value function
  • Vision Input: Processes visual observations from robot cameras (agentview and eye-in-hand)
  • Action Output: Generates robot manipulation actions
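
To make "direct policy optimization without a value function" concrete, here is a minimal REINFORCE-style sketch in plain Python for a softmax policy over a few discrete actions. It is a textbook illustration only, not the training code behind this checkpoint; the discrete action space, the function names, the learning rate, and the numbers are all assumptions chosen for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(logits, action, ret):
    """Gradient of ret * log pi(action) with respect to the logits.

    For a softmax policy, d log pi(a) / d logit_i = 1[i == a] - pi(i).
    The return `ret` scales the gradient directly; no value function
    or baseline is involved.
    """
    probs = softmax(logits)
    return [ret * ((1.0 if i == action else 0.0) - p) for i, p in enumerate(probs)]


def reinforce_step(logits, action, ret, lr=0.1):
    """One gradient-ascent step on the sampled-return objective."""
    grad = reinforce_grad(logits, action, ret)
    return [x + lr * g for x, g in zip(logits, grad)]


logits = [0.0, 0.0, 0.0]  # uniform policy over three actions
new_logits = reinforce_step(logits, action=1, ret=1.0)
```

After the step, the probability of the rewarded action rises above its initial 1/3 while the other two fall, which is the basic mechanism a policy gradient update relies on.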