Pi0.6 Koch T-shirt Folding

Pi0.6 fine-tuned for Koch t-shirt folding (Gongsta-style task).

Model Description

Pi0.6 is a Vision-Language-Action (VLA) model for robot manipulation built on the FLA framework.

Architecture

  • Base Model: Pi0.5 (PaliGemma-based, ~2.7B parameters)
  • VL Backbone: Gemma 2B (frozen)
  • Action Expert: Gemma 300M (trained)
  • Vision Encoder: SigLIP (frozen)
  • Action Generation: Flow Matching
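
As a rough illustration of what flow-matching action generation involves at inference time, the sketch below integrates a velocity field from Gaussian noise to an action chunk with fixed-step Euler updates. It is a minimal sketch, not the FLA implementation: `sample_actions`, `toy_velocity`, the step count, the time convention (0 = noise, 1 = actions), and the chunk shape are all assumptions made for illustration, and the toy velocity field stands in for the trained action expert.

```python
import numpy as np


def sample_actions(velocity_fn, horizon, action_dim, num_steps=10, seed=0):
    """Integrate a velocity field from noise (t=0) to actions (t=1) with Euler steps.

    velocity_fn is a hypothetical stand-in for the trained action expert.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((horizon, action_dim))  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)  # one Euler step along the flow
    return x


# Toy target chunk (3 timesteps, 2 action dimensions), purely illustrative.
target = np.linspace(-1.0, 1.0, 6).reshape(3, 2)


def toy_velocity(x, t):
    # Velocity of the straight-line path x_t = (1 - t) * noise + t * target,
    # rewritten in terms of the current point x. A real model would predict
    # this from images, language, and robot state instead.
    return (target - x) / (1.0 - t)


actions = sample_actions(toy_velocity, horizon=3, action_dim=2)
print(actions.shape)
```

With this toy field the integration lands on `target`; with a learned field the result is a sampled action chunk conditioned on the observation.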

Training Strategy

Fine-tuned from Pi0.5 base on Koch t-shirt folding (RECAP/Gongsta-style) data.

  • Task: T-shirt folding manipulation
  • Training: VLM backbone frozen; only the action expert is trained
  • Norm stats and assets included for inference
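
The frozen-backbone strategy listed above can be pictured as an optimizer step that skips every parameter outside the action expert. The following is a minimal sketch under stated assumptions: the parameter names, the `action_expert` prefix, and the plain SGD update are hypothetical and do not reflect how FLA names or updates its parameters.

```python
import numpy as np


def sgd_step(params, grads, lr, trainable_prefixes=("action_expert",)):
    """Apply an SGD update only to parameters whose name has a trainable prefix."""
    updated = {}
    for name, value in params.items():
        if name.startswith(trainable_prefixes):
            updated[name] = value - lr * grads[name]  # trained: action expert
        else:
            updated[name] = value  # frozen: backbone and vision encoder
    return updated


# Hypothetical parameter tree with one tensor per component.
params = {
    "vision_encoder/w": np.ones(2),
    "vl_backbone/w": np.ones(2),
    "action_expert/w": np.ones(2),
}
grads = {name: np.full(2, 0.5) for name in params}

new_params = sgd_step(params, grads, lr=0.1)
print({name: value.tolist() for name, value in new_params.items()})
```

Only `action_expert/w` changes after the step; the other two tensors are returned untouched.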

Usage

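from fla.policies import policy_config
from fla.training import config

# Load the policy (use a distinct name so the imported config module is not shadowed)
policy_cfg = policy_config.get_policy_config("pi06_koch_tshirt")
policy = policy_cfg.create_policy()

# Run inference; observation must be supplied by your robot environment
action = policy(observation)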

With FLA Server

# Start server
uv run scripts/serve_policy.py --env aloha_sim \
    policy:checkpoint --checkpoint_path hf://djkesu/pi06-koch-tshirt-5k

# In your robot controller
from openpi_client import OpenPIClient
client = OpenPIClient("http://localhost:8000")
action = client.infer(observation)

Training Details

  • Base: Pi0.5 pretrained checkpoint
  • Config: pi06_koch_tshirt
  • Checkpoint: 5,000 steps (training can be resumed)
  • Hardware: Lambda / H100

Benchmarks

Task | Notes
Koch t-shirt folding | Task-specific fine-tuning; checkpoint at 5k steps.

Citation

@article{pi0,
  title={$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control},
  author={Physical Intelligence},
  year={2024},
}

License

Apache 2.0
