# Pi0.6 Koch T-shirt Folding
Pi0.6 fine-tuned for Koch t-shirt folding (Gongsta-style task).
## Model Description
Pi0.6 is a Vision-Language-Action (VLA) model for robot manipulation, built on the FLA framework.
### Architecture
- Base Model: Pi0.5 (PaliGemma-based, ~2.7B parameters)
- VL Backbone: Gemma 2B (frozen)
- Action Expert: Gemma 300M (trained)
- Vision Encoder: SigLIP (frozen)
- Action Generation: Flow Matching
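The flow-matching action head generates an action chunk by integrating a learned velocity field from Gaussian noise toward the action distribution. A minimal illustrative sketch of that sampling loop, where `velocity_fn`, the horizon, and the step count are assumptions rather than the model's actual interface:

```python
import numpy as np

def sample_actions(velocity_fn, action_dim=6, horizon=50, num_steps=10, rng=None):
    """Euler-integrate a velocity field from noise to an action chunk.

    velocity_fn(x, t) -> dx/dt stands in for the trained action expert.
    """
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal((horizon, action_dim))  # start from Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)  # Euler step toward the data distribution
    return x

# Toy velocity field that contracts samples toward zero, for illustration only.
actions = sample_actions(lambda x, t: -x, action_dim=6, horizon=50)
print(actions.shape)  # (50, 6)
```

In the real model the velocity field is conditioned on the VLM backbone's features; the integration loop itself has this shape.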
### Training Strategy
Fine-tuned from the Pi0.5 base checkpoint on Koch t-shirt folding (RECAP/Gongsta-style) data.
- Task: t-shirt folding manipulation
- The VLM backbone is kept frozen; only the action expert is trained
- Normalization statistics and assets are included for inference
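The frozen-backbone split can be sketched as a parameter partition where gradient updates are applied only to the action expert. The subtree names below are assumptions for illustration, not the checkpoint's actual parameter tree:

```python
# Illustrative parameter partition: only the action expert receives updates.
params = {
    "vlm_backbone": {"w": 1.0},     # frozen (assumed name)
    "vision_encoder": {"w": 2.0},   # frozen (assumed name)
    "action_expert": {"w": 3.0},    # trained (assumed name)
}
FROZEN = {"vlm_backbone", "vision_encoder"}

def apply_grads(params, grads, lr=0.1):
    """Return updated params, skipping any subtree listed in FROZEN."""
    return {
        name: subtree if name in FROZEN
        else {k: v - lr * grads[name][k] for k, v in subtree.items()}
        for name, subtree in params.items()
    }

grads = {name: {k: 1.0 for k in sub} for name, sub in params.items()}
new_params = apply_grads(params, grads)
print(new_params["vlm_backbone"]["w"])   # 1.0 (unchanged)
print(new_params["action_expert"]["w"])  # 2.9 (updated)
```

Freezing the backbone keeps the pretrained vision-language representation intact while the much smaller action expert adapts to the task.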
## Usage
```python
from fla.policies import policy_config

# Load the policy configuration and instantiate the policy
cfg = policy_config.get_policy_config("pi06_koch_tshirt")
policy = cfg.create_policy()

# Run inference on a single observation
action = policy(observation)
```
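The structure of `observation` depends on the robot setup. A plausible layout for a 6-DoF Koch arm with base and wrist cameras might look like the following; every key, shape, and camera name here is an assumption, so check the policy config's input spec before use:

```python
import numpy as np

# Hypothetical observation layout; actual keys come from the policy config.
observation = {
    "images": {
        "base_rgb": np.zeros((224, 224, 3), dtype=np.uint8),   # assumed camera name
        "wrist_rgb": np.zeros((224, 224, 3), dtype=np.uint8),  # assumed camera name
    },
    "state": np.zeros(6, dtype=np.float32),  # joint positions, 6-DoF Koch arm
    "prompt": "fold the t-shirt",
}
```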
### With FLA Server
```shell
# Start the policy server
uv run scripts/serve_policy.py --env aloha_sim \
    policy:checkpoint --checkpoint_path hf://djkesu/pi06-koch-tshirt-5k
```

```python
# In your robot controller
from openpi_client import OpenPIClient

client = OpenPIClient("http://localhost:8000")
action = client.infer(observation)
```
## Training Details
- Base: Pi0.5 pretrained checkpoint
- Config: pi06_koch_tshirt
- Checkpoint: 5,000 steps (training can be resumed)
- Hardware: Lambda Cloud, NVIDIA H100
## Benchmarks
| Task | Notes |
|---|---|
| Koch t-shirt folding | Task-specific fine-tuning; checkpoint at 5k steps. |
## Citation

```bibtex
@article{pi0,
  title={$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control},
  author={Physical Intelligence},
  year={2024},
}
```
## License
Apache 2.0