OpenVLA-OFT SFT checkpoint for the beat_block_hammer task from RoboTwin 2.0, trained to match the SimpleVLA-RL paper (arXiv:2509.09674) settings.
| Parameter | Value |
|---|---|
| max_steps | 10,000 |
| batch_size | 8 per GPU x 2 GPUs = 16 global |
| learning_rate | 5e-4 |
| lora_rank | 32 |
| num_images_in_input | 1 (head camera only) |
| use_l1_regression | False |
| use_film | False |
| use_proprio | True |
| use_diffusion | False |
| image_aug | True |
| NUM_ACTIONS_CHUNK | 25 |
| ACTION_DIM | 14 |
| ACTION_PROPRIO_NORMALIZATION_TYPE | bounds |
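
The settings above can be gathered into one small config object, which makes the derived quantities explicit. This is only a sketch: `SFTConfig` and its field names are hypothetical and are not the training script's own class, and the sample count assumes no gradient accumulation, which the table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SFTConfig:
    """Hypothetical container mirroring the parameter table (not the real training class)."""

    max_steps: int = 10_000
    batch_size_per_gpu: int = 8
    num_gpus: int = 2
    learning_rate: float = 5e-4
    lora_rank: int = 32
    num_images_in_input: int = 1  # head camera only
    use_l1_regression: bool = False
    use_film: bool = False
    use_proprio: bool = True
    use_diffusion: bool = False
    image_aug: bool = True
    num_actions_chunk: int = 25
    action_dim: int = 14
    normalization_type: str = "bounds"

    @property
    def global_batch_size(self) -> int:
        # 8 per GPU x 2 GPUs = 16, as stated in the table.
        return self.batch_size_per_gpu * self.num_gpus

    @property
    def samples_seen(self) -> int:
        # Assumes one optimizer step per global batch (no gradient accumulation).
        return self.max_steps * self.global_batch_size

    @property
    def action_values_per_chunk(self) -> int:
        # Each predicted chunk holds NUM_ACTIONS_CHUNK x ACTION_DIM scalar actions.
        return self.num_actions_chunk * self.action_dim


cfg = SFTConfig()
print(cfg.global_batch_size, cfg.samples_seen, cfg.action_values_per_chunk)
```

Under these assumptions, training covers 160,000 samples and each chunk carries 350 action values.
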
Evaluation results:

| Metric | Value |
|---|---|
| Success rate (seed 0, 100 episodes) | 45.0% |
| Success rate (seed 1, partial run, 16 episodes) | 50.0% |
| Paper SFT baseline (Table 4) | 28.1% |
| Prior Phase 1 reproduction | 35.2% |
Evaluated with greedy decoding (do_sample=False) on 100 held-out scenarios using the demo_randomized config.
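
With 100 episodes or fewer per seed, the reported success rates carry noticeable sampling uncertainty. As a rough illustration, the sketch below computes a 95% Wilson score interval from the success counts implied by the table (45 of 100 for seed 0, 8 of 16 for seed 1). The interval method is a choice made here for illustration and was not part of the original evaluation.

```python
import math


def wilson_interval(successes: int, episodes: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial success rate (z=1.96 for ~95%)."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    p = successes / episodes
    denom = 1 + z * z / episodes
    center = (p + z * z / (2 * episodes)) / denom
    half = z * math.sqrt(p * (1 - p) / episodes + z * z / (4 * episodes**2)) / denom
    return center - half, center + half


# Counts derived from the reported rates: 45.0% of 100 and 50.0% of 16.
seed0 = wilson_interval(45, 100)
seed1 = wilson_interval(8, 16)

for name, (lo, hi) in (("seed 0", seed0), ("seed 1", seed1)):
    print(f"{name}: {lo:.3f} to {hi:.3f}")
```

The seed 1 interval is much wider because it rests on only 16 episodes, so it is best read as a sanity check rather than an independent estimate.
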
This checkpoint is compatible with:
Important: This checkpoint predicts discrete action tokens trained with cross-entropy (LLaMA2 head), NOT L1 regression with an MLP action head. Ensure your inference code sets use_l1_regression=False and passes the proprioceptive state to predict_action.
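
To make the discrete-token path concrete, here is a minimal sketch of how predicted bin indices could be turned back into a 25 x 14 chunk of continuous actions. It assumes an OpenVLA-style scheme of 256 uniform bins over [-1, 1] and assumes that "bounds" normalization maps each dimension's [low, high] range to [-1, 1]. All function names are hypothetical, and the mapping from vocabulary token ids to bin indices is omitted. Check the inference code you actually use before relying on any of these details.

```python
N_BINS = 256  # assumption: uniform bins over [-1, 1], as in OpenVLA-style tokenizers


def bin_centers(n_bins: int = N_BINS) -> list[float]:
    """Centers of the intervals between n_bins evenly spaced edges on [-1, 1]."""
    edges = [-1.0 + 2.0 * i / (n_bins - 1) for i in range(n_bins)]
    return [(edges[i] + edges[i + 1]) / 2 for i in range(n_bins - 1)]


def unnormalize_bounds(a: float, low: float, high: float) -> float:
    """Map a normalized action in [-1, 1] back to [low, high] (assumed 'bounds' scheme)."""
    return 0.5 * (a + 1.0) * (high - low) + low


def decode_chunk(bin_ids, lows, highs, chunk: int = 25, dim: int = 14):
    """Decode a flat list of bin indices into `chunk` rows of `dim` continuous actions."""
    if len(bin_ids) != chunk * dim:
        raise ValueError(f"expected {chunk * dim} bin ids, got {len(bin_ids)}")
    centers = bin_centers()
    actions = []
    for t in range(chunk):
        row = []
        for d in range(dim):
            # Clamp so out-of-range indices cannot index past the bin centers.
            idx = min(max(bin_ids[t * dim + d], 0), len(centers) - 1)
            row.append(unnormalize_bounds(centers[idx], lows[d], highs[d]))
        actions.append(row)
    return actions


# Placeholder bounds; real values come from the dataset statistics.
lows, highs = [0.0] * 14, [1.0] * 14
decoded = decode_chunk([0] * (25 * 14), lows, highs)
print(len(decoded), len(decoded[0]))
```

An L1-regression checkpoint would instead read continuous values directly from an MLP head, which is why loading this checkpoint with use_l1_regression=True would not work as intended.
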
If you use this checkpoint, please cite the SimpleVLA-RL paper: arXiv:2509.09674
License: MIT