---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
  - smolvla
  - lerobot
  - so101
  - leisaac
  - pick-orange
  - isaac-sim
datasets:
  - LightwheelAI/leisaac-pick-orange
language:
  - en
base_model: lerobot/smolvla_base
---

# SmolVLA-PickOrange

A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy for the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task, trained for 30k steps with full-parameter fine-tuning (no LoRA) starting from `lerobot/smolvla_base`.

![SmolVLA-PickOrange — SO-101 in Isaac Sim](smolvla-pick-orange.jpg)

**🔗 Project repos**

- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): parent project, a multi-policy comparison on Isaac Lab + LeIsaac that includes a 7-baseline benchmark
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork with training scripts and design docs

> **About the name**: `config.type=smolvla` (the LeRobot v1 SmolVLA implementation) with the `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` (SmolVLM**2**) backbone. LeRobot itself names the policy `smolvla` rather than `smolvla2`, so this repo keeps the name `SmolVLA-PickOrange`.

## TL;DR

- **Task**: `Pick up the orange and place it on the plate`. A single SO-101 arm picks up 3 oranges one after another and places them on the plate.
- **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes, 30 fps, dual-cam 480×640.
- **Architecture**: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, `chunk_size=50`.
- **Training**: full-parameter fine-tuning (no LoRA), batch=8 / lr=1e-4 / 30k steps / pyav video backend, ~14h on an RTX 4090.
- **Eval** (Isaac Sim 5.1, 3 rounds × 3 oranges = 9 oranges):
  - **strict 1/3 rounds, 5/9 oranges** (partial credit by sticky `put_orange_to_plate`)
  - See the benchmark section of `LeIsaac/README.md` in [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience) for details.
- **⚠️ Inference configuration**:
  - `policy_action_horizon=50` (= chunk_size, full-chunk receding window)
  - On the LeRobot async server side: `--policy_checkpoint_path=wsagi/SmolVLA-PickOrange`
  - `step_hz=30` to match the dataset
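
To make the `policy_action_horizon=50` setting concrete, here is a minimal sketch of full-chunk execution: the policy is queried once, all 50 actions of the chunk are applied, and only then is the policy queried again. The names `run_chunked`, `predict_chunk`, `apply_action`, and `dummy_policy` are hypothetical and exist only for this illustration; the LeIsaac harness may implement the horizon differently.

```python
from typing import Callable, List, Sequence

Action = Sequence[float]

CHUNK_SIZE = 50   # matches chunk_size in the model config
STEP_HZ = 30      # matches the dataset frame rate


def run_chunked(
    predict_chunk: Callable[[int], List[Action]],
    apply_action: Callable[[Action], None],
    total_steps: int,
    action_horizon: int = CHUNK_SIZE,
) -> int:
    """Run `total_steps` control steps, re-querying the policy after every
    `action_horizon` executed actions. Returns the number of policy queries."""
    if action_horizon <= 0:
        raise ValueError("action_horizon must be positive")
    queries = 0
    step = 0
    while step < total_steps:
        chunk = predict_chunk(step)  # hypothetical: one call returns a whole chunk
        queries += 1
        if not chunk:
            break  # nothing to execute; avoid looping forever
        for action in chunk[:action_horizon]:
            if step >= total_steps:
                break
            apply_action(action)
            step += 1
    return queries


def dummy_policy(step: int) -> List[Action]:
    """Stand-in for the real policy: a chunk of 50 zero actions for 6 joints."""
    return [[0.0] * 6 for _ in range(CHUNK_SIZE)]


executed: List[Action] = []
n_queries = run_chunked(dummy_policy, executed.append, total_steps=120 * STEP_HZ)
chunk_seconds = CHUNK_SIZE / STEP_HZ  # motion covered by one chunk at 30 Hz
print(n_queries, len(executed), round(chunk_seconds, 2))
```

With these numbers one chunk covers about 1.67 s of motion, so the arm runs open-loop for that long between policy queries. A shorter horizon would re-plan more often at the cost of more inference calls.
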

## Highlights

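- Full-parameter fine-tuning of SmolVLA on a small 60-episode dataset **learns the task partially**: 1 of 3 rounds ended in a natural success (3/3 oranges in 158 s), which beats the 0/3 of the third-party [`edge-inference/smolvla-so101-pick-orange`](https://huggingface.co/edge-inference/smolvla-so101-pick-orange).
- Variance across rounds is large (episode 2 = 0/3, episode 3 = 2/3): **60 episodes × 30k steps still underfits**.
- In the low-data regime, a large VLM-based policy trails a specialized visuomotor policy (ACT, 80M), consistent with the low-data findings of the original SmolVLA paper.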

## Training recipe

| Item | Value |
|---|---|
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
| Policy | `smolvla` (LeRobot implementation) |
| Backbone | `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` + Action Expert |
| `chunk_size` / `n_action_steps` | 50 / 50 |
| Batch size | 8 (full-param, no LoRA) |
| Optimizer | AdamW, lr=1e-4 |
| Steps | 30000 (~14h on 4090) |
| `video_backend` | `pyav` (torchcodec segfaults on long runs) |
| Image augmentation | None |
| Train expert only | False (all parameters are trained) |

> **🚨 Key fix: schema-free base.** Before training, run [`prepare_base.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/smolvla/prepare_base.sh) to strip the `input_features` / `empty_cameras` that ship with `lerobot/smolvla_base` (the default `camera1/2/3 @ 256×256` entries contaminate the fine-tuning path). Otherwise the schemas do not line up during training, and the forward pass either raises a KeyError or silently trains a broken model. See [`smolvla2_finetune_pick_orange.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/finetune/smolvla2_finetune_pick_orange.html) for details.
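
The idea behind the fix can be sketched on a plain dictionary. This is not the content of `prepare_base.sh`, which is not reproduced here. It only illustrates what "stripping" the inherited camera schema could look like, assuming the config is a JSON-like dict. The function name `strip_base_schema`, the feature key names, and the `empty_cameras` value in the sample are all assumptions made for the example.

```python
import copy


def strip_base_schema(config: dict) -> dict:
    """Return a copy of a policy config with the base checkpoint's
    camera schema removed, so the dataset can define its own inputs.
    Hypothetical helper; the real script may do this differently."""
    cleaned = copy.deepcopy(config)
    cleaned["input_features"] = {}   # drop the inherited camera1/2/3 entries
    cleaned["empty_cameras"] = 0     # assumption: no placeholder cameras
    return cleaned


# Illustrative stand-in for the base config; key names and values are assumed.
base_config = {
    "type": "smolvla",
    "chunk_size": 50,
    "empty_cameras": 1,
    "input_features": {
        "observation.images.camera1": {"type": "VISUAL", "shape": [3, 256, 256]},
        "observation.images.camera2": {"type": "VISUAL", "shape": [3, 256, 256]},
        "observation.images.camera3": {"type": "VISUAL", "shape": [3, 256, 256]},
    },
}

cleaned = strip_base_schema(base_config)
print(sorted(cleaned["input_features"]), cleaned["empty_cameras"])
```

The copy leaves the original dict untouched, which makes it easy to diff the two and confirm that only the schema fields changed.
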

## Inference

### Run through the LeIsaac eval harness (recommended)

```bash
# 1. Start the LeRobot async policy server
bash server/start_server.sh --lerobot-only

# 2. Run the LeIsaac PickOrange eval
DISPLAY=:0 python -u LeIsaac/scripts/evaluation/policy_inference.py \
    --task=LeIsaac-SO101-PickOrange-v0 \
    --eval_rounds=3 --episode_length_s=120 --step_hz=30 \
    --policy_type=lerobot-smolvla \
    --policy_host=127.0.0.1 --policy_port=8080 \
    --policy_action_horizon=50 \
    --policy_checkpoint_path=wsagi/SmolVLA-PickOrange \
    --policy_language_instruction='Pick up the orange and place it on the plate' \
    --device=cuda --enable_cameras
```

### Use LeRobot directly

```python
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
policy = SmolVLAPolicy.from_pretrained("wsagi/SmolVLA-PickOrange")
# See the LeRobot documentation for the rest of the inference loop
```

## Evaluation details (Isaac Sim 5.1, 2026-05-18 snapshot)

| Round | 🍊 placed | duration | mode | notes |
|---|---|---|---|---|
| 1 | 3/3 ✅ | 158.2 s | env-success | completed naturally |
| 2 | 0/3 | 551.7 s | key-R skip | grasps kept missing, arm jittering |
| 3 | 2/3 | 355.0 s | manual-hang | LeRobot server interrupted; the count of 2 was observed in the viewport |
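
The headline numbers follow directly from the per-round counts in the table. As a small worked example (variable names are illustrative), a round counts as a strict success only if all 3 oranges were placed, while the partial score sums placed oranges across rounds:

```python
# Per-round results copied from the evaluation table.
rounds = [
    {"round": 1, "placed": 3, "duration_s": 158.2},
    {"round": 2, "placed": 0, "duration_s": 551.7},
    {"round": 3, "placed": 2, "duration_s": 355.0},
]
ORANGES_PER_ROUND = 3

# Strict: every orange in the round must end up on the plate.
strict_successes = sum(r["placed"] == ORANGES_PER_ROUND for r in rounds)
# Partial credit: count oranges placed across all rounds.
oranges_placed = sum(r["placed"] for r in rounds)
total_oranges = ORANGES_PER_ROUND * len(rounds)

print(f"strict: {strict_successes}/{len(rounds)} rounds")   # strict: 1/3 rounds
print(f"partial: {oranges_placed}/{total_oranges} oranges")  # partial: 5/9 oranges
```
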

**Round-by-round details, 1 Hz GPU samples, and the 7-baseline comparison** are in `results/benchmark/snapshots/` of [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience).

## License

Apache-2.0 (inherited from `lerobot/smolvla_base` and LeIsaac).