wsagi committed on
Commit
c8c3318
·
verified ·
1 Parent(s): b4b597a

Add files using upload-large-folder tool

Files changed (2)
  1. README.md +70 -55
  2. model.safetensors +1 -1
README.md CHANGED
@@ -18,8 +18,8 @@ base_model: lerobot/smolvla_base
18
 
19
  # SmolVLA-PickOrange
20
 
21
- _A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy for the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task: 30k steps of full-parameter (LoRA-free) fine-tuning from `lerobot/smolvla_base`._
23
 
24
  ![SmolVLA-PickOrange — SO-101 in Isaac Sim](smolvla-pick-orange.jpg)
25
 
@@ -35,79 +35,94 @@ _A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy on t
35
  - **Task**: `Pick up the orange and place it on the plate`. The single SO-101 arm picks up 3 oranges one after another and places each on the plate.
36
  - **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes, 30 fps, dual-camera 480×640.
37
  - **Architecture**: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, `chunk_size=50`.
38
- - **Training**: full-parameter fine-tuning (no LoRA), batch=8 / lr=1e-4 / 30k steps / pyav video backend, ~14h on an RTX 4090
39
- - **Eval** (Isaac Sim 5.1, 3 rounds × 3 oranges = 9 oranges):
40
- - **strict 1/3 rounds, 5/9 oranges** (partial credit by sticky `put_orange_to_plate`)
41
- - See the `LeIsaac/README.md` benchmark section of [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience)
42
- - **⚠️ Inference configuration**:
43
- - `policy_action_horizon=50` (= chunk_size, full-chunk receding window)
44
- - On the LeRobot async server side: `--policy_checkpoint_path=wsagi/SmolVLA-PickOrange`
45
- - `step_hz=30`, matching the dataset
46
-
47
- ## Highlights
49
-
50
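- - Full-parameter SmolVLA fine-tuning on the small 60-episode dataset **learns the task only partially**: 1 of 3 rounds succeeds unaided (3/3 oranges in 158s), which beats the 0/3 of the third-party [`edge-inference/smolvla-so101-pick-orange`](https://huggingface.co/edge-inference/smolvla-so101-pick-orange).
51
- - Variance across rounds is large, however (episode 2 = 0/3, episode 3 = 2/3): **60 episodes × 30k steps still underfits**.
52
- - A large VLM-based policy does worse in the low-data regime than a specialized visuomotor policy (ACT, 80M), consistent with the low-data finding in the original SmolVLA paper.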
53
 
54
- ## Training recipe
56
 
57
- | Item | Value |
58
- |---|---|
59
- | Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
60
- | Policy | `smolvla` (LeRobot implementation) |
61
- | Backbone | `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` + Action Expert |
62
- | `chunk_size` / `n_action_steps` | 50 / 50 |
63
- | Batch size | 8 (full-param, no LoRA) |
64
- | Optimizer | AdamW, lr=1e-4 |
65
- | Steps | 30000 (~14h on 4090) |
66
- | `video_backend` | `pyav` (torchcodec segfaults on long runs) |
67
- | Image augmentation | None |
68
- | Train expert only | False (all parameters trained) |
69
 
70
- > **🚨 Key fix for a schema-free base**: before training, [`prepare_base.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/smolvla/prepare_base.sh) must be used to strip the `input_features` / `empty_cameras` that ship with `lerobot/smolvla_base` (the default `camera1/2/3 @ 256×256` contaminates the fine-tuning path). Otherwise the schema is misaligned during training, and the forward pass either raises a KeyError or training silently goes wrong. See [`smolvla2_finetune_pick_orange.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/finetune/smolvla2_finetune_pick_orange.html) for details.
71
 
72
- ## Inference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
73
 
74
- ### Run via the LeIsaac eval harness (recommended)
75
 
76
  ```bash
77
  # 1. Start the LeRobot async policy server
78
  bash server/start_server.sh --lerobot-only
79
 
80
  # 2. Run the LeIsaac PickOrange eval
81
- DISPLAY=:0 python -u LeIsaac/scripts/evaluation/policy_inference.py \
 
 
 
 
82
  --task=LeIsaac-SO101-PickOrange-v0 \
83
- --eval_rounds=3 --episode_length_s=120 --step_hz=30 \
84
  --policy_type=lerobot-smolvla \
85
- --policy_host=127.0.0.1 --policy_port=8080 \
86
- --policy_action_horizon=50 \
87
- --policy_checkpoint_path=wsagi/SmolVLA-PickOrange \
88
- --policy_language_instruction='Pick up the orange and place it on the plate' \
 
 
89
  --device=cuda --enable_cameras
90
  ```
91
 
92
- ### Use LeRobot directly
 
 
93
 
94
- ```python
95
- from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
96
- policy = SmolVLAPolicy.from_pretrained("wsagi/SmolVLA-PickOrange")
97
- # See the LeRobot documentation
98
- ```
99
 
100
- ## Evaluation details (Isaac Sim 5.1, 2026-05-18 snapshot)
 
 
 
 
 
 
 
 
 
 
102
 
103
- | Round | 🍊 placed | duration | mode | notes |
104
- |---|---|---|---|---|
105
- | 1 | 3/3 ✅ | 158.2 s | env-success | completed unaided |
106
- | 2 | 0/3 | 551.7 s | key-R skip | missed grasps, gripper jitter |
107
- | 3 | 2/3 | 355.0 s | manual-hang | lerobot server interrupted; the 2 was observed in the viewport |
108
 
109
- **Round-by-round detail, 1 Hz GPU samples, and a 7-baseline comparison**: see `results/benchmark/snapshots/` in [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience).
 
 
 
 
 
 
110
 
111
  ## License
112
 
113
- Apache-2.0 (inherited from `lerobot/smolvla_base` and LeIsaac).
 
18
 
19
  # SmolVLA-PickOrange
20
 
21
+ _A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy for the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task. `main` = **step-15000 (sweep best)**, full-parameter (LoRA-free) fine-tuning from `lerobot/smolvla_base`._
23
 
24
  ![SmolVLA-PickOrange — SO-101 in Isaac Sim](smolvla-pick-orange.jpg)
25
 
 
35
  - **Task**: `Pick up the orange and place it on the plate`. The single SO-101 arm picks up 3 oranges one after another and places each on the plate.
36
  - **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes, 30 fps, dual-camera 480×640.
37
  - **Architecture**: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, `chunk_size=50`.
38
+ - **Training**: full-parameter fine-tuning (no LoRA), batch=8 / lr=1e-4 / 30k training steps; the model is clearly overfit by 30k. **main = step-15000 (sweep best)**
39
+ - **Eval** (Isaac Sim 5.1, **5 rounds × 3 oranges = 15**, post-fix placement check):
40
+ - **2/5 strict rounds, 8/15 oranges (53%), 133s avg** for 15k @ h=50
41
+ - See the README leaderboard of [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience)
 
 
 
 
 
 
 
 
 
 
 
42
 
43
+ ## Checkpoint branches
 
44
 
45
+ | Branch | Step | env rounds | oranges | avg s | Notes |
46
+ |---|---|---|---|---|---|
47
+ | **`main`** | **15000** | **2/5** | **8/15 (53%)** | **133s** | sweep best |
48
+ | `ckpt-20k` | 20000 | 0/5 | 6/15 (40%) | 180s | (will be uploaded if needed) |
49
+ | `ckpt-25k` | 25000 | 1/5 | 5/15 (33%) | 160s | (will be uploaded if needed) |
50
+ | `ckpt-30k` | 30000 | 0/5 | 4/15 (27%) | 180s | overfit; the previous main was moved to this branch |
 
 
 
 
 
 
51
 
52
+ _Sweep at h=50 (= train chunk_size), 5 rounds × 5 ckpts = 75 ep on Isaac Sim 5.1, on a single RTX 4090_
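
The percentages in the checkpoint table above can be recomputed from the raw counts. The sketch below copies the four tabulated rows and ranks them; the ranking rule (oranges placed first, strict rounds as tie-breaker) and the names `SweepResult` and `best_checkpoint` are assumptions made for illustration, not something the repository defines.

```python
from typing import NamedTuple


class SweepResult(NamedTuple):
    branch: str
    step: int
    rounds_ok: int       # strict env-success rounds
    rounds_total: int
    oranges_ok: int      # oranges that ended up on the plate
    oranges_total: int

    @property
    def orange_rate(self) -> float:
        return self.oranges_ok / self.oranges_total


# Values copied from the checkpoint table above.
SWEEP = [
    SweepResult("main", 15000, 2, 5, 8, 15),
    SweepResult("ckpt-20k", 20000, 0, 5, 6, 15),
    SweepResult("ckpt-25k", 25000, 1, 5, 5, 15),
    SweepResult("ckpt-30k", 30000, 0, 5, 4, 15),
]


def best_checkpoint(results):
    """Assumed ranking: orange placement rate, then strict-round rate."""
    return max(results, key=lambda r: (r.orange_rate, r.rounds_ok / r.rounds_total))


if __name__ == "__main__":
    for r in SWEEP:
        print(f"{r.branch:9s} step={r.step:5d} "
              f"oranges={r.oranges_ok}/{r.oranges_total} ({r.orange_rate:.0%})")
    print("best:", best_checkpoint(SWEEP).branch)
```

Under this rule the 15k checkpoint wins on both criteria, so the choice of tie-breaker does not affect the outcome here.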
53
 
54
+ ## Ckpt sweep curve
55
+
56
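+ 15k is the best point: training longer starts to overfit the small 60-episode dataset, while stopping too early (10k or below) has not yet learned the full long-horizon pick-place-pick-place sequence.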
57
+
58
+ ```
59
+ oranges/15
60
+  9 |
61
+  8 |     ⭐ 15k
62
+  7 |
63
+  6 |          ● 20k
64
+  5 |               ● 25k
65
+  4 |                    ● 30k
66
+  3 |
67
+  2 |
68
+  1 |● 10k
69
+  0 +----------------------
70
+     10   15   20   25   30  k step
71
+ ```
72
 
73
+ ## Inference configuration
74
 
75
  ```bash
76
  # 1. Start the LeRobot async policy server
77
  bash server/start_server.sh --lerobot-only
78
 
79
  # 2. Run the LeIsaac PickOrange eval
80
+ export POLICY_CHECKPOINT=wsagi/SmolVLA-PickOrange
81
+ export ACTION_HORIZON=50
82
+ export EVAL_ROUNDS=5 EPISODE_LENGTH=120 MAX_ROUND_WALL_S=180
83
+ export PROMPT="Pick up the orange and put it in the plate"
84
+ conda run -n isaaclab python LeIsaac/scripts/evaluation/policy_inference.py \
85
  --task=LeIsaac-SO101-PickOrange-v0 \
 
86
  --policy_type=lerobot-smolvla \
87
+ --policy_port=8080 \
88
+ --policy_checkpoint_path=$POLICY_CHECKPOINT \
89
+ --policy_action_horizon=$ACTION_HORIZON \
90
+ --eval_rounds=$EVAL_ROUNDS --episode_length_s=$EPISODE_LENGTH \
91
+ --max_round_wall_s=$MAX_ROUND_WALL_S \
92
+ --policy_language_instruction="$PROMPT" \
93
  --device=cuda --enable_cameras
94
  ```
95
 
96
+ **Key inference parameters (per [scripts/benchmark/baselines_action_horizon.tsv](https://github.com/vitorcen/isaaclab-experience/blob/main/scripts/benchmark/baselines_action_horizon.tsv))**:
97
+ - `action_horizon=50` (= train chunk_size; h=40 measured slightly worse)
98
+ - Use branch `main` for the best checkpoint, or `ckpt-30k` / any `ckpt-Nk` for the corresponding training stage.
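
Selecting a branch from Python could look roughly like the sketch below. It assumes `huggingface_hub` and `lerobot` are installed and that the requested branch has actually been pushed (the table above marks `ckpt-20k` and `ckpt-25k` as not yet uploaded). `branch_for_step` and `load_policy` are hypothetical helper names; the `SmolVLAPolicy` import path is the one used in the previous revision of this README.

```python
REPO_ID = "wsagi/SmolVLA-PickOrange"


def branch_for_step(step: int) -> str:
    """Map a training step to the branch naming used in the table above.

    Step 15000 lives on `main`; other checkpoints follow the `ckpt-Nk` pattern.
    """
    if step == 15000:
        return "main"
    if step <= 0 or step % 1000 != 0:
        raise ValueError(f"expected a positive multiple of 1000, got {step}")
    return f"ckpt-{step // 1000}k"


def load_policy(step: int = 15000):
    """Download one branch and load it (needs network access)."""
    # Imported lazily so the naming helper works without these packages.
    from huggingface_hub import snapshot_download
    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

    local_dir = snapshot_download(repo_id=REPO_ID, revision=branch_for_step(step))
    return SmolVLAPolicy.from_pretrained(local_dir)
```

Calling `load_policy(30000)` would then fetch the overfit `ckpt-30k` weights for comparison, while the default returns the sweep-best `main`.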
99
 
100
+ ## Training recipe
 
 
 
102
 
103
+ | Item | Value |
104
+ |---|---|
105
+ | Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
106
+ | Policy | `smolvla` (LeRobot implementation) |
107
+ | Backbone | `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` + Action Expert |
108
+ | `chunk_size` / `n_action_steps` | 50 / 50 |
109
+ | Batch size | 8 (full-param, no LoRA) |
110
+ | Optimizer | AdamW, lr=1e-4 |
111
+ | Steps | 30000 (~14h on 4090) → main = **15000** (sweep best) |
112
+ | `video_backend` | `pyav` (torchcodec segfaults on long runs) |
113
+ | Image augmentation | None |
114
+ | Train expert only | False (all parameters trained) |
115
 
116
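+ > **🚨 Key fix for a schema-free base**: before training, [`prepare_base.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/smolvla/prepare_base.sh) must be used to strip the `input_features` / `empty_cameras` that ship with `lerobot/smolvla_base` (the default `camera1/2/3 @ 256×256` contaminates the fine-tuning path). Otherwise the schema is misaligned during training, and the forward pass either raises a KeyError or training silently goes wrong.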
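
To make the idea of a schema-free base concrete, here is a minimal sketch of removing the two named fields from a config dictionary. It is not the content of `prepare_base.sh`, which may well do this differently (for example by resetting values instead of deleting keys); `strip_schema` is a hypothetical helper and the sample config values are invented for the example.

```python
import copy

# Keys the note above says must be stripped from the base config.
SCHEMA_KEYS = ("input_features", "empty_cameras")


def strip_schema(config: dict, keys=SCHEMA_KEYS) -> dict:
    """Return a copy of `config` without the schema keys; the input is left untouched."""
    cleaned = copy.deepcopy(config)
    for key in keys:
        cleaned.pop(key, None)  # ignore keys that are already absent
    return cleaned


if __name__ == "__main__":
    # Illustrative config only: field names and values are made up.
    base = {
        "chunk_size": 50,
        "input_features": {"camera1": {"shape": [3, 256, 256]}},
        "empty_cameras": 0,
    }
    print(strip_schema(base))
```

Working on a copy keeps the downloaded base config intact, so the stripped version can be diffed against the original before training starts.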
 
 
 
 
117
 
118
+ ## Eval history
119
+
120
+ | Version | env rounds | oranges | avg s | Notes |
121
+ |---|---|---|---|---|
122
+ | 30k h=50 (old leaderboard) | 1/3 | 5/9 (56%) | 355s | sticky-OR + 3 rounds (old, buggy counting) |
123
+ | **30k h=50 (post-fix 5-round)** | 0/5 | 4/15 (27%) | 180s | true 5-round run + pre-step snapshot |
124
+ | **15k h=50 (post-fix 5-round)** | **2/5** | **8/15 (53%)** | **133s** | **sweep best, current main** ⭐ |
125
 
126
  ## License
127
 
128
+ Apache-2.0 (inherited from `lerobot/smolvla_base`).
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5180ce422a593d2e65c668782aa23069e4bbf6151758e08ccb6d0e0a16a3c587
3
  size 906712520
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d1c42010653754d28cbafbcff2bbecee49aa80401935bcd1a2734dcb9d776901
3
  size 906712520