wsagi/SmolVLA-PickOrange

@@ -18,8 +18,8 @@ base_model: lerobot/smolvla_base
 # SmolVLA-PickOrange
-_A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task, 30k steps full-parameter from `lerobot/smolvla_base`._
 ![SmolVLA-PickOrange — SO-101 in Isaac Sim](smolvla-pick-orange.jpg)
@@ -35,79 +35,94 @@ _A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy on t
 - **Task**: `Pick up the orange and place it on the plate`. A single SO-101 arm picks up 3 oranges one at a time and places each on the plate.
 - **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange): 60 teleoperated demonstration episodes, 30 fps, dual-cam 480×640.
 - **Architecture**: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, `chunk_size=50`.
-- **Training**: full-parameter fine-tuning (no LoRA), batch=8 / lr=1e-4 / 30k steps / pyav video backend, ~14h on an RTX 4090.
-- **Eval** (Isaac Sim 5.1, 3 rounds × 3 oranges = 9 oranges):
-  - **strict 1/3 rounds, 5/9 oranges** (partial credit via the sticky `put_orange_to_plate` check)
-  - See the benchmark section of `LeIsaac/README.md` in [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience)
-- **⚠️ Inference configuration**:
-  - `policy_action_horizon=50` (= chunk_size, full-chunk receding window)
-  - On the LeRobot async server side: `--policy_checkpoint_path=wsagi/SmolVLA-PickOrange`
-  - `step_hz=30`, matching the dataset
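-## Highlights
-- Full-parameter SmolVLA fine-tuning **learns the task only partially** from 60 episodes: 1 of 3 rounds succeeded unaided (3/3 oranges in 158s), better than the 0/3 of the third-party [`edge-inference/smolvla-so101-pick-orange`](https://huggingface.co/edge-inference/smolvla-so101-pick-orange).
-- Round-to-round variance is high (episode 2 = 0/3, episode 3 = 2/3): **60 episodes × 30k steps still underfits**.
-- In the low-data regime, a large VLM-based policy underperforms a specialized visuomotor policy (ACT, 80M), consistent with the low-data finding in the original SmolVLA paper.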
-## Training recipe
-| Item | Value |
-|---|---|
-| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
-| Policy | `smolvla` (LeRobot implementation) |
-| Backbone | `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` + Action Expert |
-| `chunk_size` / `n_action_steps` | 50 / 50 |
-| Batch size | 8 (full-param, no LoRA) |
-| Optimizer | AdamW, lr=1e-4 |
-| Steps | 30000 (~14h on 4090) |
-| `video_backend` | `pyav` (torchcodec segfaults on long runs) |
-| Image augmentation | None |
-| Train expert only | False (full-parameter) |
-> **🚨 Key fix: schema-free base**: Before training, use [`prepare_base.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/smolvla/prepare_base.sh) to strip the `input_features` / `empty_cameras` that ship with `lerobot/smolvla_base` (the default `camera1/2/3 @ 256×256` pollutes the fine-tuning path). Otherwise the schema is misaligned during training, and the forward pass either raises a KeyError or silently trains a broken model. See [`smolvla2_finetune_pick_orange.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/finetune/smolvla2_finetune_pick_orange.html) for details.
-## Inference
-### Run through the LeIsaac eval harness (recommended)
 ```bash
 # 1. Start the LeRobot async policy server
 bash server/start_server.sh --lerobot-only
 # 2. Run the LeIsaac PickOrange eval
-DISPLAY=:0 python -u LeIsaac/scripts/evaluation/policy_inference.py \
     --task=LeIsaac-SO101-PickOrange-v0 \
-    --eval_rounds=3 --episode_length_s=120 --step_hz=30 \
     --policy_type=lerobot-smolvla \
-    --policy_host=127.0.0.1 --policy_port=8080 \
-    --policy_action_horizon=50 \
-    --policy_checkpoint_path=wsagi/SmolVLA-PickOrange \
-    --policy_language_instruction='Pick up the orange and place it on the plate' \
     --device=cuda --enable_cameras
 ```
-### Using LeRobot directly
-```python
-from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
-policy = SmolVLAPolicy.from_pretrained("wsagi/SmolVLA-PickOrange")
-# See the LeRobot documentation
-```
-## Evaluation details (Isaac Sim 5.1, 2026-05-18 snapshot)
-| Round | 🍊 placed | duration | mode | notes |
-|---|---|---|---|---|
-| 1 | 3/3 ✅ | 158.2 s | env-success | completed unaided |
-| 2 | 0/3 | 551.7 s | key-R skip | failed grasps, arm jitter |
-| 3 | 2/3 | 355.0 s | manual-hang | lerobot server interrupted; the 2 is from viewport observation |
-**Round-by-round detail, 1 Hz GPU samples, and a 7-baseline comparison** are in `results/benchmark/snapshots/` of [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience).
 ## License
-Apache-2.0 (inherited from `lerobot/smolvla_base` and LeIsaac).

 # SmolVLA-PickOrange
+_A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task. `main` = **step-15000 (sweep best)**, full-parameter from `lerobot/smolvla_base`._
 ![SmolVLA-PickOrange — SO-101 in Isaac Sim](smolvla-pick-orange.jpg)
 - **Task**: `Pick up the orange and place it on the plate`. A single SO-101 arm picks up 3 oranges one at a time and places each on the plate.
 - **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange): 60 teleoperated demonstration episodes, 30 fps, dual-cam 480×640.
 - **Architecture**: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, `chunk_size=50`.
+- **Training**: full-parameter fine-tuning (no LoRA), batch=8 / lr=1e-4 / 30k steps trained in total, with clear overfitting by 30k. **main = step-15000 (sweep best)**.
+- **Eval** (Isaac Sim 5.1, **5 rounds × 3 oranges = 15 oranges**, post-fix placement check):
+  - **2/5 strict rounds, 8/15 oranges (53%), 133s avg** ← 15k @ h=50
+  - See the README leaderboard in [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience)
+## Checkpoint branches
+| Branch | Step | env rounds | oranges | avg s | Notes |
+|---|---|---|---|---|---|
+| **`main`** | **15000** | **2/5** | **8/15 (53%)** | **133s** | sweep best ⭐ |
+| `ckpt-20k` | 20000 | 0/5 | 6/15 (40%) | 180s | (will be uploaded if needed) |
+| `ckpt-25k` | 25000 | 1/5 | 5/15 (33%) | 160s | (will be uploaded if needed) |
+| `ckpt-30k` | 30000 | 0/5 | 4/15 (27%) | 180s | overfit; the old main was moved to this branch |
+_The sweep used h=50 (= train chunk_size): 5 rounds × 5 checkpoints = 25 episodes (75 oranges) on Isaac Sim 5.1, on a single RTX 4090._
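The percentages and the "sweep best" choice in the checkpoint table can be re-derived with a few lines of Python. This is a minimal sketch, not the benchmark's own tooling: the `SweepRow` type, the helper names, and the ranking rule (oranges placed first, env rounds as tie-break) are assumptions made for illustration; only the numbers come from the table.

```python
from dataclasses import dataclass

ORANGES_TOTAL = 15  # 5 rounds x 3 oranges per checkpoint


@dataclass(frozen=True)
class SweepRow:
    """One row of the checkpoint table (hypothetical helper type)."""
    branch: str
    step: int
    env_rounds: int  # rounds counted as a success, out of 5
    oranges: int     # oranges placed, out of ORANGES_TOTAL


# Values copied from the checkpoint table.
ROWS = [
    SweepRow("main", 15000, 2, 8),
    SweepRow("ckpt-20k", 20000, 0, 6),
    SweepRow("ckpt-25k", 25000, 1, 5),
    SweepRow("ckpt-30k", 30000, 0, 4),
]


def orange_rate_pct(row: SweepRow) -> int:
    """Placed oranges as a whole-number percentage."""
    return round(100 * row.oranges / ORANGES_TOTAL)


def pick_best(rows: list[SweepRow]) -> SweepRow:
    """Assumed ranking: most oranges placed, then most successful rounds."""
    return max(rows, key=lambda r: (r.oranges, r.env_rounds))


rates = {row.branch: orange_rate_pct(row) for row in ROWS}
best = pick_best(ROWS)
print(f"best: {best.branch} (step {best.step}), rates: {rates}")
```

Under this ranking the step-15000 row comes out on top, matching the branch that `main` now points to.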
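+## Ckpt sweep curve
+15k is the best point: training longer starts to overfit the small 60-episode dataset, while stopping too early (10k or below) has not yet learned the full pick-place-pick-place long-horizon sequence.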
+```
+oranges/15
+  9 |
+  8 |    ⭐ 15k
+  7 |
+  6 |        ●  20k
+  5 |            ●  25k
+  4 |                ●  30k
+  3 |
+  2 |
+  1 |●  10k
+  0 +----------------------
+    10  15  20  25  30  k step
+```
+## Inference configuration
 ```bash
 # 1. Start the LeRobot async policy server
 bash server/start_server.sh --lerobot-only
 # 2. Run the LeIsaac PickOrange eval
+# Plain assignments (no trailing backslash), so the shell expands $VAR in the arguments below
+POLICY_CHECKPOINT=wsagi/SmolVLA-PickOrange
+ACTION_HORIZON=50
+EVAL_ROUNDS=5 EPISODE_LENGTH=120 MAX_ROUND_WALL_S=180
+PROMPT="Pick up the orange and put it in the plate"
+conda run -n isaaclab python LeIsaac/scripts/evaluation/policy_inference.py \
     --task=LeIsaac-SO101-PickOrange-v0 \
     --policy_type=lerobot-smolvla \
+    --policy_port=8080 \
+    --policy_checkpoint_path=$POLICY_CHECKPOINT \
+    --policy_action_horizon=$ACTION_HORIZON \
+    --eval_rounds=$EVAL_ROUNDS --episode_length_s=$EPISODE_LENGTH \
+    --max_round_wall_s=$MAX_ROUND_WALL_S \
+    --policy_language_instruction="$PROMPT" \
     --device=cuda --enable_cameras
 ```
+**Key inference parameters (per [scripts/benchmark/baselines_action_horizon.tsv](https://github.com/vitorcen/isaaclab-experience/blob/main/scripts/benchmark/baselines_action_horizon.tsv))**:
+- `action_horizon=50` (= train chunk_size; h=40 measured slightly worse)
+- Use branch `main` for the best checkpoint, or `ckpt-30k` / any `ckpt-Nk` for the corresponding training stage.
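To make the `action_horizon` setting concrete, the sketch below simulates chunked execution: the policy returns a chunk of `chunk_size` actions, the runner executes the first `h` of them, then queries again. It is an illustration under stated assumptions, not the LeIsaac implementation: `run_chunked` and `dummy_policy` are hypothetical names, and the 30 Hz control rate is assumed from the dataset's 30 fps.

```python
from typing import Callable, List, Tuple

CHUNK_SIZE = 50  # chunk_size from the model card
STEP_HZ = 30     # assumed control rate, matching the 30 fps dataset


def run_chunked(
    policy: Callable[[int], List[int]],
    total_steps: int,
    horizon: int,
) -> Tuple[List[int], int]:
    """Execute `horizon` actions from each predicted chunk, then re-query.

    Returns the executed actions and the number of policy queries.
    """
    if not 1 <= horizon <= CHUNK_SIZE:
        raise ValueError("horizon must be between 1 and CHUNK_SIZE")
    executed: List[int] = []
    queries = 0
    while len(executed) < total_steps:
        chunk = policy(len(executed))  # one policy call per chunk
        queries += 1
        remaining = total_steps - len(executed)
        executed.extend(chunk[: min(horizon, remaining)])
    return executed, queries


def dummy_policy(start_step: int) -> List[int]:
    """Stand-in policy: 'action' i is simply the step index it targets."""
    return [start_step + i for i in range(CHUNK_SIZE)]


steps = 120 * STEP_HZ  # one 120 s episode
actions_50, queries_50 = run_chunked(dummy_policy, steps, horizon=50)
actions_40, queries_40 = run_chunked(dummy_policy, steps, horizon=40)
print(f"h=50: {queries_50} queries, h=40: {queries_40} queries")
```

Under these assumptions a 120 s episode needs 72 policy queries at h=50 and 90 at h=40; with h=40 the last 10 actions of every chunk are discarded.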
+## Training recipe
+| Item | Value |
+|---|---|
+| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
+| Policy | `smolvla` (LeRobot implementation) |
+| Backbone | `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` + Action Expert |
+| `chunk_size` / `n_action_steps` | 50 / 50 |
+| Batch size | 8 (full-param, no LoRA) |
+| Optimizer | AdamW, lr=1e-4 |
+| Steps | 30000 (~14h on 4090) → main = **15000** (sweep best) |
+| `video_backend` | `pyav` (torchcodec segfaults on long runs) |
+| Image augmentation | None |
+| Train expert only | False (full-parameter) |
+> **🚨 Key fix: schema-free base**: Before training, use [`prepare_base.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/smolvla/prepare_base.sh) to strip the `input_features` / `empty_cameras` that ship with `lerobot/smolvla_base` (the default `camera1/2/3 @ 256×256` pollutes the fine-tuning path). Otherwise the schema is misaligned during training, and the forward pass either raises a KeyError or silently trains a broken model.
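The stripping step in the note can be pictured as removing two keys from the base checkpoint's config before training. The sketch below is an assumption-laden illustration and does not reproduce `prepare_base.sh`: it treats the config as a plain dict, interprets "strip" as deleting the keys outright, and uses a made-up `example_base_config` whose layout is hypothetical.

```python
import copy
import json

# Keys named in the note; deleting them outright is an assumption.
BASE_ONLY_KEYS = ("input_features", "empty_cameras")


def strip_base_schema(config: dict) -> dict:
    """Return a copy of `config` without the base checkpoint's camera schema."""
    cleaned = copy.deepcopy(config)
    for key in BASE_ONLY_KEYS:
        cleaned.pop(key, None)  # ignore keys that are already absent
    return cleaned


# Hypothetical excerpt of a base config, NOT copied from the real file.
example_base_config = {
    "chunk_size": 50,
    "n_action_steps": 50,
    "empty_cameras": 0,
    "input_features": {
        "observation.images.camera1": {"type": "VISUAL", "shape": [3, 256, 256]},
        "observation.images.camera2": {"type": "VISUAL", "shape": [3, 256, 256]},
        "observation.images.camera3": {"type": "VISUAL", "shape": [3, 256, 256]},
    },
}

stripped = strip_base_schema(example_base_config)
print(json.dumps(stripped, indent=2))
```

With the base schema gone, the training run is free to take its camera layout from the fine-tuning dataset (dual-cam 480×640 here) instead of the three 256×256 placeholders.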
+## Eval history
+| Version | env rounds | oranges | avg s | Notes |
+|---|---|---|---|---|
+| 30k h=50 (old leaderboard) | 1/3 | 5/9 (56%) | 355s | sticky-OR + 3 rounds (old, buggy counting) |
+| **30k h=50 (post-fix 5-round)** | 0/5 | 4/15 (27%) | 180s | true 5-round run + pre-step snapshot |
+| **15k h=50 (post-fix 5-round)** | **2/5** | **8/15 (53%)** | **133s** | **sweep best, current main** ⭐ |
 ## License
+Apache-2.0 (inherited from `lerobot/smolvla_base`).

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5180ce422a593d2e65c668782aa23069e4bbf6151758e08ccb6d0e0a16a3c587
 size 906712520

 version https://git-lfs.github.com/spec/v1
+oid sha256:d1c42010653754d28cbafbcff2bbecee49aa80401935bcd1a2734dcb9d776901
 size 906712520
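Only the `oid` changes between the two pointers; the size stays at 906712520 bytes, as expected when the same architecture is saved at a different training step. A downloaded `model.safetensors` can be checked against the new pointer with the sketch below. It uses only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is whatever the reader chooses.

```python
import hashlib
from pathlib import Path

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d1c42010653754d28cbafbcff2bbecee49aa80401935bcd1a2734dcb9d776901
size 906712520
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_bytes: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's size and hash."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False  # cheap check first, before hashing ~900 MB
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_bytes), b""):
            hasher.update(block)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy then tells whether it is the step-15000 weights or a stale download of the old `main`.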