	---
	license: apache-2.0
	library_name: lerobot
	pipeline_tag: robotics
	tags:
	- smolvla
	- lerobot
	- so101
	- leisaac
	- pick-orange
	- isaac-sim
	datasets:
	- LightwheelAI/leisaac-pick-orange
	language:
	- en
	base_model: lerobot/smolvla_base
	---

	# SmolVLA-PickOrange

	A full-parameter (LoRA-free) fine-tune of the [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task. `main` is the self-trained step-15000 checkpoint (sweep best), fine-tuned from `lerobot/smolvla_base`.

	![SmolVLA-PickOrange — SO-101 in Isaac Sim](smolvla-pick-orange.jpg)

	🔗 Project repos:
	- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy comparison (parent project), including a 7-baseline benchmark
	- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs)

	> About the name: `config.type=smolvla` (the LeRobot v1 SmolVLA implementation) with the `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` (SmolVLM2) backbone. LeRobot itself names the policy `smolvla` rather than `smolvla2`, so this repo keeps the name `SmolVLA-PickOrange`.

	## TL;DR

	- Task: `Pick up the orange and place it on the plate`. A single SO-101 arm picks up 3 oranges one after another and places them on the plate.
	- Dataset: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes, 30 fps, dual-cam 480×640.
	- Architecture: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, `chunk_size=50`.
	- Training: full-parameter fine-tuning (no LoRA), batch=8, lr=1e-4, 30k steps in total; the run is clearly overfit by 30k. `main` = step-15000 (sweep best).
	- Eval (Isaac Sim 5.1, 5 rounds × 3 oranges = 15 oranges, post-fix placement check):
	  - 15k @ h=50: 2/5 strict rounds, 8/15 oranges (53%), 133s avg
	  - See the README leaderboard in [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience) for details
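
The two eval metrics above (strict rounds and oranges placed) can be related with a small sketch. It assumes a round counts as "strict" only when all 3 oranges end up on the plate. That definition, the helper name `summarize_rounds`, and the per-round counts in the usage line are illustrative assumptions, not the actual eval log.

```python
from typing import Sequence

ORANGES_PER_ROUND = 3  # the task places 3 oranges per round


def summarize_rounds(placed_per_round: Sequence[int]) -> dict:
    """Aggregate per-round placement counts into the two headline metrics."""
    if not placed_per_round:
        raise ValueError("need at least one round")
    rounds = len(placed_per_round)
    total = rounds * ORANGES_PER_ROUND
    placed = sum(placed_per_round)
    # Assumption: a "strict" round is one where every orange was placed.
    strict = sum(1 for n in placed_per_round if n == ORANGES_PER_ROUND)
    return {
        "strict_rounds": f"{strict}/{rounds}",
        "oranges": f"{placed}/{total}",
        "orange_rate_pct": round(100 * placed / total),
    }


# Hypothetical per-round counts, chosen only to reproduce the reported totals.
print(summarize_rounds([3, 3, 1, 1, 0]))
```

Under that assumption, a policy can place more than half of the oranges while still completing only a minority of rounds, which is why both numbers are reported.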

	## Checkpoint branches

	| Branch | Step | env rounds | oranges | avg s | Notes |
	|---|---|---|---|---|---|
	| `main` | 15000 | 2/5 | 8/15 (53%) | 133s | sweep best ⭐ |
	| `ckpt-20k` | 20000 | 0/5 | 6/15 (40%) | 180s | (will be uploaded if needed) |
	| `ckpt-25k` | 25000 | 1/5 | 5/15 (33%) | 160s | (will be uploaded if needed) |
	| `ckpt-30k` | 30000 | 0/5 | 4/15 (27%) | 180s | overfit; the old `main` was moved to this branch |

	_The sweep uses h=50 (= train chunk_size): 5 rounds × 5 checkpoints (75 oranges in total) on Isaac Sim 5.1, on a single RTX 4090._
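
As a rough sketch of how the table can be used programmatically, the snippet below encodes the sweep rows, picks the branch with the most oranges placed, and shows one way to fetch that branch with `huggingface_hub.snapshot_download`. The helper names (`best_branch`, `orange_rate_pct`, `download_branch`) are hypothetical. The repo id is taken from the eval command in this card, and branches marked "will be uploaded if needed" may not exist on the Hub.

```python
SWEEP = {
    # branch: (step, strict_rounds_of_5, oranges_of_15, avg_seconds)
    "main": (15000, 2, 8, 133),
    "ckpt-20k": (20000, 0, 6, 180),
    "ckpt-25k": (25000, 1, 5, 160),
    "ckpt-30k": (30000, 0, 4, 180),
}


def best_branch(sweep: dict = SWEEP) -> str:
    """Branch with the most oranges placed; ties are broken by strict rounds."""
    return max(sweep, key=lambda b: (sweep[b][2], sweep[b][1]))


def orange_rate_pct(branch: str, sweep: dict = SWEEP) -> int:
    """Oranges placed as a rounded percentage of the 15 attempts."""
    return round(100 * sweep[branch][2] / 15)


def download_branch(branch: str, repo_id: str = "wsagi/SmolVLA-PickOrange") -> str:
    """Fetch one checkpoint branch; needs `huggingface_hub` and network access."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(repo_id=repo_id, revision=branch)


print(best_branch(), orange_rate_pct(best_branch()))
```

`download_branch` is defined but not called here, so the sketch runs offline; calling it returns the local path of the downloaded snapshot.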

	## Ckpt sweep curve

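	15k is the best point: training longer starts to overfit the small 60-episode dataset, while stopping too early (10k and below) leaves the policy without the full pick-place-pick-place long-horizon sequence.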

	```
	oranges/15
	 9 |
	 8 |      ⭐ 15k
	 7 |
	 6 |           ● 20k
	 5 |                ● 25k
	 4 |                     ● 30k
	 3 |
	 2 |
	 1 | ● 10k
	 0 +---------------------------
	     10   15   20   25   30  k step
	```

	## Inference configuration

	```bash
	# 1. Start the LeRobot async policy server
	bash server/start_server.sh --lerobot-only

	# 2. Run the LeIsaac PickOrange eval.
	# Assign the variables first: with inline `VAR=value cmd --flag=$VAR` prefixes,
	# the shell expands $VAR before the assignment takes effect.
	POLICY_CHECKPOINT=wsagi/SmolVLA-PickOrange
	ACTION_HORIZON=50
	EVAL_ROUNDS=5
	EPISODE_LENGTH=120
	MAX_ROUND_WALL_S=180
	PROMPT="Pick up the orange and put it in the plate"
	conda run -n isaaclab python LeIsaac/scripts/evaluation/policy_inference.py \
	  --task=LeIsaac-SO101-PickOrange-v0 \
	  --policy_type=lerobot-smolvla \
	  --policy_port=8080 \
	  --policy_checkpoint_path="$POLICY_CHECKPOINT" \
	  --policy_action_horizon="$ACTION_HORIZON" \
	  --eval_rounds="$EVAL_ROUNDS" --episode_length_s="$EPISODE_LENGTH" \
	  --max_round_wall_s="$MAX_ROUND_WALL_S" \
	  --policy_language_instruction="$PROMPT" \
	  --device=cuda --enable_cameras
	```

	Key inference parameters (per [scripts/benchmark/baselines_action_horizon.tsv](https://github.com/vitorcen/isaaclab-experience/blob/main/scripts/benchmark/baselines_action_horizon.tsv)):
	- `action_horizon=50` (= train chunk_size; h=40 measured slightly weaker)
	- Use branch `main` for the best checkpoint, or `ckpt-30k` / any `ckpt-Nk` for the corresponding training stage.
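
To illustrate what `action_horizon` controls, here is a minimal sketch of chunked execution: each policy query returns `chunk_size` future actions, and the client executes the first `action_horizon` of them before querying again. `run_chunked` and `fake_policy` are hypothetical stand-ins; the real LeRobot async server and LeIsaac client are more involved.

```python
from typing import Callable, List, Tuple

CHUNK_SIZE = 50  # matches the training chunk_size


def run_chunked(
    policy: Callable[[int], List[int]], total_steps: int, action_horizon: int
) -> Tuple[List[int], int]:
    """Execute `total_steps` actions, re-querying every `action_horizon` steps."""
    if not 1 <= action_horizon <= CHUNK_SIZE:
        raise ValueError("action_horizon must be in [1, chunk_size]")
    executed: List[int] = []
    queries = 0
    t = 0
    while t < total_steps:
        chunk = policy(t)  # one inference call returns CHUNK_SIZE actions
        queries += 1
        take = min(action_horizon, total_steps - t)
        executed.extend(chunk[:take])  # the rest of the chunk is discarded
        t += take
    return executed, queries


def fake_policy(t: int) -> List[int]:
    # Stand-in for the real model: each "action" is just its timestep index.
    return list(range(t, t + CHUNK_SIZE))


actions, n_queries = run_chunked(fake_policy, total_steps=200, action_horizon=50)
print(len(actions), n_queries)
```

With h=50 the whole chunk is executed open-loop between queries; a smaller horizon such as h=40 re-plans more often at the cost of extra inference calls.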

	## Training recipe

	| Item | Value |
	|---|---|
	| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
	| Policy | `smolvla` (LeRobot implementation) |
	| Backbone | `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` + Action Expert |
	| `chunk_size` / `n_action_steps` | 50 / 50 |
	| Batch size | 8 (full-param, no LoRA) |
	| Optimizer | AdamW, lr=1e-4 |
	| Steps | 30000 (~14h on 4090) → main = 15000 (sweep best) |
	| `video_backend` | `pyav` (torchcodec segfaults on long runs) |
	| Image augmentation | None |
	| Train expert only | False (full-parameter) |

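	> 🚨 Key fix for a schema-free base: before training, [`prepare_base.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/smolvla/prepare_base.sh) must be used to strip the `input_features` / `empty_cameras` that ship with `lerobot/smolvla_base` (the default `camera1/2/3 @ 256×256` contaminates the fine-tuning path). Otherwise the schema is misaligned during training, and the forward pass either raises a KeyError or silently trains a broken policy.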
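
The idea behind that fix can be sketched in a few lines. This is not the contents of `prepare_base.sh`; it only assumes the base config is a JSON-like dict and drops the two keys named above. The helper name `strip_base_schema` and the config excerpt are hypothetical.

```python
import copy

# Keys named in the note above; the real prepare_base.sh may touch more fields.
SCHEMA_KEYS = ("input_features", "empty_cameras")


def strip_base_schema(config: dict) -> dict:
    """Return a copy of a policy config without the base model's camera schema."""
    cleaned = copy.deepcopy(config)
    for key in SCHEMA_KEYS:
        cleaned.pop(key, None)  # tolerate configs that lack the key
    return cleaned


# Hypothetical excerpt of a base config, for illustration only.
base = {
    "type": "smolvla",
    "chunk_size": 50,
    "empty_cameras": 0,
    "input_features": {
        "observation.images.camera1": {"type": "VISUAL", "shape": [3, 256, 256]},
    },
}
print(sorted(strip_base_schema(base)))
```

With the inherited schema removed, the input features can then come from the fine-tuning dataset (dual-cam 480×640) rather than from the base model's `camera1/2/3 @ 256×256` defaults.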

	## Eval history

	| Version | env rounds | oranges | avg s | Notes |
	|---|---|---|---|---|
	| 30k h=50 (old leaderboard) | 1/3 | 5/9 (56%) | 355s | sticky-OR + 3-round (old buggy counting) |
	| 30k h=50 (post-fix 5-round) | 0/5 | 4/15 (27%) | 180s | true 5-round + pre-step snapshot |
	| 15k h=50 (post-fix 5-round) | 2/5 | 8/15 (53%) | 133s | sweep best, current `main` ⭐ |

	## License

	Apache-2.0 (inherited from `lerobot/smolvla_base`).