Add files using upload-large-folder tool
- README.md +70 -55
- model.safetensors +1 -1
README.md
CHANGED
|
@@ -18,8 +18,8 @@ base_model: lerobot/smolvla_base
|
|
| 18 |
|
| 19 |
# SmolVLA-PickOrange
|
| 20 |
|
| 21 |
-
A [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy fine-tuned without LoRA on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task, self-trained
|
| 22 |
-
_A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task
|
| 23 |
|
| 24 |

|
| 25 |
|
|
@@ -35,79 +35,94 @@ _A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy on t
|
|
| 35 |
- **Task**: `Pick up the orange and place it on the plate`. A single SO-101 arm picks up 3 oranges one at a time and places each on the plate.
|
| 36 |
- **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes at 30 fps, dual-camera 480×640.
|
| 37 |
- **Architecture**: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, `chunk_size=50`.
|
| 38 |
-
- **Training**: full-parameter fine-tuning (no LoRA), batch=8 / lr=1e-4 / 30k steps
|
| 39 |
-
- **Eval** (Isaac Sim 5.1,
|
| 40 |
-
- **
|
| 41 |
-
- See [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience) for details
|
| 42 |
-
- **⚠️ Inference configuration**:
|
| 43 |
-
- `policy_action_horizon=50` (= chunk_size, receding window over the full chunk)
|
| 44 |
-
- On the LeRobot async server side: `--policy_checkpoint_path=wsagi/SmolVLA-PickOrange`
|
| 45 |
-
- `step_hz=30`, matching the dataset
|
| 46 |
-
|
| 47 |
-
## Model highlights
|
| 48 |
-
_Highlights_
|
| 49 |
-
|
| 50 |
-
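- Full-parameter fine-tuning of SmolVLA on a small 60-episode dataset **learns the task partially**: 1 of 3 rounds succeeded naturally (3/3 oranges in 158s), which beats the 0/3 of the third-party [`edge-inference/smolvla-so101-pick-orange`](https://huggingface.co/edge-inference/smolvla-so101-pick-orange).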
|
| 51 |
-
- Variance between rounds is large, however (episode 2 = 0/3, episode 3 = 2/3): **60 episodes × 30k steps still underfits**.
|
| 52 |
-
- In the low-data regime, a large VLM-based policy underperforms a specialized visuomotor policy (ACT, 80M), consistent with the low-data finding in the original SmolVLA paper.
|
| 53 |
|
| 54 |
-
##
|
| 55 |
-
_Training recipe_
|
| 56 |
|
| 57 |
-
|
|
| 58 |
-
|---|---|
|
| 59 |
-
|
|
| 60 |
-
|
|
| 61 |
-
|
|
| 62 |
-
| `
|
| 63 |
-
| Batch size | 8 (full-param, no LoRA) |
|
| 64 |
-
| Optimizer | AdamW, lr=1e-4 |
|
| 65 |
-
| Steps | 30000 (~14h on 4090) |
|
| 66 |
-
| `video_backend` | `pyav` (torchcodec segfaults on long runs) |
|
| 67 |
-
| Image augmentation | None |
|
| 68 |
-
| Train expert only | False (all parameters trained) |
|
| 69 |
|
| 70 |
-
|
| 71 |
|
| 72 |
-
##
|
| 73 |
|
| 74 |
-
##
|
| 75 |
|
| 76 |
```bash
|
| 77 |
# 1. Start the LeRobot async policy server
|
| 78 |
bash server/start_server.sh --lerobot-only
|
| 79 |
|
| 80 |
# 2. Run the LeIsaac PickOrange eval
|
| 81 |
-
|
| 82 |
--task=LeIsaac-SO101-PickOrange-v0 \
|
| 83 |
-
--eval_rounds=3 --episode_length_s=120 --step_hz=30 \
|
| 84 |
--policy_type=lerobot-smolvla \
|
| 85 |
-
--
|
| 86 |
-
--
|
| 87 |
-
--
|
| 88 |
-
--
|
| 89 |
--device=cuda --enable_cameras
|
| 90 |
```
|
| 91 |
|
| 92 |
-
|
| 93 |
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
policy = SmolVLAPolicy.from_pretrained("wsagi/SmolVLA-PickOrange")
|
| 97 |
-
# See the LeRobot documentation
|
| 98 |
-
```
|
| 99 |
|
| 100 |
-
|
| 101 |
-
|
| 102 |
|
| 103 |
-
|
| 104 |
-
|---|---|---|---|---|
|
| 105 |
-
| 1 | 3/3 ✅ | 158.2 s | env-success | completed naturally |
|
| 106 |
-
| 2 | 0/3 | 551.7 s | key-R skip | missed grasps with jitter |
|
| 107 |
-
| 3 | 2/3 | 355.0 s | manual-hang | lerobot server interrupted; the count of 2 was observed in the viewport |
|
| 108 |
|
| 109 |
-
|
| 110 |
|
| 111 |
## License
|
| 112 |
|
| 113 |
-
Apache-2.0 (inherited from `lerobot/smolvla_base`
|
|
|
|
| 18 |
|
| 19 |
# SmolVLA-PickOrange
|
| 20 |
|
| 21 |
+
A [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy fine-tuned without LoRA on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task, self-trained for **15k steps** (`main`, sweep best).
|
| 22 |
+
_A fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task. `main` = **step-15000 (sweep best)**, full-parameter from `lerobot/smolvla_base`._
|
| 23 |
|
| 24 |

|
| 25 |
|
|
|
|
| 35 |
- **Task**: `Pick up the orange and place it on the plate`. A single SO-101 arm picks up 3 oranges one at a time and places each on the plate.
|
| 36 |
- **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes at 30 fps, dual-camera 480×640.
|
| 37 |
- **Architecture**: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, `chunk_size=50`.
|
| 38 |
+
- **Training**: full-parameter fine-tuning (no LoRA), batch=8 / lr=1e-4 / 30k steps in total; the policy is clearly overfit by 30k. **main = step-15000 (sweep best)**.
|
| 39 |
+
- **Eval** (Isaac Sim 5.1, **5 rounds × 3 oranges = 15 oranges**, post-fix placement check):
|
| 40 |
+
- **2/5 strict rounds, 8/15 oranges (53%), 133s avg** ← 15k @ h=50
|
| 41 |
+
- See the README leaderboard in [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience) for details.
|
| 42 |
|
| 43 |
+
## Checkpoint branches
|
|
|
|
| 44 |
|
| 45 |
+
| Branch | Step | env rounds | oranges | avg s | Notes |
|
| 46 |
+
|---|---|---|---|---|---|
|
| 47 |
+
| **`main`** | **15000** | **2/5** | **8/15 (53%)** | **133s** | sweep best ⭐ |
|
| 48 |
+
| `ckpt-20k` | 20000 | 0/5 | 6/15 (40%) | 180s | (will be uploaded if needed) |
|
| 49 |
+
| `ckpt-25k` | 25000 | 1/5 | 5/15 (33%) | 160s | (will be uploaded if needed) |
|
| 50 |
+
| `ckpt-30k` | 30000 | 0/5 | 4/15 (27%) | 180s | overfit; the previous `main` was moved to this branch |
|
| 51 |
|
| 52 |
+
_The sweep uses h=50 (= train chunk_size), 5 rounds × 5 checkpoints = 75 ep on Isaac Sim 5.1, on a single RTX 4090._
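
The ranking in the checkpoint table can be reproduced with a few lines of Python. This is only a sketch that re-enters the table's numbers by hand: `SweepResult` and `rank` are illustrative names, not part of LeRobot or LeIsaac, and the tie-breaking order (oranges first, then rounds, then average time) is an assumption rather than the author's stated criterion. The 10k checkpoint is left out because the table does not list it.

```python
from dataclasses import dataclass

ROUNDS = 5             # post-fix sweeps run 5 rounds per checkpoint
ORANGES_PER_ROUND = 3  # each round has 3 oranges to place


@dataclass(frozen=True)
class SweepResult:
    branch: str
    step: int
    env_rounds: int  # "env rounds" column: rounds counted as successful
    oranges: int     # oranges placed across all rounds
    avg_s: float     # "avg s" column, in seconds

    @property
    def orange_rate(self) -> float:
        return self.oranges / (ROUNDS * ORANGES_PER_ROUND)


# Numbers copied from the checkpoint table.
RESULTS = [
    SweepResult("main", 15000, 2, 8, 133.0),
    SweepResult("ckpt-20k", 20000, 0, 6, 180.0),
    SweepResult("ckpt-25k", 25000, 1, 5, 160.0),
    SweepResult("ckpt-30k", 30000, 0, 4, 180.0),
]


def rank(results):
    # Assumed ordering: more oranges, then more successful rounds, then less time.
    return sorted(results, key=lambda r: (-r.oranges, -r.env_rounds, r.avg_s))


if __name__ == "__main__":
    for r in rank(RESULTS):
        print(f"{r.branch:9s} step={r.step:5d} "
              f"oranges={r.oranges}/15 ({r.orange_rate:.0%})")
```

Under this ordering `main` (step 15000) comes first, matching the "sweep best" label in the table.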
|
| 53 |
|
| 54 |
+
## Ckpt sweep curve
|
| 55 |
+
|
| 56 |
+
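15k is the sweet spot: training longer starts to overfit the small 60-episode dataset, while stopping too early (10k or below) leaves the policy without the full long-horizon pick-place-pick-place sequence.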
|
| 57 |
+
|
| 58 |
+
```
|
| 59 |
+
oranges/15
|
| 60 |
+
9 |
|
| 61 |
+
8 | ⭐ 15k
|
| 62 |
+
7 |
|
| 63 |
+
6 | ● 20k
|
| 64 |
+
5 | ● 25k
|
| 65 |
+
4 | ● 30k
|
| 66 |
+
3 |
|
| 67 |
+
2 |
|
| 68 |
+
1 |● 10k
|
| 69 |
+
0 +----------------------
|
| 70 |
+
10 15 20 25 30 k step
|
| 71 |
+
```
|
| 72 |
|
| 73 |
+
## Inference configuration
|
| 74 |
|
| 75 |
```bash
|
| 76 |
# 1. Start the LeRobot async policy server
|
| 77 |
bash server/start_server.sh --lerobot-only
|
| 78 |
|
| 79 |
# 2. Run the LeIsaac PickOrange eval
|
| 80 |
+
POLICY_CHECKPOINT=wsagi/SmolVLA-PickOrange \
|
| 81 |
+
ACTION_HORIZON=50 \
|
| 82 |
+
EVAL_ROUNDS=5 EPISODE_LENGTH=120 MAX_ROUND_WALL_S=180 \
|
| 83 |
+
PROMPT="Pick up the orange and put it in the plate" \
|
| 84 |
+
conda run -n isaaclab python LeIsaac/scripts/evaluation/policy_inference.py \
|
| 85 |
--task=LeIsaac-SO101-PickOrange-v0 \
|
|
|
|
| 86 |
--policy_type=lerobot-smolvla \
|
| 87 |
+
--policy_port=8080 \
|
| 88 |
+
--policy_checkpoint_path=$POLICY_CHECKPOINT \
|
| 89 |
+
--policy_action_horizon=$ACTION_HORIZON \
|
| 90 |
+
--eval_rounds=$EVAL_ROUNDS --episode_length_s=$EPISODE_LENGTH \
|
| 91 |
+
--max_round_wall_s=$MAX_ROUND_WALL_S \
|
| 92 |
+
--policy_language_instruction="$PROMPT" \
|
| 93 |
--device=cuda --enable_cameras
|
| 94 |
```
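
To make the `ACTION_HORIZON=50` setting in the command above concrete, here is a minimal sketch of chunked action execution. It assumes the usual receding-horizon scheme in which the policy predicts `chunk_size` actions per call and the client executes the first `action_horizon` of them before asking again; the actual LeIsaac/LeRobot client may differ in detail. `run_episode` and `dummy_policy` are hypothetical names used only for illustration.

```python
from collections import deque
from typing import Callable, List

CHUNK_SIZE = 50  # actions predicted per policy call (the train chunk_size)


def run_episode(predict_chunk: Callable[[int], List[int]],
                total_steps: int,
                action_horizon: int = CHUNK_SIZE) -> int:
    """Run `total_steps` control steps and return the number of policy calls."""
    if not 1 <= action_horizon <= CHUNK_SIZE:
        raise ValueError("action_horizon must be in [1, CHUNK_SIZE]")
    queue: deque = deque()
    policy_calls = 0
    for t in range(total_steps):
        if not queue:
            chunk = predict_chunk(t)              # CHUNK_SIZE future actions
            queue.extend(chunk[:action_horizon])  # keep only the first h
            policy_calls += 1
        action = queue.popleft()
        _ = action  # a real loop would send this to the simulator or robot
    return policy_calls


def dummy_policy(t: int) -> List[int]:
    # Hypothetical stand-in for the policy server: returns placeholder actions.
    return list(range(t, t + CHUNK_SIZE))


if __name__ == "__main__":
    steps = 120 * 30  # 120 s episode at an assumed 30 Hz control rate
    print(run_episode(dummy_policy, steps, action_horizon=50))
    print(run_episode(dummy_policy, steps, action_horizon=40))
```

With h=50 the whole chunk is consumed before the next query, so a shorter horizon such as h=40 trades more policy calls for fresher observations.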
|
| 95 |
|
| 96 |
+
**Key inference parameters (per [scripts/benchmark/baselines_action_horizon.tsv](https://github.com/vitorcen/isaaclab-experience/blob/main/scripts/benchmark/baselines_action_horizon.tsv))**:
|
| 97 |
+
- `action_horizon=50` (= train chunk_size; h=40 measured slightly worse)
|
| 98 |
+
- Use branch `main` for the best checkpoint, or `ckpt-30k` / any `ckpt-Nk` for the corresponding training stage.
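
Selecting a branch programmatically could look like the sketch below. It assumes the `ckpt-Nk` naming described above and uses `snapshot_download` from `huggingface_hub` with its `revision` argument; `revision_for_step` and `download_checkpoint` are hypothetical helper names. Note that the checkpoint table marks `ckpt-20k` and `ckpt-25k` as not necessarily uploaded, so a download for those steps may fail.

```python
def revision_for_step(step: int) -> str:
    """Map a training step to the branch name used in this repo."""
    if step == 15000:
        return "main"  # the sweep-best checkpoint lives on main
    if step <= 0 or step % 1000 != 0:
        raise ValueError("expected a positive multiple of 1000 steps")
    return f"ckpt-{step // 1000}k"


def download_checkpoint(step: int,
                        repo_id: str = "wsagi/SmolVLA-PickOrange") -> str:
    """Download the snapshot for `step` and return its local directory."""
    # Imported lazily so the mapping above works without huggingface_hub.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, revision=revision_for_step(step))


if __name__ == "__main__":
    print(revision_for_step(15000))
    print(revision_for_step(30000))
```

The returned directory can then be passed wherever a local checkpoint path is accepted.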
|
| 99 |
|
| 100 |
+
## Training recipe
|
| 101 |
+
_Training recipe_
|
| 102 |
|
| 103 |
+
| Item | Value |
|
| 104 |
+
|---|---|
|
| 105 |
+
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
|
| 106 |
+
| Policy | `smolvla` (LeRobot implementation) |
|
| 107 |
+
| Backbone | `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` + Action Expert |
|
| 108 |
+
| `chunk_size` / `n_action_steps` | 50 / 50 |
|
| 109 |
+
| Batch size | 8 (full-param, no LoRA) |
|
| 110 |
+
| Optimizer | AdamW, lr=1e-4 |
|
| 111 |
+
| Steps | 30000 (~14h on 4090) → main = **15000** (sweep best) |
|
| 112 |
+
| `video_backend` | `pyav` (torchcodec segfaults on long runs) |
|
| 113 |
+
| Image augmentation | None |
|
| 114 |
+
| Train expert only | False (all parameters trained) |
|
| 115 |
|
| 116 |
+
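> **🚨 Key fix, schema-free base**: before training, you must run [`prepare_base.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/smolvla/prepare_base.sh) to strip the `input_features` / `empty_cameras` that ship with `lerobot/smolvla_base` (the default `camera1/2/3 @ 256×256` contaminates the fine-tuning path). Otherwise the schema is misaligned during training, and the forward pass either raises a KeyError or silently trains a broken model.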
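
The idea behind that fix can be illustrated in Python. This is not the content of `prepare_base.sh`, which is not reproduced here; it is a sketch that assumes the base checkpoint's config is a JSON object whose `input_features` and `empty_cameras` entries can simply be dropped. `strip_base_schema` and the example config are hypothetical.

```python
import copy
import json

# Field names come from the note above; everything else is illustrative.
STRIPPED_KEYS = ("input_features", "empty_cameras")


def strip_base_schema(config: dict) -> dict:
    """Return a copy of `config` without the base model's camera schema."""
    cleaned = copy.deepcopy(config)
    for key in STRIPPED_KEYS:
        cleaned.pop(key, None)  # ignore keys that are already absent
    return cleaned


if __name__ == "__main__":
    # Hypothetical fragment of a policy config, not the real file contents.
    example = {
        "chunk_size": 50,
        "empty_cameras": 0,
        "input_features": {
            "observation.images.camera1": {"shape": [3, 256, 256]},
        },
    }
    print(json.dumps(strip_base_schema(example), indent=2))
```

Working on a deep copy leaves the original config untouched, which makes it easy to diff the stripped version against the base before training.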
|
| 117 |
|
| 118 |
+
## Eval history
|
| 119 |
+
|
| 120 |
+
| Version | env rounds | oranges | avg s | Notes |
|
| 121 |
+
|---|---|---|---|---|
|
| 122 |
+
| 30k h=50 (old leaderboard) | 1/3 | 5/9 (55%) | 355s | sticky-OR + 3-round (old buggy counting) |
|
| 123 |
+
| **30k h=50 (post-fix 5-round)** | 0/5 | 4/15 (27%) | 180s | true 5-round + pre-step snapshot |
|
| 124 |
+
| **15k h=50 (post-fix 5-round)** | **2/5** | **8/15 (53%)** | **133s** | **sweep best, current main** ⭐ |
|
| 125 |
|
| 126 |
## License
|
| 127 |
|
| 128 |
+
Apache-2.0 (inherited from `lerobot/smolvla_base`).
|
model.safetensors
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 906712520
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d1c42010653754d28cbafbcff2bbecee49aa80401935bcd1a2734dcb9d776901
|
| 3 |
size 906712520
|