| license: apache-2.0 | |
| library_name: lerobot | |
| pipeline_tag: robotics | |
| tags: | |
| - smolvla | |
| - lerobot | |
| - so101 | |
| - leisaac | |
| - pick-orange | |
| - isaac-sim | |
| datasets: | |
| - LightwheelAI/leisaac-pick-orange | |
| language: | |
| - en | |
| base_model: lerobot/smolvla_base | |
| # SmolVLA-PickOrange | |
| A [SmolVLA](https://huggingface.co/lerobot/smolvla_base) policy fine-tuned with full-parameter updates (no LoRA) from `lerobot/smolvla_base` on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task. The `main` branch holds the self-trained **step-15000** checkpoint, the best of the checkpoint sweep. | |
|  | |
| **🔗 Project repos**: | |
| - [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy comparison (parent project), including a 7-baseline benchmark | |
| - [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs) | |
| > **About the name**: `config.type=smolvla` (LeRobot v1 SmolVLA implementation) with the `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` (SmolVLM**2**) backbone. LeRobot itself names the policy `smolvla` rather than `smolvla2`, so this repo keeps the name `SmolVLA-PickOrange`. | |
| ## TL;DR | |
| - **Task**: `Pick up the orange and place it on the plate`. A single SO-101 arm picks up 3 oranges one after another and places each on the plate. | |
| - **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes, 30 fps, dual-cam 480×640. | |
| - **Architecture**: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, `chunk_size=50`. | |
| - **Training**: full-parameter fine-tuning (no LoRA), batch=8, lr=1e-4, 30k steps in total; the model is clearly overfit by 30k. **main = step-15000 (sweep best)**. | |
| - **Eval** (Isaac Sim 5.1, **5 rounds × 3 oranges = 15 oranges**, post-fix placement check): | |
| - **2/5 strict rounds, 8/15 oranges (53%), 133s avg** ← 15k @ h=50 | |
| - See the README leaderboard in [`vitorcen/isaaclab-experience`](https://github.com/vitorcen/isaaclab-experience) for details. | |
| ## Checkpoint branches | |
| Branch | Step | env rounds | oranges | avg s | Notes | |
| |---|---|---|---|---|---| | |
| | **`main`** | **15000** | **2/5** | **8/15 (53%)** | **133s** | sweep best ⭐ | | |
| | `ckpt-20k` | 20000 | 0/5 | 6/15 (40%) | 180s | (will be uploaded if needed) | | |
| | `ckpt-25k` | 25000 | 1/5 | 5/15 (33%) | 160s | (will be uploaded if needed) | | |
| `ckpt-30k` | 30000 | 0/5 | 4/15 (27%) | 180s | overfit; the former `main` was moved to this branch | |
| _The sweep uses h=50 (= training chunk_size): 5 rounds × 5 checkpoints (75 oranges in total at 3 per round) on Isaac Sim 5.1, on a single RTX 4090._ | |
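The percentages in the branch table follow directly from the orange counts. As a minimal sketch (the orange counts are copied from the table; the variable and function names are illustrative only), the rates and the best checkpoint can be recomputed like this:

```python
# Orange counts copied from the branch table above. Each checkpoint was
# evaluated on 5 rounds x 3 oranges = 15 oranges.
ORANGES_PER_CKPT = 15
placed = {15000: 8, 20000: 6, 25000: 5, 30000: 4}


def success_rate(n_placed: int, total: int = ORANGES_PER_CKPT) -> float:
    """Fraction of oranges that ended up on the plate."""
    return n_placed / total


rates = {step: success_rate(n) for step, n in placed.items()}

# The sweep winner is simply the step with the highest placement rate.
best_step = max(rates, key=rates.get)

for step, rate in sorted(rates.items()):
    marker = " <- best" if step == best_step else ""
    print(f"step {step:>5}: {placed[step]:>2}/{ORANGES_PER_CKPT} = {rate:.0%}{marker}")
```

This ranks checkpoints by oranges placed only; the strict-round count and average time from the table are not part of the comparison here.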
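| ## Ckpt sweep curve | |
| 15k is the best point: training longer starts to overfit the small 60-episode dataset, while stopping too early (10k or below) means the policy has not yet learned the full pick-place-pick-place long-horizon sequence. | |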
| ``` | |
| oranges/15 | |
| 9 | | |
| 8 | ⭐ 15k | |
| 7 | | |
| 6 | ● 20k | |
| 5 | ● 25k | |
| 4 | ● 30k | |
| 3 | | |
| 2 | | |
| 1 |● 10k | |
| 0 +---------------------- | |
| 10 15 20 25 30 k step | |
| ``` | |
| ## Inference configuration | |
| ```bash | |
| # 1. Start the LeRobot async policy server | |
| bash server/start_server.sh --lerobot-only | |
| # 2. Run the LeIsaac PickOrange eval (variables are set on their own lines so the $VAR references below expand) | |
| POLICY_CHECKPOINT=wsagi/SmolVLA-PickOrange | |
| ACTION_HORIZON=50 | |
| EVAL_ROUNDS=5 EPISODE_LENGTH=120 MAX_ROUND_WALL_S=180 | |
| PROMPT="Pick up the orange and put it in the plate" | |
| conda run -n isaaclab python LeIsaac/scripts/evaluation/policy_inference.py \ | |
| --task=LeIsaac-SO101-PickOrange-v0 \ | |
| --policy_type=lerobot-smolvla \ | |
| --policy_port=8080 \ | |
| --policy_checkpoint_path=$POLICY_CHECKPOINT \ | |
| --policy_action_horizon=$ACTION_HORIZON \ | |
| --eval_rounds=$EVAL_ROUNDS --episode_length_s=$EPISODE_LENGTH \ | |
| --max_round_wall_s=$MAX_ROUND_WALL_S \ | |
| --policy_language_instruction="$PROMPT" \ | |
| --device=cuda --enable_cameras | |
| ``` | |
| **Key inference parameters (per [scripts/benchmark/baselines_action_horizon.tsv](https://github.com/vitorcen/isaaclab-experience/blob/main/scripts/benchmark/baselines_action_horizon.tsv))**: | |
| - `action_horizon=50` (= training chunk_size; h=40 measured slightly weaker) | |
| - Use branch `main` for the best checkpoint, or `ckpt-30k` / any `ckpt-Nk` for the corresponding training stage. | |
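To pull a specific branch, one option is `snapshot_download` from `huggingface_hub` with its `revision` argument. The sketch below is an assumption-laden illustration: `branch_for_step` and `download_checkpoint` are hypothetical helpers, not part of this repo or of LeRobot, and the branch names follow the `ckpt-Nk` pattern from the table above (some of those branches may not have been uploaded).

```python
def branch_for_step(step: int) -> str:
    """Map a training step to a branch name (hypothetical helper).

    Step 15000 lives on `main`; other steps follow the `ckpt-Nk` pattern.
    """
    if step == 15000:
        return "main"
    if step <= 0 or step % 1000 != 0:
        raise ValueError(f"expected a positive multiple of 1000, got {step}")
    return f"ckpt-{step // 1000}k"


def download_checkpoint(step: int, repo_id: str = "wsagi/SmolVLA-PickOrange") -> str:
    """Download one branch of the model repo and return the local directory."""
    # Imported lazily so the name mapping above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, revision=branch_for_step(step))


if __name__ == "__main__":
    print(branch_for_step(15000))
    print(branch_for_step(30000))
```

Calling `download_checkpoint(30000)` would fetch the `ckpt-30k` branch into the local Hugging Face cache. Whether the eval script accepts that local directory as `--policy_checkpoint_path` is not confirmed here.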
| ## Training recipe | |
| Item | Value | |
| |---|---| | |
| | Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) | | |
| Policy | `smolvla` (LeRobot implementation) | |
| | Backbone | `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` + Action Expert | | |
| | `chunk_size` / `n_action_steps` | 50 / 50 | | |
| | Batch size | 8 (full-param, no LoRA) | | |
| | Optimizer | AdamW, lr=1e-4 | | |
| | Steps | 30000 (~14h on 4090) → main = **15000** (sweep best) | | |
| `video_backend` | `pyav` (torchcodec segfaults on long runs) | |
| Image augmentation | None | |
| Train expert only | False (full-parameter) | |
| > **🚨 Key fix for a schema-free base**: before training, run [`prepare_base.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/smolvla/prepare_base.sh) to strip the `input_features` / `empty_cameras` that ship with `lerobot/smolvla_base` (the default `camera1/2/3 @ 256×256` would contaminate the fine-tuning path). Otherwise the schema is misaligned during training, and the forward pass either raises a KeyError or silently trains a broken model. | |
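The idea behind the fix can be sketched in a few lines of Python. This is not the content of `prepare_base.sh`, which is not reproduced here. It assumes the base checkpoint stores its policy config as JSON and that removing the two keys is sufficient, and the example config values are made up for illustration.

```python
import json
from pathlib import Path

# Keys named in the note above; everything else in the config is kept.
KEYS_TO_STRIP = ("input_features", "empty_cameras")


def strip_schema(config: dict) -> dict:
    """Return a copy of the config without the base model's baked-in schema."""
    return {k: v for k, v in config.items() if k not in KEYS_TO_STRIP}


def strip_config_file(src: Path, dst: Path) -> None:
    """Read a JSON config, drop the schema keys, and write the result."""
    config = json.loads(src.read_text())
    dst.write_text(json.dumps(strip_schema(config), indent=2))


# Toy config with made-up values, only to show the effect.
example = {
    "type": "smolvla",
    "chunk_size": 50,
    "empty_cameras": 3,
    "input_features": {"observation.images.camera1": {"shape": [3, 256, 256]}},
}
cleaned = strip_schema(example)
print(sorted(cleaned))
```

With the base schema gone, the input features would have to come from the fine-tuning dataset instead of the `camera1/2/3 @ 256×256` defaults. The real script may reset these fields rather than delete them.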
| ## Eval history | |
| Version | env rounds | oranges | avg s | Notes | |
| |---|---|---|---|---| | |
| 30k h=50 (old leaderboard) | 1/3 | 5/9 (56%) | 355s | sticky-OR + 3-round (old buggy counting) | |
| **30k h=50 (post-fix 5-round)** | 0/5 | 4/15 (27%) | 180s | true 5-round + pre-step snapshot | |
| **15k h=50 (post-fix 5-round)** | **2/5** | **8/15 (53%)** | **133s** | **sweep best, current `main`** ⭐ | |
| ## License | |
| Apache-2.0 (inherited from `lerobot/smolvla_base`). | |