How to use wsagi/SmolVLA-PickOrange with LeRobot:
# See https://github.com/huggingface/lerobot?tab=readme-ov-file#installation for more details
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e .[smolvla]
# Launch finetuning on your dataset
python lerobot/scripts/train.py \
  --policy.path=wsagi/SmolVLA-PickOrange \
  --dataset.repo_id=lerobot/svla_so101_pickplace \
  --batch_size=64 \
  --steps=20000 \
  --output_dir=outputs/train/my_smolvla \
  --job_name=my_smolvla_training \
  --policy.device=cuda \
  --wandb.enable=true

# Run the policy using the record function.
# Replace the placeholder values before running:
#   --robot.port           your serial port
#   --robot.id             your robot id
#   --robot.cameras        your camera setup
#   --dataset.single_task  the same task description you used in your dataset recording
#   --dataset.repo_id      the dataset name on the HF Hub
python -m lerobot.record \
  --robot.type=so101_follower \
  --robot.port=/dev/ttyACM0 \
  --robot.id=my_blue_follower_arm \
  --robot.cameras="{ front: {type: opencv, index_or_path: 8, width: 640, height: 480, fps: 30}}" \
  --dataset.single_task="Grasp a lego block and put it in the bin." \
  --dataset.repo_id=HF_USER/dataset_name \
  --dataset.episode_time_s=50 \
  --dataset.num_episodes=10 \
  --policy.path=wsagi/SmolVLA-PickOrange

SmolVLA-PickOrange
A SmolVLA policy fine-tuned on the LeIsaac SO-101 PickOrange task. Training is full-parameter (no LoRA), starting from lerobot/smolvla_base. The main branch holds the step-15000 checkpoint, the best of the checkpoint sweep.
🔗 Project repos:
- vitorcen/isaaclab-experience: Isaac Lab + LeIsaac multi-policy comparison (parent project), including a 7-baseline benchmark
- vitorcen/LeIsaac-Training: LeIsaac fork (training scripts + design docs)
About the name:
config.type=smolvla (the LeRobot v1 SmolVLA implementation) with the HuggingFaceTB/SmolVLM2-500M-Video-Instruct (SmolVLM2) backbone. LeRobot itself names the policy smolvla rather than smolvla2, so this repo keeps the name SmolVLA-PickOrange.
TL;DR
- Task: Pick up the orange and place it on the plate. A single SO-101 arm picks up 3 oranges one after another and places them on the plate.
- Dataset: LightwheelAI/leisaac-pick-orange, 60 teleoperated demonstration episodes, 30 fps, dual-cam 480×640.
- Architecture: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, chunk_size=50.
- Training: full-parameter fine-tuning (no LoRA), batch=8 / lr=1e-4 / 30k steps in total; the 30k checkpoint is clearly overfit. **main = step-15000 (sweep best)**.
- Eval (Isaac Sim 5.1, 5 rounds × 3 oranges = 15 oranges, post-fix placement check):
  - 2/5 strict rounds, 8/15 oranges (53%), 133s avg ← 15k @ h=50
  - See the vitorcen/isaaclab-experience README leaderboard for details.
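The two eval metrics above can be read as a per-round and a per-orange view of the same runs. The sketch below assumes a "strict round" is one in which all 3 oranges end up on the plate; that reading is an assumption, and the per-round breakdown is hypothetical, chosen only so that the totals match the reported 2/5 and 8/15.

```python
ORANGES_PER_ROUND = 3  # from the task: 3 oranges per round


def summarize(per_round_counts):
    """Return (strict_rounds, total_oranges, orange_rate) for per-round placement counts."""
    # Assumption: a round is "strict" only if every orange was placed.
    strict = sum(1 for n in per_round_counts if n == ORANGES_PER_ROUND)
    total = sum(per_round_counts)
    rate = total / (ORANGES_PER_ROUND * len(per_round_counts))
    return strict, total, rate


# Hypothetical breakdown, NOT measured data: it only reproduces the reported totals.
hypothetical_rounds = [3, 3, 1, 1, 0]
strict, total, rate = summarize(hypothetical_rounds)
print(f"{strict}/{len(hypothetical_rounds)} strict rounds, "
      f"{total}/{ORANGES_PER_ROUND * len(hypothetical_rounds)} oranges ({rate:.0%})")
```

The per-orange rate gives partial credit for rounds that place only one or two oranges, which is why it can sit well above the strict-round rate.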
Checkpoint branches
| Branch | Step | env rounds | oranges | avg s | Notes |
|---|---|---|---|---|---|
| main | 15000 | 2/5 | 8/15 (53%) | 133s | sweep best ⭐ |
| ckpt-20k | 20000 | 0/5 | 6/15 (40%) | 180s | (will be uploaded if needed) |
| ckpt-25k | 25000 | 1/5 | 5/15 (33%) | 160s | (will be uploaded if needed) |
| ckpt-30k | 30000 | 0/5 | 4/15 (27%) | 180s | overfit; the old main was moved to this branch |

The sweep uses h=50 (= train chunk_size): 5 rounds × 5 checkpoints (75 ep in total) on Isaac Sim 5.1, using a single RTX 4090.
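To fetch a specific checkpoint branch from Python, something like the following may work. It is a minimal sketch that uses `snapshot_download` from `huggingface_hub`; the helper names `revision_for_step` and `download_checkpoint` are hypothetical, and the `ckpt-Nk` naming is taken from the table above. Branches marked "will be uploaded if needed" may not exist on the Hub, in which case the download raises an error.

```python
def revision_for_step(step: int) -> str:
    """Map a training step to the branch name used in the table above."""
    if step == 15000:
        return "main"  # main holds the step-15000 checkpoint
    if step <= 0 or step % 1000 != 0:
        raise ValueError("expected a positive multiple of 1000 steps")
    return f"ckpt-{step // 1000}k"


def download_checkpoint(step: int = 15000,
                        repo_id: str = "wsagi/SmolVLA-PickOrange") -> str:
    """Download one branch of the repo and return the local snapshot path."""
    # Imported lazily so the mapping helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, revision=revision_for_step(step))


if __name__ == "__main__":
    print(revision_for_step(15000))  # branch for the sweep-best checkpoint
    print(revision_for_step(30000))  # branch for the overfit checkpoint
```

Calling `download_checkpoint(30000)` would then pull the overfit checkpoint for comparison, assuming the branch is present.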
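Ckpt sweep curve
15k is the best point. Training longer starts to overfit this small 60-episode dataset, while stopping too early (10k or below) means the policy has not yet learned the full long-horizon pick-place-pick-place sequence.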
oranges/15
 9 |
 8 |     ⭐ 15k
 7 |
 6 |          ● 20k
 5 |               ● 25k
 4 |                    ● 30k
 3 |
 2 |
 1 |● 10k
 0 +----------------------
    10   15   20   25   30  k step
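The selection rule behind the curve is simply "keep the checkpoint that places the most oranges". A minimal sketch, using the orange counts shown in the plot and tables of this card (the variable names are illustrative):

```python
TOTAL_ORANGES = 15  # 5 rounds x 3 oranges per checkpoint

# Oranges placed per checkpoint step, as reported in this card.
oranges_placed = {10_000: 1, 15_000: 8, 20_000: 6, 25_000: 5, 30_000: 4}

# Pick the step with the highest orange count.
best_step = max(oranges_placed, key=oranges_placed.get)

for step in sorted(oranges_placed):
    n = oranges_placed[step]
    marker = " <- best" if step == best_step else ""
    print(f"{step:>6} steps: {n:>2}/{TOTAL_ORANGES} ({n / TOTAL_ORANGES:.0%}){marker}")
```

With only 15 oranges per checkpoint, neighbouring points differ by one or two placements, so the ranking between 20k, 25k and 30k is less certain than the gap between 15k and the rest.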
Inference configuration
# 1. Start the LeRobot async policy server
bash server/start_server.sh --lerobot-only
# 2. Run the LeIsaac PickOrange eval
# The variables are exported as separate statements: with the VAR=value prefix form,
# the shell expands $VAR in the arguments before the assignment takes effect.
export POLICY_CHECKPOINT=wsagi/SmolVLA-PickOrange
export ACTION_HORIZON=50
export EVAL_ROUNDS=5
export EPISODE_LENGTH=120
export MAX_ROUND_WALL_S=180
export PROMPT="Pick up the orange and put it in the plate"
conda run -n isaaclab python LeIsaac/scripts/evaluation/policy_inference.py \
  --task=LeIsaac-SO101-PickOrange-v0 \
  --policy_type=lerobot-smolvla \
  --policy_port=8080 \
  --policy_checkpoint_path="$POLICY_CHECKPOINT" \
  --policy_action_horizon="$ACTION_HORIZON" \
  --eval_rounds="$EVAL_ROUNDS" --episode_length_s="$EPISODE_LENGTH" \
  --max_round_wall_s="$MAX_ROUND_WALL_S" \
  --policy_language_instruction="$PROMPT" \
  --device=cuda --enable_cameras
**Key inference parameters (per scripts/benchmark/baselines_action_horizon.tsv)**:
- action_horizon=50 (= train chunk_size; h=40 measured slightly weaker)
- Select branch main for the best checkpoint, or ckpt-30k / any ckpt-Nk for the corresponding training stage.
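To make the role of action_horizon concrete: the policy predicts a chunk of chunk_size actions per call, and the horizon decides how many of them are executed before the policy is queried again. The toy rollout below is a sketch of that idea only, not the LeIsaac evaluation loop. `rollout` and `toy_policy` are hypothetical names, and the 30 Hz control rate is an assumption borrowed from the dataset frame rate.

```python
import math

CHUNK_SIZE = 50  # actions predicted per policy call (chunk_size in this card)


def rollout(policy, total_steps, action_horizon):
    """Execute total_steps actions, re-querying the policy every action_horizon steps."""
    if not 1 <= action_horizon <= CHUNK_SIZE:
        raise ValueError("action_horizon must be between 1 and chunk_size")
    executed, queries = [], 0
    while len(executed) < total_steps:
        chunk = policy(len(executed))  # one chunk of CHUNK_SIZE actions
        queries += 1
        remaining = total_steps - len(executed)
        # Execute only the first action_horizon actions, then re-plan.
        executed.extend(chunk[:min(action_horizon, remaining)])
    return executed, queries


def toy_policy(t):
    """Stand-in policy: each 'action' is just the timestep it was planned for."""
    return [t + i for i in range(CHUNK_SIZE)]


# Assumption: a 120 s episode stepped at 30 Hz gives 3600 control steps.
TOTAL_STEPS = 120 * 30
for h in (50, 40):
    actions, queries = rollout(toy_policy, TOTAL_STEPS, h)
    assert queries == math.ceil(TOTAL_STEPS / h)
    print(f"h={h}: {queries} policy queries for {len(actions)} steps")
```

A shorter horizon re-plans more often and discards the tail of each chunk, so it costs more policy calls per episode; whether that helps or hurts task success is an empirical question, and the measurement quoted above favours h=50 for this checkpoint.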
Training recipe
| Item | Value |
|---|---|
| Dataset | LightwheelAI/leisaac-pick-orange (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
| Policy | smolvla (LeRobot implementation) |
| Backbone | HuggingFaceTB/SmolVLM2-500M-Video-Instruct + Action Expert |
| chunk_size / n_action_steps | 50 / 50 |
| Batch size | 8 (full-param, no LoRA) |
| Optimizer | AdamW, lr=1e-4 |
| Steps | 30000 (~14h on 4090) → main = 15000 (sweep best) |
| video_backend | pyav (torchcodec segfaults on long runs) |
| Image augmentation | None |
| Train expert only | False (all parameters are trained) |
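The recipe figures imply a rough training budget that is easy to recompute. The sketch below uses only the numbers in the table; the 14 h wall time is approximate in the source, so the derived values are estimates too, and they assume a constant step rate.

```python
TOTAL_STEPS = 30_000
MAIN_STEP = 15_000
BATCH_SIZE = 8
WALL_HOURS = 14.0  # "~14h on 4090" in the table, an approximate figure

sec_per_step = WALL_HOURS * 3600 / TOTAL_STEPS      # average seconds per optimizer step
samples_seen_total = TOTAL_STEPS * BATCH_SIZE       # training samples drawn over the full run
samples_seen_main = MAIN_STEP * BATCH_SIZE          # samples drawn up to the main checkpoint
hours_to_main = MAIN_STEP * sec_per_step / 3600     # wall time to reach step 15000

print(f"{sec_per_step:.2f} s/step")
print(f"{samples_seen_total} samples over the full run, {samples_seen_main} up to main")
print(f"about {hours_to_main:.1f} h to reach the main checkpoint")
```

On these numbers, reproducing the main checkpoint takes about half the wall time of the full 30k-step run.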
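🚨 Key fix for a schema-free base: before training, you must run prepare_base.sh to strip the input_features / empty_cameras that ship with lerobot/smolvla_base (the default camera1/2/3 @ 256×256 entries pollute the fine-tuning path). Otherwise the schema is misaligned during training, and the forward pass either raises a KeyError or silently trains a broken model.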
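The contents of prepare_base.sh are not shown here, so the following is only a sketch of what stripping the schema could look like on a policy config loaded as a dictionary. The function name, the feature keys in the toy config, and the choice to reset empty_cameras to 0 are all assumptions for illustration, not the author's script.

```python
import copy


def strip_base_schema(config: dict) -> dict:
    """Return a copy of the config without the base model's input schema."""
    stripped = copy.deepcopy(config)  # leave the caller's dict untouched
    # Assumption: emptying input_features lets the fine-tuning dataset define the schema.
    stripped["input_features"] = {}
    # Assumption: "stripping" empty_cameras means no placeholder cameras are added.
    stripped["empty_cameras"] = 0
    return stripped


# Toy config with hypothetical keys and values, NOT the real smolvla_base config.
toy_base_config = {
    "type": "smolvla",
    "chunk_size": 50,
    "empty_cameras": 1,
    "input_features": {
        "observation.images.camera1": {"type": "VISUAL", "shape": [3, 256, 256]},
        "observation.images.camera2": {"type": "VISUAL", "shape": [3, 256, 256]},
        "observation.images.camera3": {"type": "VISUAL", "shape": [3, 256, 256]},
    },
}

clean = strip_base_schema(toy_base_config)
print(clean["input_features"], clean["empty_cameras"])
```

Whatever the real script does, a quick check that no 256×256 camera entries remain in the saved config before launching training would catch the silent failure mode described above.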
Eval history
| Version | env rounds | oranges | avg s | Notes |
|---|---|---|---|---|
| 30k h=50 (old leaderboard) | 1/3 | 5/9 (56%) | 355s | sticky-OR + 3-round (old buggy counting) |
| 30k h=50 (post-fix 5-round) | 0/5 | 4/15 (27%) | 180s | true 5-round + pre-step snapshot |
| 15k h=50 (post-fix 5-round) | 2/5 | 8/15 (53%) | 133s | sweep best, current main ⭐ |
License
Apache-2.0 (inherited from lerobot/smolvla_base).