How to use from the LeRobot library
# See https://github.com/huggingface/lerobot?tab=readme-ov-file#installation for more details
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
# Launch finetuning on your dataset
python lerobot/scripts/train.py \
--policy.path=wsagi/SmolVLA-PickOrange \
--dataset.repo_id=lerobot/svla_so101_pickplace \
--batch_size=64 \
--steps=20000 \
--output_dir=outputs/train/my_smolvla \
--job_name=my_smolvla_training \
--policy.device=cuda \
--wandb.enable=true
# Run the policy using the record function.
# Replace the port, robot id, and cameras with your own. Use the same task
# description you used when recording your dataset. dataset.repo_id becomes
# the dataset name on the HF Hub.
# (Comments cannot follow a trailing backslash, so they are collected here.)
python -m lerobot.record \
  --robot.type=so101_follower \
  --robot.port=/dev/ttyACM0 \
  --robot.id=my_blue_follower_arm \
  --robot.cameras="{ front: {type: opencv, index_or_path: 8, width: 640, height: 480, fps: 30}}" \
  --dataset.single_task="Grasp a lego block and put it in the bin." \
  --dataset.repo_id=HF_USER/dataset_name \
  --dataset.episode_time_s=50 \
  --dataset.num_episodes=10 \
  --policy.path=wsagi/SmolVLA-PickOrange

SmolVLA-PickOrange

A SmolVLA policy fine-tuned on the LeIsaac SO-101 PickOrange task. This is a full-parameter fine-tune (no LoRA) starting from lerobot/smolvla_base. The main branch holds the self-trained step-15000 checkpoint, which was the best of the checkpoint sweep.

SmolVLA-PickOrange — SO-101 in Isaac Sim

🔗 Project repos

About the name: config.type=smolvla (the LeRobot v1 SmolVLA implementation), with HuggingFaceTB/SmolVLM2-500M-Video-Instruct (SmolVLM2) as the backbone. LeRobot itself names the policy smolvla rather than smolvla2, so this repo follows suit and keeps the name SmolVLA-PickOrange.

TL;DR

  • Task: "Pick up the orange and place it on the plate". A single SO-101 arm picks up 3 oranges one after another and places each on the plate.
  • Dataset: LightwheelAI/leisaac-pick-orange, 60 teleoperated demonstration episodes, 30 fps, dual-cam 480×640.
  • Architecture: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, chunk_size=50.
  • Training: full-parameter fine-tuning (no LoRA), batch=8, lr=1e-4, 30k steps in total; the model is clearly overfit by 30k. **main = step-15000 (sweep best)**.
  • Eval (Isaac Sim 5.1, 5 rounds × 3 oranges = 15 oranges, post-fix placement check): results are in the checkpoint table below.

Checkpoint branches

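| Branch | Step | env rounds | oranges | avg s | Notes |
| --- | --- | --- | --- | --- | --- |
| main | 15000 | 2/5 | 8/15 (53%) | 133s | sweep best ⭐ |
| ckpt-20k | 20000 | 0/5 | 6/15 (40%) | 180s | (will be uploaded if needed) |
| ckpt-25k | 25000 | 1/5 | 5/15 (33%) | 160s | (will be uploaded if needed) |
| ckpt-30k | 30000 | 0/5 | 4/15 (27%) | 180s | overfit; the previous main was moved to this branch |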

The sweep uses h=50 (= training chunk_size): 5 rounds × 3 oranges × 5 checkpoints = 75 ep on Isaac Sim 5.1, on a single RTX 4090.
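The percentages in the table and the choice of main can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the `SWEEP` dictionary and the function names are made up here, and the numbers are copied by hand from the table above rather than read from any benchmark file.

```python
# Sweep results copied from the checkpoint table:
# training step -> (oranges placed, oranges attempted).
SWEEP = {
    15000: (8, 15),
    20000: (6, 15),
    25000: (5, 15),
    30000: (4, 15),
}


def success_rate(placed: int, attempted: int) -> float:
    """Fraction of oranges that ended up on the plate."""
    return placed / attempted


def best_step(sweep: dict) -> int:
    """Return the training step with the highest orange success rate."""
    return max(sweep, key=lambda step: success_rate(*sweep[step]))


if __name__ == "__main__":
    for step, (placed, attempted) in sorted(SWEEP.items()):
        rate = success_rate(placed, attempted)
        print(f"step {step}: {placed}/{attempted} = {rate:.0%}")
    print("best step:", best_step(SWEEP))
```

With only 15 oranges per checkpoint, one extra success moves the rate by almost 7 percentage points, so small differences between neighbouring checkpoints should be read with care.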

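Ckpt sweep curve

15k is the best point. Training longer starts to overfit the small 60-episode dataset, while stopping too early (10k and below) means the policy has not yet learned the full long-horizon pick-place-pick-place sequence.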

oranges/15
  9 |
  8 |          ⭐ 15k
  7 |
  6 |       ●  20k
  5 |          ●  25k
  4 |             ●  30k
  3 |
  2 |
  1 |●  10k
  0 +----------------------
    10  15  20  25  30  k step

Inference configuration

# 1. Start the LeRobot async policy server
bash server/start_server.sh --lerobot-only

# 2. Run the LeIsaac PickOrange eval
# The variables are exported as separate statements. If they are prefixed to the
# command on the same (continued) line, the shell expands $POLICY_CHECKPOINT etc.
# before the assignments take effect, so the flags would receive empty values.
export POLICY_CHECKPOINT=wsagi/SmolVLA-PickOrange
export ACTION_HORIZON=50
export EVAL_ROUNDS=5 EPISODE_LENGTH=120 MAX_ROUND_WALL_S=180
export PROMPT="Pick up the orange and put it in the plate"
conda run -n isaaclab python LeIsaac/scripts/evaluation/policy_inference.py \
    --task=LeIsaac-SO101-PickOrange-v0 \
    --policy_type=lerobot-smolvla \
    --policy_port=8080 \
    --policy_checkpoint_path="$POLICY_CHECKPOINT" \
    --policy_action_horizon="$ACTION_HORIZON" \
    --eval_rounds="$EVAL_ROUNDS" --episode_length_s="$EPISODE_LENGTH" \
    --max_round_wall_s="$MAX_ROUND_WALL_S" \
    --policy_language_instruction="$PROMPT" \
    --device=cuda --enable_cameras

**Key inference parameters (per scripts/benchmark/baselines_action_horizon.tsv)**:

  • action_horizon=50 (= training chunk_size; h=40 measured slightly weaker)
  • Use branch main for the best checkpoint, or ckpt-30k / any ckpt-Nk branch for the corresponding training stage.
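Selecting a branch comes down to passing it as the `revision` when fetching the repo. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; `branch_for_step` and `download_checkpoint` are hypothetical helper names introduced here, and the `ckpt-Nk` pattern is inferred from the branch names listed on this card. Note that ckpt-20k and ckpt-25k are marked "will be uploaded if needed", so they may not exist on the Hub.

```python
REPO_ID = "wsagi/SmolVLA-PickOrange"
BEST_STEP = 15000  # per this model card, main = step-15000


def branch_for_step(step: int) -> str:
    """Map a training step to the branch name used on this model card."""
    if step == BEST_STEP:
        return "main"
    if step <= 0 or step % 1000 != 0:
        raise ValueError(f"expected a positive multiple of 1000, got {step}")
    return f"ckpt-{step // 1000}k"


def download_checkpoint(step: int) -> str:
    """Download one checkpoint branch and return the local snapshot path."""
    # Imported lazily so branch_for_step() works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, revision=branch_for_step(step))


if __name__ == "__main__":
    print(branch_for_step(15000))
    print(branch_for_step(30000))
```

Calling `download_checkpoint(30000)` would fetch the overfit 30k weights from the ckpt-30k branch, which can be useful for comparing against main.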

Training recipe

| Item | Value |
| --- | --- |
| Dataset | LightwheelAI/leisaac-pick-orange (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
| Policy | smolvla (LeRobot implementation) |
| Backbone | HuggingFaceTB/SmolVLM2-500M-Video-Instruct + Action Expert |
| chunk_size / n_action_steps | 50 / 50 |
| Batch size | 8 (full-param, no LoRA) |
| Optimizer | AdamW, lr=1e-4 |
| Steps | 30000 (~14h on 4090) → main = 15000 (sweep best) |
| video_backend | pyav (torchcodec segfaults on long runs) |
| Image augmentation | |
| Train expert only | False (full-parameter) |
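The chunk_size / n_action_steps row means the policy predicts a chunk of 50 actions per forward pass and all 50 are executed before it is queried again. The loop below is a minimal sketch of that execution pattern, not LeRobot's or LeIsaac's client code: `dummy_policy` is a stand-in that returns fake actions, and the 30 Hz control rate used in `policy_queries` is an assumption taken from the dataset frame rate.

```python
import math


def dummy_policy(t: int, chunk_size: int = 50) -> list:
    """Stand-in policy: returns a chunk of fake 'actions' starting at step t."""
    return [t + i for i in range(chunk_size)]


def run_chunked(policy, num_steps: int, action_horizon: int):
    """Execute num_steps actions, re-querying the policy every action_horizon steps."""
    executed = []
    queries = 0
    while len(executed) < num_steps:
        chunk = policy(len(executed))
        queries += 1
        # Execute at most action_horizon actions, and never past the episode end.
        take = min(action_horizon, len(chunk), num_steps - len(executed))
        executed.extend(chunk[:take])
    return executed, queries


def policy_queries(episode_length_s: float, control_hz: float, action_horizon: int) -> int:
    """Number of policy forward passes needed for one episode."""
    total_steps = round(episode_length_s * control_hz)
    return math.ceil(total_steps / action_horizon)


if __name__ == "__main__":
    _, n = run_chunked(dummy_policy, num_steps=120, action_horizon=50)
    print("queries for 120 steps at h=50:", n)
    print("queries for a 120 s episode at 30 Hz, h=50:", policy_queries(120, 30, 50))
```

Under these assumptions a 120 s episode needs 72 forward passes at h=50 and 90 at h=40. A shorter horizon reacts to new observations more often but costs more inference calls.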

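🚨 Key fix for a schema-free base: before training, you must run prepare_base.sh to strip the input_features / empty_cameras settings that ship with lerobot/smolvla_base (the default camera1/2/3 @ 256×256 contaminates the fine-tuning path). Otherwise the schema is misaligned during training, and forward either raises a KeyError or the run silently trains a broken model.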
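The idea behind the fix can be shown on a plain dictionary. This is a sketch only: the contents of prepare_base.sh are not shown on this card, so the function below, its name, and the assumption that the two settings live as top-level keys of the policy config are all hypothetical. The sample values are illustrative placeholders, not the real lerobot/smolvla_base config.

```python
import copy

# Keys the card says must be stripped from the base config before fine-tuning.
KEYS_TO_STRIP = ("input_features", "empty_cameras")


def strip_base_schema(config: dict) -> dict:
    """Return a copy of the config without the base model's input schema."""
    cleaned = copy.deepcopy(config)
    for key in KEYS_TO_STRIP:
        # Assumption: dropping the key lets training derive the schema
        # from the fine-tuning dataset instead of the base defaults.
        cleaned.pop(key, None)
    return cleaned


if __name__ == "__main__":
    # Illustrative placeholder config, not the real file.
    base_config = {
        "type": "smolvla",
        "chunk_size": 50,
        "input_features": {
            "observation.images.camera1": {"shape": [3, 256, 256]},
        },
        "empty_cameras": 0,
    }
    print(sorted(strip_base_schema(base_config)))
```

The function works on a copy so the original config stays available for comparison if the fine-tune later fails with a schema error.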

Eval history

| Version | env rounds | oranges | avg s | Notes |
| --- | --- | --- | --- | --- |
| 30k h=50 (old leaderboard) | 1/3 | 5/9 (55%) | 355s | sticky-OR + 3-round (old buggy counting) |
| 30k h=50 (post-fix 5-round) | 0/5 | 4/15 (27%) | 180s | true 5-round + pre-step snapshot |
| 15k h=50 (post-fix 5-round) | 2/5 | 8/15 (53%) | 133s | sweep best, current main |

License

Apache-2.0 (inherited from lerobot/smolvla_base).
