smolVLA · IsaacLab SO101 Pick & Place (single-task, 50 epoch)

A SmolVLA policy obtained by fine-tuning lerobot/smolvla_base for 50 epochs on CoRL2026-CSI/IsaacLab-SO101_pick_place_baseCaP_100epi_10fps, a single-task SO101 pick & place dataset collected in IsaacLab simulation.

This checkpoint is a full model (model.safetensors), not a LoRA adapter, so it can be loaded and used directly.

Model details

  • Base model: lerobot/smolvla_base (SmolVLM2-500M-Video-Instruct VLM + action expert)
  • Robot: SO101 (6-DOF, including gripper), IsaacLab simulation
  • Cameras: top, left_wrist (480×640), renamed to the policy keys camera1 (left_wrist) / camera2 (top)
  • Inputs: observation.state[6] + 2 cameras + language instruction (task)
  • Output: action[6] (joint position)
  • Action chunking: chunk_size=50, n_action_steps=50
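
To make the action chunking setting concrete, here is a minimal sketch of how a chunk of 50 actions is consumed when n_action_steps equals chunk_size. It assumes the policy is stepped at the dataset rate of 10 fps; `dummy_predict_chunk` and `run_episode` are hypothetical stand-ins, not LeRobot functions.

```python
from collections import deque

CHUNK_SIZE = 50       # actions predicted per forward pass (from the model card)
N_ACTION_STEPS = 50   # actions executed before the next forward pass
FPS = 10              # assumed control rate, taken from the dataset's 10 fps


def dummy_predict_chunk(observation):
    """Hypothetical stand-in for the policy: returns CHUNK_SIZE 6-D joint targets."""
    return [[0.0] * 6 for _ in range(CHUNK_SIZE)]


def run_episode(num_steps):
    """Pop one action per control step and refill the queue only when it is empty."""
    queue = deque()
    forward_passes = 0
    for _ in range(num_steps):
        if not queue:
            chunk = dummy_predict_chunk(observation=None)
            queue.extend(chunk[:N_ACTION_STEPS])
            forward_passes += 1
        queue.popleft()  # the 6-D action that would be sent to the robot
    return forward_passes


seconds_per_chunk = N_ACTION_STEPS / FPS
print(seconds_per_chunk)   # 5.0
print(run_episode(120))    # 3
```

Under these assumptions the policy runs open-loop for 5 seconds between forward passes, so new observations only influence the motion at chunk boundaries.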

Training method

VLM frozen + action expert only: the standard training recipe from the official SmolVLA release (SmolVLA paper, arXiv:2506.01844).

Component | Status
VLM backbone (SmolVLM2) | ❄️ Fully frozen (freeze_vision_encoder=true)
Action expert | 🔥 Trained (train_expert_only=true)
PEFT / LoRA | Not used

Training hyperparameters

Item | Value
Dataset | IsaacLab-SO101_pick_place_baseCaP_100epi_10fps (100 episodes / 34,264 frames / 10 fps)
Epochs / Steps | 50 epochs / 6,700 steps
Global batch size | 256 (micro batch 128 × 2 GPUs)
Optimizer | AdamW (lr 1e-4, weight_decay 1e-10, grad_clip_norm 10.0)
LR scheduler | cosine_decay_with_warmup (warmup 1,000 / decay 30,000 / peak_lr 1e-4 / decay_lr 2.5e-6)
chunk_size / n_action_steps | 50 / 50
Seed | 1000
Dataloader workers | 16
Mixed precision | no (bf16 inference)
Image augmentation | ColorJitter (brightness/contrast/saturation/hue) + SharpnessJitter; no geometric transforms (rotation/translation/flip), to preserve left/right semantics for the VLA
Hardware | 2 × NVIDIA H100 80GB
Final loss | 0.013
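
The step count and learning-rate settings above can be cross-checked with a short sketch. It assumes that steps per epoch are rounded up (the partial last batch is kept) and that the schedule is a plain linear warmup followed by cosine decay over the configured 30,000 steps, with no rescaling to the run length. `lr_at` is a hypothetical helper written for illustration, not LeRobot's scheduler implementation.

```python
import math

FRAMES = 34_264          # dataset frames (from the table)
GLOBAL_BATCH = 128 * 2   # micro batch 128 on each of 2 GPUs
EPOCHS = 50

# Assumption: the last, partial batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(FRAMES / GLOBAL_BATCH)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step, peak_lr=1e-4, decay_lr=2.5e-6, warmup_steps=1_000, decay_steps=30_000):
    """Generic linear-warmup + cosine-decay schedule (illustrative only)."""
    if step < warmup_steps:
        # Linear ramp from near zero up to peak_lr.
        return peak_lr * (step + 1) / (warmup_steps + 1)
    # Cosine interpolation from peak_lr down to decay_lr over decay_steps.
    s = min(step, decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * s / decay_steps))
    return decay_lr + (peak_lr - decay_lr) * cosine


print(steps_per_epoch)  # 134
print(total_steps)      # 6700
print(f"lr at end of training: {lr_at(total_steps):.2e}")
```

The computed total matches the 6,700 steps reported in the table. Under the stated assumptions, training stops long before the 30,000-step decay horizon, so the learning rate at the final step is still much closer to peak_lr than to decay_lr.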

Camera rename

Mapping between the camera keys in the LeRobot dataset and the SmolVLA policy keys:

Dataset key | Policy key
observation.images.left_wrist | observation.images.camera1
observation.images.top | observation.images.camera2
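
When feeding observations from the simulator or the raw dataset to the policy, the same mapping has to be applied to the observation dictionary. A minimal sketch, assuming observations arrive as a flat dict keyed by the dataset names; `rename_camera_keys` is a hypothetical helper, not part of LeRobot.

```python
# Dataset camera key -> policy camera key, as listed in the table above.
CAMERA_RENAME_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_camera_keys(frame):
    """Return a copy of `frame` with camera keys renamed; other keys are untouched."""
    return {CAMERA_RENAME_MAP.get(key, key): value for key, value in frame.items()}


# Placeholder values stand in for real image tensors and joint vectors.
frame = {
    "observation.state": [0.0] * 6,
    "observation.images.left_wrist": "wrist_image",
    "observation.images.top": "top_image",
}
renamed = rename_camera_keys(frame)
print(sorted(renamed))
```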

Input / Output specification

  • Input: observation.state[6] (joint position) + 2 cameras + language instruction (task) only
  • Output: action[6] (joint position) only
  • The dataset fields ee_pos / gripper_binary / state.radian_urdf0 / action.radian_urdf0 are excluded from training
  • The SmolVLA policy has a fixed set of 3 camera slots (camera1/2/3), so a camera3 slot exists in the config. The dataset has only 2 cameras, however, so only 2 cameras actually carry data.
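
The specification above amounts to a whitelist over the dataset fields. The sketch below illustrates that filtering on a single frame, assuming the camera keys have already been renamed to the policy names. `select_policy_features` is a hypothetical helper, and the exact key strings of the excluded fields in the dataset may differ from the short names used here.

```python
POLICY_INPUT_KEYS = {
    "observation.state",
    "observation.images.camera1",
    "observation.images.camera2",
    "task",
}
TARGET_KEY = "action"
JOINT_DIM = 6  # observation.state[6] and action[6] from the model card


def select_policy_features(frame):
    """Keep only the whitelisted inputs and the action target, and check their sizes."""
    kept = {k: v for k, v in frame.items() if k in POLICY_INPUT_KEYS or k == TARGET_KEY}
    for key in ("observation.state", TARGET_KEY):
        if len(kept[key]) != JOINT_DIM:
            raise ValueError(f"{key} must have {JOINT_DIM} values, got {len(kept[key])}")
    return kept


# Placeholder frame; the last four keys are the fields excluded from training.
frame = {
    "observation.state": [0.0] * 6,
    "observation.images.camera1": "wrist_image",
    "observation.images.camera2": "top_image",
    "task": "pick and place",
    "action": [0.0] * 6,
    "ee_pos": [0.0] * 3,
    "gripper_binary": 0,
    "state.radian_urdf0": [0.0] * 6,
    "action.radian_urdf0": [0.0] * 6,
}
filtered = select_policy_features(frame)
print(sorted(filtered))
```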

Usage

from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA-IsaacLab-picknplace-50epoch")

Citation / Acknowledgement

Built on top of LeRobot and the SmolVLA base checkpoint. Project: CoRL 2026 CSI submission.

Framework versions

  • LeRobot 0.5.2