SmolVLA IsaacLab SO101 11-task BaseCaP 3300epi (8ep)

This repository contains a SmolVLA policy checkpoint fine-tuned with LeRobot. The model card is intentionally detailed so the training run can be reproduced or debugged from the uploaded artifact.

Model Details

Related checkpoints from the same run:

Dataset

| Key | Value |
| --- | --- |
| Robot | SO101 follower in IsaacLab |
| Episodes | 3,300 |
| Frames | 3,522,774 |
| Tasks | 800 |
| FPS | 30 |
| Camera streams | observation.images.left_wrist, observation.images.top |
| Dataset state/action shape | [6] / [6] |
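
As a quick sanity check on the dataset figures, the average episode length follows directly from the frame count, episode count, and FPS. The snippet below is plain arithmetic on the reported values; the variable names are illustrative and not part of the dataset metadata.

```python
# Values copied from the dataset table.
episodes = 3_300
frames = 3_522_774
fps = 30

# Average number of frames per episode, and the same length in seconds.
frames_per_episode = frames / episodes
seconds_per_episode = frames_per_episode / fps

print(f"{frames_per_episode:.1f} frames/episode")
print(f"{seconds_per_episode:.1f} s/episode")
```

This works out to roughly 1,067.5 frames, or about 35.6 seconds, per episode on average.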

Reproduction

The uploaded train_config.json is the authoritative serialized LeRobot config for this checkpoint. The table below mirrors the key values for quick inspection.

| Key | Value |
| --- | --- |
| script | lerobot/scripts/train_smolvla.sh |
| job_name | smolvla_20260508_093756 |
| output_dir | /home/work/hscho/corl_2026/AutoDataCollector/lerobot/outputs/train/smolvla_20260508_093756 |
| seed | 1000 |
| launch | DDP via python -m accelerate.commands.launch --multi_gpu --num_processes=2 --mixed_precision=bf16 -m lerobot.scripts.lerobot_train |
| checkpoint_step | 110000 |
| checkpoint_epoch | 7.99 |
| checkpoint_train_loss | 0.004 |
| checkpoint_grad_norm | 0.051 |
| checkpoint_lr | 2.5e-06 |
| effective_batch | 128 x 2 = 256 |
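
The reported checkpoint_epoch can be cross-checked against the step count, the effective batch size, and the dataset frame count. The sketch below assumes that each training sample corresponds to one dataset frame and that the two DDP processes consume distinct batches; the variable names are illustrative.

```python
# Values copied from the tables in this model card.
steps = 110_000
per_process_batch = 128
num_processes = 2
frames = 3_522_774

# Effective batch per optimizer step across both DDP processes.
effective_batch = per_process_batch * num_processes

# Total samples seen at the checkpoint, expressed in dataset passes.
samples_seen = steps * effective_batch
epochs = samples_seen / frames

print(effective_batch, samples_seen, round(epochs, 2))
```

Under these assumptions the result rounds to 7.99, which matches the reported checkpoint_epoch and the "(8ep)" suffix in the title.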

Approximate script invocation:

cd /home/work/hscho/corl_2026/AutoDataCollector/lerobot
# GRADIENT_ACCUMULATION_STEPS is not set in the script/config; 1 is the effective value.
CONDA_ENV="lerobot" \
POLICY_TYPE="smolvla" \
POLICY_PATH="lerobot/smolvla_base" \
DATASET_REPO_ID="CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi" \
BATCH_SIZE="128" \
GRADIENT_ACCUMULATION_STEPS="1" \
STEPS="110000" \
NUM_WORKERS="6" \
CUDA_VISIBLE_DEVICES="0, 1" \
NUM_GPUS="2" \
MIXED_PRECISION="bf16" \
SAVE_FREQ="28000" \
LOG_FREQ="10" \
EVAL_FREQ="0" \
WANDB_PROJECT="lerobot-smolvla" \
bash train_smolvla.sh

Detailed Hyperparameters

Script Defaults and Environment

| Key | Value |
| --- | --- |
| CONDA_ENV | lerobot |
| POLICY_TYPE | smolvla |
| POLICY_PATH | lerobot/smolvla_base |
| DATASET_REPO_ID | CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi |
| BATCH_SIZE | 128 |
| GRADIENT_ACCUMULATION_STEPS | 1 (not set in script/config) |
| STEPS | 110000 |
| NUM_WORKERS | 6 |
| CUDA_VISIBLE_DEVICES | 0, 1 |
| NUM_GPUS | 2 |
| MIXED_PRECISION | bf16 |
| SAVE_FREQ | 28000 |
| LOG_FREQ | 10 |
| EVAL_FREQ | 0 |
| WANDB_PROJECT | lerobot-smolvla |

Training Loop and Dataloader

| Key | Value |
| --- | --- |
| steps | 110000 |
| batch_size | 128 |
| gradient_accumulation_steps | 1 |
| num_workers | 6 |
| dataloader_prefetch_factor | null |
| dataloader_persistent_workers | null |
| dataloader_pin_memory | null |
| save_freq | 28000 |
| log_freq | 10 |
| eval_freq | 0 |
| cudnn_deterministic | False |
| use_policy_training_preset | True |
| ddp_find_unused_parameters | null |
| profile_timing | null |

Dataset Pipeline

| Key | Value |
| --- | --- |
| dataset.repo_id | CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi |
| dataset.root | null |
| dataset.episodes | null |
| dataset.revision | null |
| dataset.use_imagenet_stats | True |
| dataset.video_backend | torchcodec |
| dataset.streaming | False |

Image augmentation settings:

{
  "enable": true,
  "max_num_transforms": 3,
  "random_order": true,
  "tfs": {
    "brightness": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "brightness": [
          0.8,
          1.2
        ]
      }
    },
    "contrast": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "contrast": [
          0.8,
          1.2
        ]
      }
    },
    "saturation": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "saturation": [
          0.5,
          1.5
        ]
      }
    },
    "hue": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "hue": [
          -0.05,
          0.05
        ]
      }
    },
    "sharpness": {
      "weight": 1.0,
      "type": "SharpnessJitter",
      "kwargs": {
        "sharpness": [
          0.5,
          1.5
        ]
      }
    },
    "affine": {
      "weight": 1.0,
      "type": "RandomAffine",
      "kwargs": {
        "degrees": [
          -5.0,
          5.0
        ],
        "translate": [
          0.05,
          0.05
        ]
      }
    }
  }
}
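
To illustrate how max_num_transforms, random_order, and the per-transform weights might interact, the sketch below draws a weighted subset of transform names without replacement. It uses only the standard library and assumes that exactly min(max_num_transforms, number of transforms) transforms are selected per sample; the function name sample_transforms is hypothetical, and the actual LeRobot sampling logic may differ in detail.

```python
import random

# Transform names and weights copied from the augmentation config above.
WEIGHTS = {
    "brightness": 1.0,
    "contrast": 1.0,
    "saturation": 1.0,
    "hue": 1.0,
    "sharpness": 1.0,
    "affine": 1.0,
}
MAX_NUM_TRANSFORMS = 3
RANDOM_ORDER = True


def sample_transforms(rng: random.Random) -> list[str]:
    """Draw a weighted subset of transform names without replacement."""
    names = list(WEIGHTS)
    n_subset = min(MAX_NUM_TRANSFORMS, len(names))
    remaining = names.copy()
    chosen = []
    for _ in range(n_subset):
        weights = [WEIGHTS[name] for name in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    if not RANDOM_ORDER:
        # Fall back to the config order when random ordering is disabled.
        chosen.sort(key=names.index)
    return chosen


print(sample_transforms(random.Random(0)))
```

With all weights equal to 1.0, every transform is equally likely to be selected, so each training image would see three of the six augmentations in a random order under this reading.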

Camera rename map:

{
  "observation.images.left_wrist": "observation.images.camera1",
  "observation.images.top": "observation.images.camera2"
}
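
When feeding observations to the policy outside the training pipeline, the same key renaming has to be applied. The following is a minimal sketch of applying the map to an observation dictionary; rename_observation_keys is a hypothetical helper, not a LeRobot function, and the placeholder strings stand in for image tensors.

```python
# Rename map copied from this model card.
RENAME_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_observation_keys(observation: dict, rename_map: dict = RENAME_MAP) -> dict:
    """Return a copy of the observation with dataset keys mapped to policy keys."""
    return {rename_map.get(key, key): value for key, value in observation.items()}


# Placeholder values; real observations would hold tensors.
raw_observation = {
    "observation.state": [0.0] * 6,
    "observation.images.left_wrist": "wrist_frame",
    "observation.images.top": "top_frame",
}
renamed = rename_observation_keys(raw_observation)
print(sorted(renamed))
```

Keys that are not in the map, such as observation.state, pass through unchanged.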

Policy Configuration

{
  "type": "smolvla",
  "pretrained_path": "lerobot/smolvla_base",
  "vlm_model_name": "HuggingFaceTB/SmolVLM2-500M-Video-Instruct",
  "load_vlm_weights": true,
  "num_vlm_layers": 16,
  "freeze_vision_encoder": true,
  "train_expert_only": true,
  "train_state_proj": true,
  "use_peft": false,
  "use_amp": false,
  "chunk_size": 50,
  "n_action_steps": 50,
  "num_steps": 10,
  "max_state_dim": 32,
  "max_action_dim": 32,
  "resize_imgs_with_padding": [
    512,
    512
  ],
  "tokenizer_max_length": 48,
  "attention_mode": "cross_attn",
  "pad_language_to": "max_length",
  "use_cache": true,
  "num_expert_layers": 0,
  "expert_width_multiplier": 0.75,
  "self_attn_every_n_layers": 2,
  "min_period": 0.004,
  "max_period": 4.0,
  "compile_model": false,
  "compile_mode": "max-autotune",
  "normalization_mapping": {
    "VISUAL": "IDENTITY",
    "STATE": "MEAN_STD",
    "ACTION": "MEAN_STD"
  },
  "input_features": {
    "observation.state": {
      "type": "STATE",
      "shape": [
        6
      ]
    },
    "observation.images.camera1": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    },
    "observation.images.camera2": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    },
    "observation.images.camera3": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    }
  },
  "output_features": {
    "action": {
      "type": "ACTION",
      "shape": [
        6
      ]
    },
    "action.radian_urdf0": {
      "type": "ACTION",
      "shape": [
        6
      ]
    }
  }
}
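
The config lists 6-dimensional state and action features alongside max_state_dim and max_action_dim of 32. A plausible reading is that vectors are zero-padded to the fixed width before entering the model and that only the leading entries of the predicted action are used. The sketch below illustrates that idea with plain lists; pad_vector and unpad_vector are illustrative helpers written for this example, and the zero-padding behaviour is an assumption rather than something stated in this card.

```python
MAX_STATE_DIM = 32   # from the policy config
MAX_ACTION_DIM = 32  # from the policy config
ROBOT_DIM = 6        # dataset state/action shape


def pad_vector(values: list[float], target_dim: int) -> list[float]:
    """Right-pad a vector with zeros up to target_dim."""
    if len(values) > target_dim:
        raise ValueError(f"vector of length {len(values)} exceeds {target_dim}")
    return list(values) + [0.0] * (target_dim - len(values))


def unpad_vector(values: list[float], original_dim: int) -> list[float]:
    """Keep only the leading original_dim entries of a padded vector."""
    return list(values[:original_dim])


state = [0.1, -0.2, 0.3, 0.0, 0.5, 1.0]
padded_state = pad_vector(state, MAX_STATE_DIM)
print(len(padded_state))

# A padded action coming out of the model would be cut back to 6 values.
padded_action = pad_vector([0.0] * ROBOT_DIM, MAX_ACTION_DIM)
print(len(unpad_vector(padded_action, ROBOT_DIM)))
```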

Optimizer

{
  "type": "adamw",
  "lr": 0.0001,
  "weight_decay": 1e-10,
  "grad_clip_norm": 10.0,
  "betas": [
    0.9,
    0.95
  ],
  "eps": 1e-08
}

Scheduler

{
  "type": "cosine_decay_with_warmup",
  "num_warmup_steps": 1000,
  "num_decay_steps": 30000,
  "peak_lr": 0.0001,
  "decay_lr": 2.5e-06
}
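
The scheduler values explain why the checkpoint reports a learning rate of 2.5e-06: the checkpoint step (110000) lies well past num_decay_steps (30000), so the schedule has already settled at decay_lr. The sketch below is a generic cosine decay with linear warmup written for illustration; the warmup ramp in particular is a simplification and may not match LeRobot's exact formula.

```python
import math

# Values copied from the scheduler config.
PEAK_LR = 1e-4
DECAY_LR = 2.5e-6
NUM_WARMUP_STEPS = 1_000
NUM_DECAY_STEPS = 30_000


def learning_rate(step: int) -> float:
    """Approximate learning rate at a given optimizer step."""
    if step < NUM_WARMUP_STEPS:
        # Simplified linear ramp towards the peak during warmup (assumption).
        return PEAK_LR * (step + 1) / (NUM_WARMUP_STEPS + 1)
    # Cosine decay from PEAK_LR to DECAY_LR, held constant after NUM_DECAY_STEPS.
    progress = min(step, NUM_DECAY_STEPS) / NUM_DECAY_STEPS
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return DECAY_LR + (PEAK_LR - DECAY_LR) * cosine


for step in (0, 1_000, 15_000, 30_000, 110_000):
    print(step, f"{learning_rate(step):.3e}")
```

Under this formulation, roughly the last 80,000 of the 110,000 steps run at the floor learning rate.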

Logging

{
  "enable": true,
  "disable_artifact": false,
  "project": "lerobot-smolvla",
  "entity": null,
  "notes": null,
  "run_id": "b3yvlype",
  "mode": null
}

Usage

Use this model as a LeRobot policy checkpoint:

python -m lerobot.scripts.lerobot_eval \
  --policy.path=CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep

For Python loading inside LeRobot code, use the SmolVLA policy loader with this repository id as the pretrained path.
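
A minimal loading sketch is shown here, assuming a recent LeRobot release in which the policy class lives at lerobot.policies.smolvla.modeling_smolvla (older releases used a different module path). The load_policy wrapper is a hypothetical helper; the import is deferred so that the module can be imported without triggering a download.

```python
REPO_ID = "CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep"


def load_policy(repo_id: str = REPO_ID):
    """Load the checkpoint as a SmolVLA policy in evaluation mode."""
    # Assumed import path; adjust it to match the installed LeRobot version.
    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

    policy = SmolVLAPolicy.from_pretrained(repo_id)
    policy.eval()
    return policy


# Example (downloads the checkpoint from the Hub on first use):
# policy = load_policy()
```

Observations passed to the loaded policy still need to follow the camera naming described in the rename map.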

Evaluation and Limitations

This model card reports training checkpoint information only. No rollout success rate or task-level evaluation metric is included in this repository.

The checkpoint assumes a compatible observation/action schema and the camera remapping shown above. The optimizer/RNG training_state files are not included; only the loadable pretrained_model artifact is uploaded.

Provenance
