SmolVLA IsaacLab SO101 11-task BaseCaP 3300epi (8ep)

This repository contains a SmolVLA policy checkpoint fine-tuned with LeRobot. The model card is intentionally detailed so the training run can be reproduced or debugged from the uploaded artifact.

Model Details

Policy: SmolVLA
Base checkpoint: lerobot/smolvla_base
Training dataset: CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi
Training script: lerobot/scripts/train_smolvla.sh
Checkpoint: step 110000, approximately 7.99 epochs
Reported training loss at checkpoint: 0.004
Resolved config: train_config.json
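
For quick inspection, the nested train_config.json can be flattened into dotted keys like the ones used in the tables of this card (for example `dataset.repo_id`). The sketch below is a minimal helper under stated assumptions: the names `flatten_config` and `load_flat_config` are illustrative and not part of LeRobot, and the sample dict only copies a few values from this card. The exact key layout should be checked against the uploaded file.

```python
import json
from pathlib import Path


def flatten_config(node, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}.

    Lists, scalars, and empty dicts are kept as leaf values.
    """
    flat = {}
    for key, value in node.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten_config(value, dotted))
        else:
            flat[dotted] = value
    return flat


def load_flat_config(path="train_config.json"):
    """Read a JSON config from disk and return it flattened."""
    return flatten_config(json.loads(Path(path).read_text()))


# Illustrative sample only; the values are copied from this card.
sample = {
    "steps": 110000,
    "dataset": {
        "repo_id": "CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi",
        "streaming": False,
    },
}
flat_sample = flatten_config(sample)
```

Calling `load_flat_config()` in the directory that holds the downloaded train_config.json would produce the same dotted-key view for the real file.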

Related checkpoints from the same run:

Dataset

Key	Value
`Robot`	SO101 follower in IsaacLab
`Episodes`	3,300
`Frames`	3,522,774
`Tasks`	800
`FPS`	30
`Camera streams`	`observation.images.left_wrist`, `observation.images.top`
`Dataset state/action shape`	[6] / [6]

Reproduction

The uploaded train_config.json is the authoritative serialized LeRobot config for this checkpoint. The table below mirrors the key values for quick inspection.

Key	Value
`script`	lerobot/scripts/train_smolvla.sh
`job_name`	smolvla_20260508_093756
`output_dir`	/home/work/hscho/corl_2026/AutoDataCollector/lerobot/outputs/train/smolvla_20260508_093756
`seed`	1000
`launch`	DDP via `python -m accelerate.commands.launch --multi_gpu --num_processes=2 --mixed_precision=bf16 -m lerobot.scripts.lerobot_train`
`checkpoint_step`	110000
`checkpoint_epoch`	7.99
`checkpoint_train_loss`	0.004
`checkpoint_grad_norm`	0.051
`checkpoint_lr`	2.5e-06
`effective_batch`	128 x 2 = 256
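
The epoch figure can be cross-checked from the values in this card. The sketch below assumes that each training sample corresponds to one dataset frame and that the two DDP processes see disjoint samples; the function name `epochs_seen` is illustrative.

```python
# Values copied from the Dataset and Reproduction tables of this card.
FRAMES = 3_522_774
PER_GPU_BATCH = 128
NUM_PROCESSES = 2
STEPS = 110_000


def epochs_seen(steps, per_gpu_batch, num_processes, frames, grad_accum=1):
    """Return how many passes over the dataset a run has made.

    Assumes one sample per frame and no overlap between processes.
    """
    samples = steps * per_gpu_batch * num_processes * grad_accum
    return samples / frames


epochs = epochs_seen(STEPS, PER_GPU_BATCH, NUM_PROCESSES, FRAMES)
print(f"{epochs:.2f}")  # 28,160,000 samples / 3,522,774 frames
```

Under these assumptions the result rounds to the 7.99 epochs reported for step 110000.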

Approximate script invocation:

cd /home/work/hscho/corl_2026/AutoDataCollector/lerobot
CONDA_ENV="lerobot" \
POLICY_TYPE="smolvla" \
POLICY_PATH="lerobot/smolvla_base" \
DATASET_REPO_ID="CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi" \
BATCH_SIZE="128" \
GRADIENT_ACCUMULATION_STEPS="1" \
STEPS="110000" \
NUM_WORKERS="6" \
CUDA_VISIBLE_DEVICES="0,1" \
NUM_GPUS="2" \
MIXED_PRECISION="bf16" \
SAVE_FREQ="28000" \
LOG_FREQ="10" \
EVAL_FREQ="0" \
WANDB_PROJECT="lerobot-smolvla" \
bash train_smolvla.sh

Detailed Hyperparameters

Script Defaults and Environment

Key	Value
`CONDA_ENV`	lerobot
`POLICY_TYPE`	smolvla
`POLICY_PATH`	lerobot/smolvla_base
`DATASET_REPO_ID`	CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi
`BATCH_SIZE`	128
`GRADIENT_ACCUMULATION_STEPS`	1 (not set in script/config)
`STEPS`	110000
`NUM_WORKERS`	6
`CUDA_VISIBLE_DEVICES`	0, 1
`NUM_GPUS`	2
`MIXED_PRECISION`	bf16
`SAVE_FREQ`	28000
`LOG_FREQ`	10
`EVAL_FREQ`	0
`WANDB_PROJECT`	lerobot-smolvla

Training Loop and Dataloader

Key	Value
`steps`	110000
`batch_size`	128
`gradient_accumulation_steps`	1
`num_workers`	6
`dataloader_prefetch_factor`	`null`
`dataloader_persistent_workers`	`null`
`dataloader_pin_memory`	`null`
`save_freq`	28000
`log_freq`	10
`eval_freq`	0
`cudnn_deterministic`	False
`use_policy_training_preset`	True
`ddp_find_unused_parameters`	`null`
`profile_timing`	`null`

Dataset Pipeline

Key	Value
`dataset.repo_id`	CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi
`dataset.root`	`null`
`dataset.episodes`	`null`
`dataset.revision`	`null`
`dataset.use_imagenet_stats`	True
`dataset.video_backend`	torchcodec
`dataset.streaming`	False

Image augmentation settings:

{
  "enable": true,
  "max_num_transforms": 3,
  "random_order": true,
  "tfs": {
    "brightness": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "brightness": [
          0.8,
          1.2
        ]
      }
    },
    "contrast": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "contrast": [
          0.8,
          1.2
        ]
      }
    },
    "saturation": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "saturation": [
          0.5,
          1.5
        ]
      }
    },
    "hue": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "hue": [
          -0.05,
          0.05
        ]
      }
    },
    "sharpness": {
      "weight": 1.0,
      "type": "SharpnessJitter",
      "kwargs": {
        "sharpness": [
          0.5,
          1.5
        ]
      }
    },
    "affine": {
      "weight": 1.0,
      "type": "RandomAffine",
      "kwargs": {
        "degrees": [
          -5.0,
          5.0
        ],
        "translate": [
          0.05,
          0.05
        ]
      }
    }
  }
}
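
One way to read this configuration: for each image, up to `max_num_transforms` of the six transforms are drawn by weight without replacement, then applied in the drawn order when `random_order` is true. The sketch below illustrates only that selection step in plain Python. It is not LeRobot's implementation; it assumes that exactly `min(max_num_transforms, number of transforms)` are drawn, and the name `pick_transforms` is hypothetical.

```python
import random

# Transform names and weights copied from the config above (kwargs omitted).
TFS = {
    "brightness": {"weight": 1.0},
    "contrast": {"weight": 1.0},
    "saturation": {"weight": 1.0},
    "hue": {"weight": 1.0},
    "sharpness": {"weight": 1.0},
    "affine": {"weight": 1.0},
}


def pick_transforms(tfs, max_num_transforms, random_order, rng):
    """Draw transform names by weight, without replacement."""
    names = list(tfs)
    weights = [float(tfs[name]["weight"]) for name in names]
    n_pick = min(max_num_transforms, len(names))
    chosen = []
    for _ in range(n_pick):
        # Weighted draw of one remaining name, then remove it from the pool.
        idx = rng.choices(range(len(names)), weights=weights, k=1)[0]
        chosen.append(names.pop(idx))
        weights.pop(idx)
    if not random_order:
        # Fall back to the order in which the config declares the transforms.
        declared = list(tfs)
        chosen.sort(key=declared.index)
    return chosen


picked = pick_transforms(TFS, 3, True, random.Random(0))
ordered = pick_transforms(TFS, 3, False, random.Random(0))
```

With all weights equal to 1.0, every transform is equally likely to be selected for a given image.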

Camera rename map:

{
  "observation.images.left_wrist": "observation.images.camera1",
  "observation.images.top": "observation.images.camera2"
}
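
At inference time, observations that use the dataset's camera names need the same renaming before they match the policy's input features. The following is a minimal sketch of that step on a plain dict; `rename_observation_keys` is a hypothetical helper, and LeRobot may apply the map elsewhere in its own pipeline.

```python
RENAME_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_observation_keys(observation, rename_map=RENAME_MAP):
    """Return a copy of the observation with camera keys renamed.

    Keys that are not in the map (for example observation.state) pass through.
    """
    renamed = {rename_map.get(key, key): value for key, value in observation.items()}
    if len(renamed) != len(observation):
        raise ValueError("rename map produced duplicate keys")
    return renamed


# Placeholder values stand in for image tensors and the 6-dim state vector.
example = rename_observation_keys(
    {
        "observation.images.left_wrist": "wrist_image",
        "observation.images.top": "top_image",
        "observation.state": [0.0] * 6,
    }
)
```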

Policy Configuration

{
  "type": "smolvla",
  "pretrained_path": "lerobot/smolvla_base",
  "vlm_model_name": "HuggingFaceTB/SmolVLM2-500M-Video-Instruct",
  "load_vlm_weights": true,
  "num_vlm_layers": 16,
  "freeze_vision_encoder": true,
  "train_expert_only": true,
  "train_state_proj": true,
  "use_peft": false,
  "use_amp": false,
  "chunk_size": 50,
  "n_action_steps": 50,
  "num_steps": 10,
  "max_state_dim": 32,
  "max_action_dim": 32,
  "resize_imgs_with_padding": [
    512,
    512
  ],
  "tokenizer_max_length": 48,
  "attention_mode": "cross_attn",
  "pad_language_to": "max_length",
  "use_cache": true,
  "num_expert_layers": 0,
  "expert_width_multiplier": 0.75,
  "self_attn_every_n_layers": 2,
  "min_period": 0.004,
  "max_period": 4.0,
  "compile_model": false,
  "compile_mode": "max-autotune",
  "normalization_mapping": {
    "VISUAL": "IDENTITY",
    "STATE": "MEAN_STD",
    "ACTION": "MEAN_STD"
  },
  "input_features": {
    "observation.state": {
      "type": "STATE",
      "shape": [
        6
      ]
    },
    "observation.images.camera1": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    },
    "observation.images.camera2": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    },
    "observation.images.camera3": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    }
  },
  "output_features": {
    "action": {
      "type": "ACTION",
      "shape": [
        6
      ]
    },
    "action.radian_urdf0": {
      "type": "ACTION",
      "shape": [
        6
      ]
    }
  }
}
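
Two values in this config interact with the dataset shapes: the 6-dimensional state and action vectors are smaller than `max_state_dim` and `max_action_dim` (32), and `n_action_steps` equals `chunk_size` (50). The sketch below shows the arithmetic under two assumptions that are not confirmed by this card: that vectors are padded with trailing zeros, and that one action is executed per control tick at the dataset's 30 FPS. Both helper names are illustrative.

```python
def pad_vector(values, target_dim):
    """Pad a vector with trailing zeros up to target_dim (assumed padding scheme)."""
    if len(values) > target_dim:
        raise ValueError("vector is longer than the target dimension")
    return list(values) + [0.0] * (target_dim - len(values))


def chunk_duration_seconds(n_action_steps, fps):
    """Seconds of control covered by one predicted chunk at one action per tick."""
    return n_action_steps / fps


state = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # placeholder 6-dim SO101 state
padded_state = pad_vector(state, 32)
seconds_per_chunk = chunk_duration_seconds(50, 30)
```

Under these assumptions, each inference call covers about 1.67 seconds of execution before the policy is queried again.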

Optimizer

{
  "type": "adamw",
  "lr": 0.0001,
  "weight_decay": 1e-10,
  "grad_clip_norm": 10.0,
  "betas": [
    0.9,
    0.95
  ],
  "eps": 1e-08
}

Scheduler

{
  "type": "cosine_decay_with_warmup",
  "num_warmup_steps": 1000,
  "num_decay_steps": 30000,
  "peak_lr": 0.0001,
  "decay_lr": 2.5e-06
}
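
Because `num_decay_steps` (30000) is much smaller than the 110000 training steps, the learning rate spends most of the run at its floor. The sketch below is a simplified schedule, assuming linear warmup followed by cosine decay from `peak_lr` to `decay_lr` and a constant rate afterwards. It is not LeRobot's exact scheduler code, and the name `lr_at` is illustrative.

```python
import math


def lr_at(step, peak_lr=1e-4, decay_lr=2.5e-6,
          num_warmup_steps=1000, num_decay_steps=30000):
    """Approximate learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear ramp towards peak_lr during warmup.
        return peak_lr * (step + 1) / (num_warmup_steps + 1)
    # Cosine decay until num_decay_steps, then hold at decay_lr.
    progress = min(step, num_decay_steps) / num_decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return decay_lr + (peak_lr - decay_lr) * cosine


lr_final = lr_at(110_000)
```

This matches the `checkpoint_lr` of 2.5e-06 reported at step 110000, since decay has finished long before that step.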

Logging

{
  "enable": true,
  "disable_artifact": false,
  "project": "lerobot-smolvla",
  "entity": null,
  "notes": null,
  "run_id": "b3yvlype",
  "mode": null
}

Usage

Use this model as a LeRobot policy checkpoint:

python -m lerobot.scripts.lerobot_eval \
  --policy.path=CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep

For Python loading inside LeRobot code, use the SmolVLA policy loader with this repository id as the pretrained path.
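
A minimal loading sketch follows. The import path `lerobot.policies.smolvla.modeling_smolvla` is an assumption that depends on the installed LeRobot version (older releases place policies under `lerobot.common.policies`), and the wrapper `load_policy` is a hypothetical helper. The import is deferred so the module can be read without LeRobot installed.

```python
REPO_ID = "CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep"


def load_policy(repo_id=REPO_ID, device="cuda"):
    """Load the fine-tuned SmolVLA policy and switch it to eval mode."""
    # Assumed import path; adjust to match the installed LeRobot version.
    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

    policy = SmolVLAPolicy.from_pretrained(repo_id)
    policy.to(device)
    policy.eval()
    return policy
```

Calling `load_policy()` downloads the checkpoint from the Hub on first use. Input normalization and the camera renaming described earlier still have to be applied to observations before inference; this sketch does not cover them.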

Evaluation and Limitations

This model card reports training checkpoint information only. No rollout success rate or task-level evaluation metric is included in this repository.

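The checkpoint expects 6-dimensional state and action vectors and the camera rename map shown above (left_wrist to camera1, top to camera2). The policy config also declares observation.images.camera3, which the rename map does not supply. The optimizer/RNG training_state files are not included, so the run cannot be resumed with its original optimizer state; only the loadable pretrained_model artifact is uploaded.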

Provenance

VLM backbone: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
Fine-tuning run: smolvla_20260508_093756
Source training script: lerobot/scripts/train_smolvla.sh


Model size: 0.5B params (Safetensors; tensor types F32 and BF16)
