LeRobot SO101 MultiTaskDiT task2-all_bs128_s30000
Summary
This repository contains the final checkpoint for a MultiTask DiT policy trained on aswinkumar99/task2-all for SO101 sponge pick-and-place experiments.
Dataset description: Task 2: Multiple Sponges - No Distractors (all layouts).
This model was trained with the LeRobot multi_task_dit policy using a diffusion objective. It is not fine-tuned from a published base checkpoint.
Training Setup
- Dataset repo: aswinkumar99/task2-all
- Local dataset root during training: /home/riftuser/datasets_combined/aswinkumar99/task2-all
- Output directory during training: /home/riftuser/outputs_matrix/multi_task_dit/task2-all_bs128_s30000
- Batch size: 128
- Training steps: 30000
- Checkpoint save frequency: 5000
- Data loader workers: 8
- WandB project: so101-layout-generalization
- GPU: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- Python: CPython 3.12.13
- CUDA: 12.9
- Training start: 2026-04-24T09:48:49.302378+00:00
- Training end: 2026-04-24T18:02:09
- Approximate training duration: 8h 13m 19s
- Objective: diffusion
- Noise scheduler: DDPM
- Horizon: 32
- Action steps predicted: 24
- Observation steps: 2
- Vision encoder: openai/clip-vit-base-patch16
- Text encoder: openai/clip-vit-base-patch16
- Hidden dim: 512
- Number of transformer layers: 4
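As a sanity check on the figures in this list, the sketch below recomputes the training duration, the number of samples drawn during training, and the steps at which checkpoints would be written. It rests on two assumptions that the list does not confirm: the end timestamp is in UTC like the start (it carries no offset), and a checkpoint is written at every multiple of the save frequency.

```python
from datetime import datetime, timezone

# Values copied from the training setup list.
BATCH_SIZE = 128
STEPS = 30000
SAVE_FREQ = 5000

start = datetime.fromisoformat("2026-04-24T09:48:49.302378+00:00")
# Assumption: the end timestamp has no offset, so it is treated as UTC.
end = datetime.fromisoformat("2026-04-24T18:02:09").replace(tzinfo=timezone.utc)

# Whole seconds elapsed, split into hours, minutes, and seconds.
total_seconds = int((end - start).total_seconds())
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

# Samples drawn across all optimizer steps (counted with repetition).
samples_seen = BATCH_SIZE * STEPS

# Assumption: one checkpoint at every multiple of SAVE_FREQ up to STEPS.
checkpoint_steps = list(range(SAVE_FREQ, STEPS + 1, SAVE_FREQ))

print(f"duration: {hours}h {minutes}m {seconds}s")
print(f"samples drawn: {samples_seen}")
print(f"checkpoint steps: {checkpoint_steps}")
```

Under these assumptions the duration matches the 8h 13m 19s reported in the list.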
Exact Training Command
lerobot-train \
--dataset.repo_id=aswinkumar99/task2-all \
--dataset.root=/home/riftuser/datasets_combined/aswinkumar99/task2-all \
--dataset.video_backend=torchcodec \
--output_dir=/home/riftuser/outputs_matrix/multi_task_dit/task2-all_bs128_s30000 \
--job_name=multi_task_dit_task2-all_bs128 \
--batch_size=128 \
--steps=30000 \
--log_freq=200 \
--save_freq=5000 \
--save_checkpoint=true \
--num_workers=8 \
--wandb.enable=true \
--wandb.project=so101-layout-generalization \
--wandb.mode=online \
--wandb.disable_artifact=true \
--policy.type=multi_task_dit \
--policy.device=cuda \
--policy.push_to_hub=false \
--policy.use_amp=true \
--policy.horizon=32 \
--policy.n_action_steps=24 \
--policy.n_obs_steps=2 \
--policy.num_layers=4 \
--policy.hidden_dim=512 \
--policy.num_heads=8 \
--policy.dropout=0.1 \
--policy.timestep_embed_dim=256 \
--policy.use_rope=true \
--policy.use_positional_encoding=false \
--policy.objective=diffusion \
--policy.noise_scheduler_type=DDPM \
--policy.num_train_timesteps=100 \
--policy.optimizer_lr=2e-5 \
--policy.vision_encoder_lr_multiplier=0.1 \
--policy.vision_encoder_name=openai/clip-vit-base-patch16 \
--policy.text_encoder_name=openai/clip-vit-base-patch16 \
--policy.image_crop_shape=[224,224] \
--policy.image_crop_is_random=true
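The horizon settings in the command above (horizon=32, n_action_steps=24, n_obs_steps=2) suggest a receding-horizon rollout: each inference call predicts a chunk of 32 actions, of which 24 are executed before the policy is queried again. The sketch below illustrates that schedule only. It is not LeRobot code; the function name `inference_steps` and the 100-step episode length are hypothetical, and the exact offset of the executed window inside the predicted chunk is not specified by this model card.

```python
# Values copied from the training command.
HORIZON = 32          # actions predicted per inference call
N_ACTION_STEPS = 24   # actions executed before the next inference call
N_OBS_STEPS = 2       # observation frames fed to the policy

def inference_steps(episode_len: int, n_action_steps: int = N_ACTION_STEPS) -> list[int]:
    """Environment steps at which the policy would be queried (hypothetical helper)."""
    return list(range(0, episode_len, n_action_steps))

# Hypothetical episode length, chosen only for illustration.
episode_len = 100
calls = inference_steps(episode_len)

# Actions predicted in each chunk but never executed.
unused_per_call = HORIZON - N_ACTION_STEPS

print(f"policy queried at steps: {calls}")
print(f"predicted but unexecuted actions per call: {unused_per_call}")
```

A larger n_action_steps means fewer inference calls per episode, at the cost of reacting to new observations less often.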
Repository Contents
- pretrained_model/: final downloadable model artifacts for inference/loading
- training_state/: optimizer, RNG, scheduler/state, and step information for resuming or auditability
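For inference, only the pretrained_model/ folder is needed. One way to fetch just that folder is `snapshot_download` from the `huggingface_hub` package, sketched below. The helper name `download_pretrained` is hypothetical, and the sketch assumes `huggingface_hub` is installed and that the folder names match the listing above. Loading the downloaded files into a LeRobot policy is not shown here.

```python
# Repository ID taken from the Hugging Face URL in this model card.
REPO_ID = "aswinkumar99/LeRobot-SO101-MultiTaskDiT-task2-all_bs128_s30000"

# Restrict the download to the inference artifacts, skipping training_state/.
ALLOW_PATTERNS = ["pretrained_model/*"]

def download_pretrained(repo_id: str = REPO_ID) -> str:
    """Download only pretrained_model/ and return the local snapshot path (hypothetical helper)."""
    # Imported here so the module can be read without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, allow_patterns=ALLOW_PATTERNS)
```

Calling `download_pretrained()` requires network access and returns the path of the local snapshot directory, which would contain a pretrained_model/ subfolder.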
Creator
Aswinkumar
- Website: aswinkumar.me
- Hugging Face repo: https://huggingface.co/aswinkumar99/LeRobot-SO101-MultiTaskDiT-task2-all_bs128_s30000