---
language:
- en
license: mit
tags:
- MultiTaskDiT
- LeRobot
- robotics
- imitation-learning
- diffusion
- so101
pipeline_tag: reinforcement-learning
library_name: lerobot
---

# LeRobot SO101 MultiTaskDiT task2-all_bs128_s30000

## Summary

This repository contains the final checkpoint for a MultiTaskDiT policy trained on `aswinkumar99/task2-all` for SO101 sponge pick-and-place experiments.

Dataset meaning: Task 2: Multiple Sponges - No Distractors (all layouts).

This model was trained with the LeRobot `multi_task_dit` policy and a diffusion objective. It was not fine-tuned from a published base checkpoint.

## Training Setup

- Dataset repo: `aswinkumar99/task2-all`
- Local dataset root during training: `/home/riftuser/datasets_combined/aswinkumar99/task2-all`
- Output directory during training: `/home/riftuser/outputs_matrix/multi_task_dit/task2-all_bs128_s30000`
- Batch size: `128`
- Training steps: `30000`
- Checkpoint save frequency: `5000`
- Data loader workers: `8`
- WandB project: `so101-layout-generalization`
- GPU: `NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition`
- Python: `CPython 3.12.13`
- CUDA: `12.9`
- Training start: `2026-04-24T09:48:49.302378+00:00`
- Training end: `2026-04-24T18:02:09`
- Approximate training duration: `8h 13m 19s`
- Objective: `diffusion`
- Noise scheduler: `DDPM`
- Horizon: `32`
- Action steps predicted: `24`
- Observation steps: `2`
- Vision encoder: `openai/clip-vit-base-patch16`
- Text encoder: `openai/clip-vit-base-patch16`
- Hidden dim: `512`
- Transformer layers: `4`

## Exact Training Command

```bash
lerobot-train \
  --dataset.repo_id=aswinkumar99/task2-all \
  --dataset.root=/home/riftuser/datasets_combined/aswinkumar99/task2-all \
  --dataset.video_backend=torchcodec \
  --output_dir=/home/riftuser/outputs_matrix/multi_task_dit/task2-all_bs128_s30000 \
  --job_name=multi_task_dit_task2-all_bs128 \
  --batch_size=128 \
  --steps=30000 \
  --log_freq=200 \
  --save_freq=5000 \
  --save_checkpoint=true \
  --num_workers=8 \
  --wandb.enable=true \
  --wandb.project=so101-layout-generalization \
  --wandb.mode=online \
  --wandb.disable_artifact=true \
  --policy.type=multi_task_dit \
  --policy.device=cuda \
  --policy.push_to_hub=false \
  --policy.use_amp=true \
  --policy.horizon=32 \
  --policy.n_action_steps=24 \
  --policy.n_obs_steps=2 \
  --policy.num_layers=4 \
  --policy.hidden_dim=512 \
  --policy.num_heads=8 \
  --policy.dropout=0.1 \
  --policy.timestep_embed_dim=256 \
  --policy.use_rope=true \
  --policy.use_positional_encoding=false \
  --policy.objective=diffusion \
  --policy.noise_scheduler_type=DDPM \
  --policy.num_train_timesteps=100 \
  --policy.optimizer_lr=2e-5 \
  --policy.vision_encoder_lr_multiplier=0.1 \
  --policy.vision_encoder_name=openai/clip-vit-base-patch16 \
  --policy.text_encoder_name=openai/clip-vit-base-patch16 \
  --policy.image_crop_shape=[224,224] \
  --policy.image_crop_is_random=true
```

## Repository Contents

- `pretrained_model/`: final downloadable model artifacts for inference/loading
- `training_state/`: optimizer state, RNG state, scheduler state, and step counter for resuming training or auditing the run

## Creator

Aswinkumar

- Website: [aswinkumar.me](https://aswinkumar.me)
- Hugging Face repo:
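## Action Chunking at Inference

The horizon settings above imply a receding-horizon (action-chunking) control loop: each forward pass conditions on the last 2 observation frames, predicts a 32-step action chunk, and executes the first 24 actions before replanning. A minimal sketch of the resulting arithmetic (the 600-step episode length is a hypothetical example, not taken from the training config):

```python
import math

HORIZON = 32         # actions predicted per inference pass (--policy.horizon)
N_ACTION_STEPS = 24  # actions executed before replanning (--policy.n_action_steps)
N_OBS_STEPS = 2      # observation frames conditioned on (--policy.n_obs_steps)

def num_inference_calls(episode_len: int) -> int:
    """Number of policy forward passes needed to control one episode."""
    return math.ceil(episode_len / N_ACTION_STEPS)

# A hypothetical 600-step episode needs 25 forward passes, and each
# predicted chunk extends 8 steps past the point where replanning occurs.
print(num_inference_calls(600))  # -> 25
print(HORIZON - N_ACTION_STEPS)  # -> 8
```

Executing fewer steps than the full predicted horizon trades some compute for responsiveness: the policy replans before the tail of the chunk, where predictions are least reliable, would ever be executed.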
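## Diffusion Noise Schedule

With `--policy.objective=diffusion`, `--policy.noise_scheduler_type=DDPM`, and `--policy.num_train_timesteps=100`, training draws a random timestep `t` in `[0, 100)` and noises the action chunk according to the cumulative signal fraction of the schedule. A self-contained sketch of that forward-noising schedule follows; the linear beta range `1e-4` to `0.02` is the common DDPM default and is an assumption here, not a value read from this checkpoint:

```python
# Toy reconstruction of a linear-beta DDPM schedule with 100 train timesteps.
# The beta range (1e-4 .. 0.02) is the usual DDPM default, assumed here.
NUM_TRAIN_TIMESTEPS = 100
BETA_START, BETA_END = 1e-4, 0.02

betas = [
    BETA_START + (BETA_END - BETA_START) * t / (NUM_TRAIN_TIMESTEPS - 1)
    for t in range(NUM_TRAIN_TIMESTEPS)
]

# alpha_bar_t = prod_{s<=t} (1 - beta_s): fraction of signal remaining at step t.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)

# Early timesteps keep the action chunk nearly clean; by t=99 most of the
# original signal has been replaced with Gaussian noise.
print(round(alphas_cumprod[0], 4))   # close to 1.0
print(round(alphas_cumprod[-1], 2))  # well below 1.0
```

The small timestep count (100 rather than the 1000 typical of image diffusion) keeps both training and iterative denoising at inference cheap, which matters for real-time robot control.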
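## Learning Rate Groups

`--policy.vision_encoder_lr_multiplier=0.1` trains the pretrained CLIP vision tower at a tenth of the base learning rate, which helps preserve its pretrained visual features while the rest of the policy adapts. A minimal sketch of the resulting per-group learning rates (the two-group split shown is an assumption about how the policy builds its optimizer, for illustration only):

```python
BASE_LR = 2e-5              # --policy.optimizer_lr
VISION_LR_MULTIPLIER = 0.1  # --policy.vision_encoder_lr_multiplier

# Hypothetical parameter groups: the CLIP vision encoder updates 10x more
# slowly than the transformer trunk and other freshly initialized weights.
param_groups = {
    "vision_encoder": BASE_LR * VISION_LR_MULTIPLIER,
    "transformer_trunk": BASE_LR,
}
for name, lr in param_groups.items():
    print(f"{name}: {lr:.1e}")
```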