---
language:
- en
license: mit
tags:
- MultiTaskDiT
- LeRobot
- robotics
- imitation-learning
- diffusion
- so101
pipeline_tag: reinforcement-learning
library_name: lerobot
---

# LeRobot SO101 MultiTaskDiT task1-all_bs128_s30000

## Summary

This repository contains the final checkpoint for a MultiTask DiT policy trained on `aswinkumar99/task1-all` for SO101 sponge pick-and-place experiments.

The dataset covers Task 1: Single Sponge - No Distractors (all layouts).

The model was trained with the LeRobot `multi_task_dit` policy using the diffusion objective. It was not fine-tuned from a published base checkpoint.

## Training Setup

- Dataset repo: `aswinkumar99/task1-all`
- Local dataset root during training: `/home/riftuser/datasets_combined/aswinkumar99/task1-all`
- Output directory during training: `/home/riftuser/outputs_matrix/multi_task_dit/task1-all_bs128_s30000`
- Batch size: `128`
- Training steps: `30000`
- Checkpoint save frequency: every `5000` steps
- Data loader workers: `8`
- WandB project: `so101-layout-generalization`
- GPU: `NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition`
- Python: `CPython 3.12.13`
- CUDA: `12.9`
- Training start: `2026-04-24T09:41:26.621865+00:00`
- Training end: `2026-04-24T17:59:31`
- Approximate training duration: `8h 18m 4s`
- Objective: `diffusion`
- Noise scheduler: `DDPM`
- Horizon (predicted action sequence length): `32`
- Action steps executed per prediction (`n_action_steps`): `24`
- Observation steps: `2`
- Vision encoder: `openai/clip-vit-base-patch16`
- Text encoder: `openai/clip-vit-base-patch16`
- Hidden dim: `512`
- Number of transformer layers: `4`
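
As a sanity check on the figures above, the duration and a rough throughput can be recomputed from the listed timestamps. This is a minimal sketch: the end timestamp carries no UTC offset in the list, so the code assumes it is UTC like the start, and the throughput is only a wall-clock average that also includes checkpoint saving and logging time.

```python
from datetime import datetime, timezone

# Values copied from the Training Setup list.
start = datetime.fromisoformat("2026-04-24T09:41:26.621865+00:00")
# Assumption: the end timestamp has no offset in the list; treat it as UTC.
end = datetime.fromisoformat("2026-04-24T17:59:31").replace(tzinfo=timezone.utc)
batch_size = 128
steps = 30000

elapsed = end - start
total_seconds = int(elapsed.total_seconds())  # drop fractional seconds

hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)
duration_text = f"{hours}h {minutes}m {seconds}s"

# Samples processed over the run (batch size times optimizer steps).
samples_seen = batch_size * steps
# Wall-clock average, not a measurement of pure training speed.
steps_per_second = steps / elapsed.total_seconds()

print(duration_text)  # 8h 18m 4s
print(f"{samples_seen:,} samples processed")  # 3,840,000 samples processed
print(f"{steps_per_second:.2f} steps/s on average")
```

The recomputed duration matches the value listed above, which suggests the two timestamps share a time zone.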

## Exact Training Command

```bash
lerobot-train \
  --dataset.repo_id=aswinkumar99/task1-all \
  --dataset.root=/home/riftuser/datasets_combined/aswinkumar99/task1-all \
  --dataset.video_backend=torchcodec \
  --output_dir=/home/riftuser/outputs_matrix/multi_task_dit/task1-all_bs128_s30000 \
  --job_name=multi_task_dit_task1-all_bs128 \
  --batch_size=128 \
  --steps=30000 \
  --log_freq=200 \
  --save_freq=5000 \
  --save_checkpoint=true \
  --num_workers=8 \
  --wandb.enable=true \
  --wandb.project=so101-layout-generalization \
  --wandb.mode=online \
  --wandb.disable_artifact=true \
  --policy.type=multi_task_dit \
  --policy.device=cuda \
  --policy.push_to_hub=false \
  --policy.use_amp=true \
  --policy.horizon=32 \
  --policy.n_action_steps=24 \
  --policy.n_obs_steps=2 \
  --policy.num_layers=4 \
  --policy.hidden_dim=512 \
  --policy.num_heads=8 \
  --policy.dropout=0.1 \
  --policy.timestep_embed_dim=256 \
  --policy.use_rope=true \
  --policy.use_positional_encoding=false \
  --policy.objective=diffusion \
  --policy.noise_scheduler_type=DDPM \
  --policy.num_train_timesteps=100 \
  --policy.optimizer_lr=2e-5 \
  --policy.vision_encoder_lr_multiplier=0.1 \
  --policy.vision_encoder_name=openai/clip-vit-base-patch16 \
  --policy.text_encoder_name=openai/clip-vit-base-patch16 \
  --policy.image_crop_shape=[224,224] \
  --policy.image_crop_is_random=true
```
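
A few quantities follow from the flags above by simple arithmetic. The sketch below derives them under stated assumptions: that `vision_encoder_lr_multiplier` scales `optimizer_lr` multiplicatively, that attention heads split `hidden_dim` evenly, and that a checkpoint is written at every multiple of `save_freq`. None of these is verified here against the LeRobot source, so treat the results as expectations to check rather than confirmed behavior.

```python
# Flag values copied from the training command.
optimizer_lr = 2e-5
vision_encoder_lr_multiplier = 0.1
hidden_dim = 512
num_heads = 8
steps = 30000
save_freq = 5000
log_freq = 200
horizon = 32
n_action_steps = 24

# Assumption: the multiplier is applied to the base learning rate.
vision_encoder_lr = optimizer_lr * vision_encoder_lr_multiplier

# Assumption: standard multi-head attention with equal-width heads.
head_dim = hidden_dim // num_heads

# Assumption: one checkpoint at each multiple of save_freq, up to the last step.
checkpoint_steps = list(range(save_freq, steps + 1, save_freq))

# Number of logging events if metrics are logged every log_freq steps.
num_log_points = steps // log_freq

# Predicted actions per chunk that are not executed before the next prediction.
unused_actions_per_chunk = horizon - n_action_steps

print(f"vision encoder lr: {vision_encoder_lr:.1e}")
print(f"per-head dim: {head_dim}")
print(f"checkpoint steps: {checkpoint_steps}")
print(f"log points: {num_log_points}")
print(f"unused actions per chunk: {unused_actions_per_chunk}")
```

Under these assumptions the CLIP vision encoder is updated at one tenth of the base learning rate, and 24 of every 32 predicted actions are executed before the policy predicts again.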

## Repository Contents

- `pretrained_model/`: final model artifacts for download, loading, and inference
- `training_state/`: optimizer, RNG, and scheduler state, plus step information, for resuming training or auditing the run
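
Because the inference artifacts live in the `pretrained_model/` subfolder rather than at the repository root, one way to fetch only that folder is `snapshot_download` from `huggingface_hub`. The following is a sketch, not a documented loading procedure for this checkpoint: the helper name `download_pretrained` is made up for illustration, and the repo ID is taken from the Creator section of this card.

```python
from pathlib import Path

# Repo ID as listed in the Creator section of this card.
REPO_ID = "aswinkumar99/LeRobot-SO101-MultiTaskDiT-task1-all_bs128_s30000"
SUBFOLDER = "pretrained_model"


def download_pretrained(local_dir=None):
    """Download only the pretrained_model/ files and return their local path.

    Hypothetical helper for illustration; requires network access.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[f"{SUBFOLDER}/*"],  # skips training_state/
        local_dir=local_dir,
    )
    return Path(snapshot_root) / SUBFOLDER
```

Calling `download_pretrained()` returns a local directory that can then be passed to whichever LeRobot loading routine matches the installed version.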

## Creator

Aswinkumar

- Website: [aswinkumar.me](https://aswinkumar.me)
- Hugging Face repo: <https://huggingface.co/aswinkumar99/LeRobot-SO101-MultiTaskDiT-task1-all_bs128_s30000>