---
language:
- en
license: mit
tags:
- MultiTaskDiT
- LeRobot
- robotics
- imitation-learning
- diffusion
- so101
pipeline_tag: robotics
library_name: lerobot
---

# LeRobot SO101 MultiTaskDiT task2-all_bs128_s30000

## Summary

This repository contains the final checkpoint for a MultiTask DiT policy trained on `aswinkumar99/task2-all` for SO101 sponge pick-and-place experiments.

Dataset scope: Task 2 (multiple sponges, no distractors, all layouts).

This model was trained with the LeRobot `multi_task_dit` policy and a diffusion objective. It is not fine-tuned from a published base policy checkpoint.

## Training Setup

- Dataset repo: `aswinkumar99/task2-all`
- Local dataset root during training: `/home/riftuser/datasets_combined/aswinkumar99/task2-all`
- Output directory during training: `/home/riftuser/outputs_matrix/multi_task_dit/task2-all_bs128_s30000`
- Batch size: `128`
- Training steps: `30000`
- Checkpoint save frequency: `5000`
- Data loader workers: `8`
- WandB project: `so101-layout-generalization`
- GPU: `NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition`
- Python: `CPython 3.12.13`
- CUDA: `12.9`
- Training start: `2026-04-24T09:48:49.302378+00:00`
- Training end: `2026-04-24T18:02:09`
- Approximate training duration: `8h 13m 19s`
- Objective: `diffusion`
- Noise scheduler: `DDPM`
- Horizon: `32`
- Action steps predicted: `24`
- Observation steps: `2`
- Vision encoder: `openai/clip-vit-base-patch16`
- Text encoder: `openai/clip-vit-base-patch16`
- Hidden dim: `512`
- Number of transformer layers: `4`
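The `diffusion` objective with a DDPM scheduler trains the policy to denoise action chunks. The sketch below illustrates the standard DDPM forward (noising) process for the 100 training timesteps listed above; the beta-schedule values are common DDPM defaults and an assumption here, as the exact schedule used by `multi_task_dit` is not stated in this card.

```python
import math

# Hypothetical illustration of the DDPM forward (noising) process behind the
# diffusion objective; beta range is a common default, not taken from this run.
NUM_TRAIN_TIMESTEPS = 100          # matches --policy.num_train_timesteps=100
BETA_START, BETA_END = 1e-4, 0.02  # typical DDPM linear schedule (assumption)

# Linear beta schedule and the cumulative product alpha_bar_t.
betas = [BETA_START + (BETA_END - BETA_START) * t / (NUM_TRAIN_TIMESTEPS - 1)
         for t in range(NUM_TRAIN_TIMESTEPS)]
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= (1.0 - b)
    alpha_bars.append(prod)

def noise_action(action: float, noise: float, t: int) -> float:
    """q(x_t | x_0): scale the clean action and mix in Gaussian noise."""
    ab = alpha_bars[t]
    return math.sqrt(ab) * action + math.sqrt(1.0 - ab) * noise

# At t=0 the sample is nearly the clean action; by t=99 it is mostly noise.
print(alpha_bars[0], alpha_bars[-1])
```

During training the network sees `noise_action(chunk, eps, t)` for a random `t` and is asked to predict `eps`; at inference it runs the reverse process from pure noise to produce a 32-step action chunk.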

## Exact Training Command

```bash
lerobot-train \
  --dataset.repo_id=aswinkumar99/task2-all \
  --dataset.root=/home/riftuser/datasets_combined/aswinkumar99/task2-all \
  --dataset.video_backend=torchcodec \
  --output_dir=/home/riftuser/outputs_matrix/multi_task_dit/task2-all_bs128_s30000 \
  --job_name=multi_task_dit_task2-all_bs128 \
  --batch_size=128 \
  --steps=30000 \
  --log_freq=200 \
  --save_freq=5000 \
  --save_checkpoint=true \
  --num_workers=8 \
  --wandb.enable=true \
  --wandb.project=so101-layout-generalization \
  --wandb.mode=online \
  --wandb.disable_artifact=true \
  --policy.type=multi_task_dit \
  --policy.device=cuda \
  --policy.push_to_hub=false \
  --policy.use_amp=true \
  --policy.horizon=32 \
  --policy.n_action_steps=24 \
  --policy.n_obs_steps=2 \
  --policy.num_layers=4 \
  --policy.hidden_dim=512 \
  --policy.num_heads=8 \
  --policy.dropout=0.1 \
  --policy.timestep_embed_dim=256 \
  --policy.use_rope=true \
  --policy.use_positional_encoding=false \
  --policy.objective=diffusion \
  --policy.noise_scheduler_type=DDPM \
  --policy.num_train_timesteps=100 \
  --policy.optimizer_lr=2e-5 \
  --policy.vision_encoder_lr_multiplier=0.1 \
  --policy.vision_encoder_name=openai/clip-vit-base-patch16 \
  --policy.text_encoder_name=openai/clip-vit-base-patch16 \
  --policy.image_crop_shape=[224,224] \
  --policy.image_crop_is_random=true
```
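The horizon flags above imply receding-horizon execution: the policy predicts a 32-step chunk conditioned on the last 2 observations, executes the first 24 actions, then replans. The helper below is a hypothetical sketch of that schedule, not part of the LeRobot API.

```python
# Receding-horizon rollout implied by the flags above (hypothetical helper):
# predict `HORIZON` actions per forward pass, execute the first
# `N_ACTION_STEPS`, then replan from the new timestep.
HORIZON = 32         # --policy.horizon=32
N_ACTION_STEPS = 24  # --policy.n_action_steps=24
N_OBS_STEPS = 2      # --policy.n_obs_steps=2 (conditioning frames per replan)

def replan_schedule(episode_len: int) -> list[tuple[int, range]]:
    """Return (replan_timestep, executed_steps) pairs for one episode."""
    schedule = []
    t = 0
    while t < episode_len:
        executed = min(N_ACTION_STEPS, episode_len - t)
        schedule.append((t, range(t, t + executed)))
        t += executed
    return schedule

# A 100-step episode needs ceil(100 / 24) = 5 forward passes of the policy.
print(len(replan_schedule(100)))
```

Executing 24 of 32 predicted steps trades some open-loop smoothness for more frequent visual feedback, a common choice for diffusion-style chunked policies.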

## Repository Contents

- `pretrained_model/`: final downloadable model artifacts for inference/loading
- `training_state/`: optimizer, RNG, and LR-scheduler state, plus the training step counter, for resuming or auditing the run

## Creator

Aswinkumar

- Website: [aswinkumar.me](https://aswinkumar.me)
- Hugging Face repo: <https://huggingface.co/aswinkumar99/LeRobot-SO101-MultiTaskDiT-task2-all_bs128_s30000>