Junyi42 committed on
Commit 728c8ff · verified · 1 Parent(s): e7f6e5b

Upload checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins

checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/wandb/offline-run-20260129_223530-checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins-run0/files/config.yaml CHANGED
@@ -0,0 +1,459 @@
+ wandb_version: 1
+
+ _wandb:
+   desc: null
+   value:
+     python_version: 3.11.10
+     cli_version: 0.23.1
+     framework: huggingface
+     huggingface_version: 4.49.0
+     is_jupyter_run: false
+     is_kaggle_kernel: false
+     start_time: 1769726130
+     t:
+       1:
+       - 1
+       - 5
+       - 11
+       - 41
+       - 49
+       - 53
+       - 71
+       - 105
+       2:
+       - 1
+       - 5
+       - 11
+       - 41
+       - 49
+       - 53
+       - 71
+       - 105
+       3:
+       - 2
+       - 4
+       - 13
+       - 14
+       - 37
+       - 42
+       - 61
+     4: 3.11.10
+     5: 0.23.1
+     6: 4.49.0
+     13: linux-x86_64
+     e:
+       41atl41yujl4woi37l6lec2nms9e2b17:
+         os: Linux-6.6.93+-x86_64-with-glibc2.35
+         python: CPython 3.11.10
+         started_at: '2026-01-29T22:35:30.341376Z'
+         args:
+         - --dataset_config_file
+         - ./data/configs/vlm_gym_patch_reassembly_alt_train_celoss_no_mse.yaml
+         - --eval_dataset_config_file
+         - ./data/configs/vlm_gym_patch_reassembly_alt_eval_celoss_no_mse.yaml
+         - --viz_dataset_config_file
+         - ./data/configs/vlm_gym_patch_reassembly_alt_eval_celoss_no_mse.yaml
+         - --train_data_dir
+         - /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/train/
+         - --train_jsonl_path
+         - /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/train/
+         - --eval_data_dir
+         - /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/val/
+         - --eval_jsonl_path
+         - /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/val/
+         - --inference_hash_file
+         - /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+         - --task_name
+         - patch_reassembly_v5
+         - --instructions_dir
+         - ./data/instructions
+         - --model_path
+         - /home/clouduser/Code/Models/BAGEL-7B-MoT
+         - --layer_module
+         - Qwen2MoTDecoderLayer
+         - --max_latent_size
+         - '64'
+         - --resume-from
+         - /home/clouduser/Code/Models/BAGEL-7B-MoT
+         - --finetune_from_hf
+         - 'True'
+         - --auto_resume
+         - 'False'
+         - --resume-model-only
+         - 'True'
+         - --finetune-from-ema
+         - 'True'
+         - --log_every
+         - '1'
+         - --lr
+         - 2e-5
+         - --warmup_steps
+         - '300'
+         - --lr_scheduler
+         - cosine
+         - --num_worker
+         - '1'
+         - --expected_num_tokens
+         - '30000'
+         - --max_num_tokens
+         - '30000'
+         - --max_num_tokens_per_sample
+         - '30000'
+         - --visual_und
+         - 'True'
+         - --visual_gen
+         - 'False'
+         - --save_every
+         - '5000'
+         - --total_steps
+         - '5000'
+         - --text_cond_dropout_prob
+         - '0.0'
+         - --vae_cond_dropout_prob
+         - '0.3'
+         - --vit_cond_dropout_prob
+         - '0.0'
+         - --ema
+         - '0.993'
+         - --checkpoint_dir
+         - /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins
+         - --wandb_project
+         - bagel
+         - --wandb_name
+         - checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins
+         - --wandb_dir
+         - /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins
+         - --wandb_offline
+         - 'True'
+         program: /home/clouduser/Code/Github/unified_world_model/train/pretrain_unified_navit.py
+         code_path: train/pretrain_unified_navit.py
+         code_path_local: train/pretrain_unified_navit.py
+         git:
+           remote_url: https://github.com/para-lost/unified_world_model
+           commit: 8d7b26b7e552fc87b592cf3be94d93be7aeca2a9
+         root: /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins
+         host: junyizhang-launch-new-226785861-1-0
+         executable: /opt/conda/bin/python3.11
+         cpu_count: 48
+         cpu_count_logical: 96
+         gpu_type: NVIDIA A100-SXM4-80GB
+         gpu_count: 8
+         disk:
+           /:
+             total: '1052461830144'
+             used: '179559157760'
+         memory:
+           total: '1437332606976'
+         gpu_nvidia:
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-3642efcc-7fb7-ded1-d109-019ec57acba9
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-286f92a3-0896-9985-124b-9672d14e0965
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-b8d1660d-433b-a353-3354-3ced2e3bdc2d
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-36c0b70a-9144-22a4-2630-78914111d46d
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-719028b7-4b98-69bb-e5f4-00683a4622a2
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-6d7ed3da-1f64-9da8-8994-6975d400f9ea
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-944ff8cb-f12e-5497-05dd-4040ea16d966
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-25010f5d-aa28-64f3-fca5-351cc980469a
+         cuda_version: '12.2'
+     writer_id: 41atl41yujl4woi37l6lec2nms9e2b17
+ visual_gen:
+   desc: null
+   value: false
+ visual_und:
+   desc: null
+   value: true
+ results_dir:
+   desc: null
+   value: results
+ checkpoint_dir:
+   desc: null
+   value: /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins
+ wandb_project:
+   desc: null
+   value: bagel
+ wandb_name:
+   desc: null
+   value: checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins
+ wandb_runid:
+   desc: null
+   value: '0'
+ wandb_resume:
+   desc: null
+   value: allow
+ wandb_offline:
+   desc: null
+   value: true
+ wandb_dir:
+   desc: null
+   value: /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins
+ global_seed:
+   desc: null
+   value: 4396
+ auto_resume:
+   desc: null
+   value: false
+ resume_from:
+   desc: null
+   value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+ resume_model_only:
+   desc: null
+   value: true
+ finetune_from_ema:
+   desc: null
+   value: true
+ finetune_from_hf:
+   desc: null
+   value: true
+ log_every:
+   desc: null
+   value: 1
+ save_every:
+   desc: null
+   value: 5000
+ total_steps:
+   desc: null
+   value: 5000
+ warmup_steps:
+   desc: null
+   value: 300
+ lr_scheduler:
+   desc: null
+   value: cosine
+ lr:
+   desc: null
+   value: 2.0e-05
+ min_lr:
+   desc: null
+   value: 1.0e-07
+ beta1:
+   desc: null
+   value: 0.9
+ beta2:
+   desc: null
+   value: 0.95
+ eps:
+   desc: null
+   value: 1.0e-15
+ ema:
+   desc: null
+   value: 0.993
+ max_grad_norm:
+   desc: null
+   value: 1.0
+ timestep_shift:
+   desc: null
+   value: 1.0
+ mse_weight:
+   desc: null
+   value: 1.0
+ ce_weight:
+   desc: null
+   value: 1.0
+ ce_loss_reweighting:
+   desc: null
+   value: false
+ expected_num_tokens:
+   desc: null
+   value: 30000
+ num_replicate:
+   desc: null
+   value: 1
+ num_shard:
+   desc: null
+   value: 8
+ sharding_strategy:
+   desc: null
+   value: HYBRID_SHARD
+ backward_prefetch:
+   desc: null
+   value: BACKWARD_PRE
+ cpu_offload:
+   desc: null
+   value: false
+ freeze_llm:
+   desc: null
+   value: false
+ freeze_vit:
+   desc: null
+   value: false
+ freeze_vae:
+   desc: null
+   value: true
+ freeze_und:
+   desc: null
+   value: false
+ copy_init_moe:
+   desc: null
+   value: true
+ use_flex:
+   desc: null
+   value: false
+ eval_every:
+   desc: null
+   value: 500
+ num_eval_batches:
+   desc: null
+   value: 20
+ use_ema_for_eval:
+   desc: null
+   value: true
+ eval_log_dir:
+   desc: null
+   value: null
+ eval_run_tag:
+   desc: null
+   value: ''
+ viz_every:
+   desc: null
+   value: 500
+ viz_n:
+   desc: null
+   value: 8
+ viz_outdir:
+   desc: null
+   value: results/viz
+ eval_dataset_config_file:
+   desc: null
+   value: ./data/configs/vlm_gym_patch_reassembly_alt_eval_celoss_no_mse.yaml
+ viz_dataset_config_file:
+   desc: null
+   value: ./data/configs/vlm_gym_patch_reassembly_alt_eval_celoss_no_mse.yaml
+ eval_print_n:
+   desc: null
+   value: 3
+ save_ema_only:
+   desc: null
+   value: true
+ save_optimizer:
+   desc: null
+   value: false
+ model_path:
+   desc: null
+   value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+ llm_path:
+   desc: null
+   value: hf/Qwen2.5-0.5B-Instruct/
+ llm_qk_norm:
+   desc: null
+   value: true
+ tie_word_embeddings:
+   desc: null
+   value: false
+ layer_module:
+   desc: null
+   value: Qwen2MoTDecoderLayer
+ vae_path:
+   desc: null
+   value: flux/vae/ae.safetensors
+ vit_path:
+   desc: null
+   value: hf/siglip-so400m-14-980-flash-attn2-navit/
+ max_latent_size:
+   desc: null
+   value: 64
+ latent_patch_size:
+   desc: null
+   value: 2
+ vit_patch_size:
+   desc: null
+   value: 14
+ vit_max_num_patch_per_side:
+   desc: null
+   value: 70
+ connector_act:
+   desc: null
+   value: gelu_pytorch_tanh
+ interpolate_pos:
+   desc: null
+   value: false
+ vit_select_layer:
+   desc: null
+   value: -2
+ vit_rope:
+   desc: null
+   value: false
+ text_cond_dropout_prob:
+   desc: null
+   value: 0.0
+ vae_cond_dropout_prob:
+   desc: null
+   value: 0.3
+ vit_cond_dropout_prob:
+   desc: null
+   value: 0.0
+ dataset_config_file:
+   desc: null
+   value: ./data/configs/vlm_gym_patch_reassembly_alt_train_celoss_no_mse.yaml
+ train_data_dir:
+   desc: null
+   value: /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/train/
+ train_jsonl_path:
+   desc: null
+   value: /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/train/
+ eval_data_dir:
+   desc: null
+   value: /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/val/
+ eval_jsonl_path:
+   desc: null
+   value: /home/clouduser/Code/data/gym/patch_reassembly_alt_v5/val/
+ inference_hash_file:
+   desc: null
+   value: /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+ task_name:
+   desc: null
+   value: patch_reassembly_v5
+ instructions_dir:
+   desc: null
+   value: ./data/instructions
+ prefetch_factor:
+   desc: null
+   value: 2
+ num_workers:
+   desc: null
+   value: 1
+ max_num_tokens_per_sample:
+   desc: null
+   value: 30000
+ max_num_tokens:
+   desc: null
+   value: 30000
+ prefer_buffer_before:
+   desc: null
+   value: 16384
+ max_buffer_size:
+   desc: null
+   value: 50
+ data_seed:
+   desc: null
+   value: 42
checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/wandb/offline-run-20260129_223530-checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins-run0/files/output.log CHANGED
@@ -1,173 +1,3 @@
- FullyShardedDataParallel(
-   (_fsdp_wrapped_module): Bagel(
-     (language_model): Qwen2ForCausalLM(
-       (model): Qwen2Model(
-         (embed_tokens): Embedding(152064, 3584)
-         (layers): ModuleList(
-           (0-27): 28 x FullyShardedDataParallel(
-             (_fsdp_wrapped_module): CheckpointWrapper(
-               (_checkpoint_wrapped_module): Qwen2MoTDecoderLayer(
-                 (self_attn): PackedAttentionMoT(
-                   (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
-                   (k_proj): Linear(in_features=3584, out_features=512, bias=True)
-                   (v_proj): Linear(in_features=3584, out_features=512, bias=True)
-                   (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
-                   (q_norm): Qwen2RMSNorm((128,), eps=1e-06)
-                   (k_norm): Qwen2RMSNorm((128,), eps=1e-06)
-                   (q_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
-                   (k_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
-                   (q_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=True)
-                   (k_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
-                   (v_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
-                   (o_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=False)
-                 )
-                 (mlp): Qwen2MLP(
-                   (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                   (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                   (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
-                   (act_fn): SiLU()
-                 )
-                 (mlp_moe_gen): Qwen2MLP(
-                   (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                   (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                   (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
-                   (act_fn): SiLU()
-                 )
-                 (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
-                 (input_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-                 (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
-                 (post_attention_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-               )
-             )
-           )
-         )
-         (norm): Qwen2RMSNorm((3584,), eps=1e-06)
-         (norm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-         (rotary_emb): Qwen2RotaryEmbedding()
-       )
-       (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
-     )
-     (vit_model): SiglipVisionModel(
-       (vision_model): FullyShardedDataParallel(
-         (_fsdp_wrapped_module): SiglipVisionTransformer(
-           (embeddings): SiglipVisionEmbeddings(
-             (position_embedding): Embedding(4900, 1152)
-             (patch_embedding): Linear(in_features=588, out_features=1152, bias=True)
-           )
-           (encoder): SiglipEncoder(
-             (layers): ModuleList(
-               (0-25): 26 x FullyShardedDataParallel(
-                 (_fsdp_wrapped_module): CheckpointWrapper(
-                   (_checkpoint_wrapped_module): SiglipEncoderLayer(
-                     (self_attn): SiglipFlashAttention2(
-                       (k_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                       (v_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                       (q_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                       (out_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                     )
-                     (layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-                     (mlp): SiglipMLP(
-                       (activation_fn): PytorchGELUTanh()
-                       (fc1): Linear(in_features=1152, out_features=4304, bias=True)
-                       (fc2): Linear(in_features=4304, out_features=1152, bias=True)
-                     )
-                     (layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-                   )
-                 )
-               )
-             )
-           )
-           (post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-         )
-       )
-     )
-     (connector): FullyShardedDataParallel(
-       (_fsdp_wrapped_module): CheckpointWrapper(
-         (_checkpoint_wrapped_module): MLPconnector(
-           (activation_fn): PytorchGELUTanh()
-           (fc1): Linear(in_features=1152, out_features=3584, bias=True)
-           (fc2): Linear(in_features=3584, out_features=3584, bias=True)
-         )
-       )
-     )
-     (vit_pos_embed): FullyShardedDataParallel(
-       (_fsdp_wrapped_module): PositionEmbedding()
-     )
-   )
- )
- _flat_param True
- language_model.model.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.26._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- language_model.model.layers.27._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_model.vision_model._fsdp_wrapped_module.encoder.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- connector._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
- vit_pos_embed._fsdp_wrapped_module._flat_param False
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse/vlm_gym_patch_reassembly_alt_train
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step0
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.49487826228141785, mse_avg: 0.0
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step500
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.0996677577495575, mse_avg: 0.0
  wandb: Detected [huggingface_hub.inference] in use.
  wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
  wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
@@ -1166,6 +996,197 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
  [2026-01-29 23:17:49] (step=0000985) Train Loss mse: 0.0000, Train Loss ce: 0.0878, Train Steps/Sec: 0.41,
  [2026-01-29 23:17:52] (step=0000986) Train Loss mse: 0.0000, Train Loss ce: 0.0956, Train Steps/Sec: 0.45,
  [2026-01-29 23:17:54] (step=0000987) Train Loss mse: 0.0000, Train Loss ce: 0.0941, Train Steps/Sec: 0.41,
  [2026-01-29 23:17:56] (step=0000988) Train Loss mse: 0.0000, Train Loss ce: 0.0884, Train Steps/Sec: 0.51,
  [2026-01-29 23:17:58] (step=0000989) Train Loss mse: 0.0000, Train Loss ce: 0.1141, Train Steps/Sec: 0.47,
  [2026-01-29 23:18:00] (step=0000990) Train Loss mse: 0.0000, Train Loss ce: 0.0825, Train Steps/Sec: 0.61,
@@ -1285,27 +1306,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
  [2026-01-29 23:22:09] (step=0001104) Train Loss mse: 0.0000, Train Loss ce: 0.0848, Train Steps/Sec: 0.43,
  [2026-01-29 23:22:11] (step=0001105) Train Loss mse: 0.0000, Train Loss ce: 0.0843, Train Steps/Sec: 0.40,
  [2026-01-29 23:22:14] (step=0001106) Train Loss mse: 0.0000, Train Loss ce: 0.1029, Train Steps/Sec: 0.41,
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step1000
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.09581880271434784, mse_avg: 0.0
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step1500
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.11242858320474625, mse_avg: 0.0
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step2000
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.10297781229019165, mse_avg: 0.0
  [2026-01-29 23:22:16] (step=0001107) Train Loss mse: 0.0000, Train Loss ce: 0.1083, Train Steps/Sec: 0.47,
  [2026-01-29 23:22:18] (step=0001108) Train Loss mse: 0.0000, Train Loss ce: 0.0912, Train Steps/Sec: 0.45,
  [2026-01-29 23:22:20] (step=0001109) Train Loss mse: 0.0000, Train Loss ce: 0.0928, Train Steps/Sec: 0.50,
@@ -2663,6 +2663,27 @@ ce_avg: 0.10297781229019165, mse_avg: 0.0
  [2026-01-30 00:11:24] (step=0002461) Train Loss mse: 0.0000, Train Loss ce: 0.0646, Train Steps/Sec: 0.45,
  [2026-01-30 00:11:26] (step=0002462) Train Loss mse: 0.0000, Train Loss ce: 0.0767, Train Steps/Sec: 0.45,
  [2026-01-30 00:11:28] (step=0002463) Train Loss mse: 0.0000, Train Loss ce: 0.0795, Train Steps/Sec: 0.48,
  [2026-01-30 00:11:30] (step=0002464) Train Loss mse: 0.0000, Train Loss ce: 0.0584, Train Steps/Sec: 0.43,
  [2026-01-30 00:11:32] (step=0002465) Train Loss mse: 0.0000, Train Loss ce: 0.0864, Train Steps/Sec: 0.52,
  [2026-01-30 00:11:34] (step=0002466) Train Loss mse: 0.0000, Train Loss ce: 0.0867, Train Steps/Sec: 0.49,
@@ -2841,27 +2862,6 @@ ce_avg: 0.10297781229019165, mse_avg: 0.0
  [2026-01-30 00:17:48] (step=0002639) Train Loss mse: 0.0000, Train Loss ce: 0.0904, Train Steps/Sec: 0.45,
  [2026-01-30 00:17:50] (step=0002640) Train Loss mse: 0.0000, Train Loss ce: 0.0875, Train Steps/Sec: 0.49,
  [2026-01-30 00:17:52] (step=0002641) Train Loss mse: 0.0000, Train Loss ce: 0.0868, Train Steps/Sec: 0.41,
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step2500
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.12932956218719482, mse_avg: 0.0
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step3000
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.12472175806760788, mse_avg: 0.0
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step3500
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.13290119171142578, mse_avg: 0.0
  [2026-01-30 00:17:54] (step=0002642) Train Loss mse: 0.0000, Train Loss ce: 0.0942, Train Steps/Sec: 0.51,
  [2026-01-30 00:17:56] (step=0002643) Train Loss mse: 0.0000, Train Loss ce: 0.0798, Train Steps/Sec: 0.52,
  [2026-01-30 00:17:58] (step=0002644) Train Loss mse: 0.0000, Train Loss ce: 0.0646, Train Steps/Sec: 0.52,
@@ -3715,32 +3715,21 @@ ce_avg: 0.13290119171142578, mse_avg: 0.0
  [2026-01-30 00:48:47] (step=0003492) Train Loss mse: 0.0000, Train Loss ce: 0.0804, Train Steps/Sec: 0.43,
  [2026-01-30 00:48:49] (step=0003493) Train Loss mse: 0.0000, Train Loss ce: 0.0695, Train Steps/Sec: 0.46,
  [2026-01-30 00:48:52] (step=0003494) Train Loss mse: 0.0000, Train Loss ce: 0.0633, Train Steps/Sec: 0.39,
- [2026-01-30 00:48:54] (step=0003495) Train Loss mse: 0.0000, Train Loss ce: 0.0751, Train Steps/Sec: 0.41,
- [2026-01-30 00:48:56] (step=0003496) Train Loss mse: 0.0000, Train Loss ce: 0.0818, Train Steps/Sec: 0.49,
- [2026-01-30 00:48:58] (step=0003497) Train Loss mse: 0.0000, Train Loss ce: 0.0806, Train Steps/Sec: 0.47,
- [2026-01-30 00:49:00] (step=0003498) Train Loss mse: 0.0000, Train Loss ce: 0.0663, Train Steps/Sec: 0.53,
- [2026-01-30 00:49:02] (step=0003499) Train Loss mse: 0.0000, Train Loss ce: 0.0822, Train Steps/Sec: 0.49,
- [2026-01-30 00:49:12] (step=0003500) Train Loss mse: 0.0000, Train Loss ce: 0.0845, Train Steps/Sec: 0.10,
- [2026-01-30 00:49:14] (step=0003501) Train Loss mse: 0.0000, Train Loss ce: 0.0789, Train Steps/Sec: 0.51,
- [2026-01-30 00:49:16] (step=0003502) Train Loss mse: 0.0000, Train Loss ce: 0.0711, Train Steps/Sec: 0.51,
- [2026-01-30 00:49:18] (step=0003503) Train Loss mse: 0.0000, Train Loss ce: 0.0561, Train Steps/Sec: 0.51,
- [2026-01-30 00:49:20] (step=0003504) Train Loss mse: 0.0000, Train Loss ce: 0.0610, Train Steps/Sec: 0.50,
- [2026-01-30 00:49:22] (step=0003505) Train Loss mse: 0.0000, Train Loss ce: 0.0814, Train Steps/Sec: 0.49,
- [2026-01-30 00:49:24] (step=0003506) Train Loss mse: 0.0000, Train Loss ce: 0.0686, Train Steps/Sec: 0.45,
- [2026-01-30 00:49:26] (step=0003507) Train Loss mse: 0.0000, Train Loss ce: 0.0735, Train Steps/Sec: 0.44,
- [2026-01-30 00:49:28] (step=0003508) Train Loss mse: 0.0000, Train Loss ce: 0.0664, Train Steps/Sec: 0.54,
- [2026-01-30 00:49:30] (step=0003509) Train Loss mse: 0.0000, Train Loss ce: 0.0686, Train Steps/Sec: 0.53,
- [2026-01-30 00:49:32] (step=0003510) Train Loss mse: 0.0000, Train Loss ce: 0.0632, Train Steps/Sec: 0.51,
- [2026-01-30 00:49:34] (step=0003511) Train Loss mse: 0.0000, Train Loss ce: 0.0723, Train Steps/Sec: 0.46,
- [2026-01-30 00:49:36] (step=0003512) Train Loss mse: 0.0000, Train Loss ce: 0.0726, Train Steps/Sec: 0.49,
- [2026-01-30 00:49:38] (step=0003513) Train Loss mse: 0.0000, Train Loss ce: 0.0715, Train Steps/Sec: 0.51,
- [2026-01-30 00:49:41] (step=0003514) Train Loss mse: 0.0000, Train Loss ce: 0.0767, Train Steps/Sec: 0.35,
- [2026-01-30 00:49:43] (step=0003515) Train Loss mse: 0.0000, Train Loss ce: 0.0762, Train Steps/Sec: 0.50,
- [2026-01-30 00:49:45] (step=0003516) Train Loss mse: 0.0000, Train Loss ce: 0.0615, Train Steps/Sec: 0.47,
- [2026-01-30 00:49:47] (step=0003517) Train Loss mse: 0.0000, Train Loss ce: 0.0745, Train Steps/Sec: 0.52,
- [2026-01-30 00:49:49] (step=0003518) Train Loss mse: 0.0000, Train Loss ce: 0.0668, Train Steps/Sec: 0.49,
- [2026-01-30 00:49:51] (step=0003519) Train Loss mse: 0.0000, Train Loss ce: 0.0611, Train Steps/Sec: 0.51,
- [2026-01-30 00:49:53] (step=0003520) Train Loss mse: 0.0000, Train Loss ce: 0.0626, Train Steps/Sec: 0.51,
  [2026-01-30 00:49:56] (step=0003521) Train Loss mse: 0.0000, Train Loss ce: 0.0639, Train Steps/Sec: 0.41,
  [2026-01-30 00:49:57] (step=0003522) Train Loss mse: 0.0000, Train Loss ce: 0.0738, Train Steps/Sec: 0.56,
  [2026-01-30 00:50:00] (step=0003523) Train Loss mse: 0.0000, Train Loss ce: 0.0628, Train Steps/Sec: 0.44,
@@ -3965,27 +3954,6 @@ ce_avg: 0.13290119171142578, mse_avg: 0.0
  [2026-01-30 00:57:48] (step=0003742) Train Loss mse: 0.0000, Train Loss ce: 0.0685, Train Steps/Sec: 0.43,
  [2026-01-30 00:57:50] (step=0003743) Train Loss mse: 0.0000, Train Loss ce: 0.0728, Train Steps/Sec: 0.49,
  [2026-01-30 00:57:52] (step=0003744) Train Loss mse: 0.0000, Train Loss ce: 0.0693, Train Steps/Sec: 0.46,
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step4000
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.13901878893375397, mse_avg: 0.0
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step4500
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.1414453536272049, mse_avg: 0.0
- base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step5000
- Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
- ce_avg: 0.13837039470672607, mse_avg: 0.0
  [2026-01-30 00:57:55] (step=0003745) Train Loss mse: 0.0000, Train Loss ce: 0.0776, Train Steps/Sec: 0.47,
  [2026-01-30 00:57:57] (step=0003746) Train Loss mse: 0.0000, Train Loss ce: 0.0664, Train Steps/Sec: 0.47,
  [2026-01-30 00:57:59] (step=0003747) Train Loss mse: 0.0000, Train Loss ce: 0.0767, Train Steps/Sec: 0.44,
@@ -5245,4 +5213,11 @@ ce_avg: 0.13837039470672607, mse_avg: 0.0
  [2026-01-30 01:43:22] Saving checkpoint to /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/0005000.
  /opt/conda/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
- [2026-01-30 01:45:52] Done!

  wandb: Detected [huggingface_hub.inference] in use.
  wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
  wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/

  [2026-01-29 23:17:49] (step=0000985) Train Loss mse: 0.0000, Train Loss ce: 0.0878, Train Steps/Sec: 0.41,
  [2026-01-29 23:17:52] (step=0000986) Train Loss mse: 0.0000, Train Loss ce: 0.0956, Train Steps/Sec: 0.45,
  [2026-01-29 23:17:54] (step=0000987) Train Loss mse: 0.0000, Train Loss ce: 0.0941, Train Steps/Sec: 0.41,
+ FullyShardedDataParallel(
+ (_fsdp_wrapped_module): Bagel(
+ (language_model): Qwen2ForCausalLM(
+ (model): Qwen2Model(
+ (embed_tokens): Embedding(152064, 3584)
+ (layers): ModuleList(
+ (0-27): 28 x FullyShardedDataParallel(
+ (_fsdp_wrapped_module): CheckpointWrapper(
+ (_checkpoint_wrapped_module): Qwen2MoTDecoderLayer(
+ (self_attn): PackedAttentionMoT(
+ (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
+ (k_proj): Linear(in_features=3584, out_features=512, bias=True)
+ (v_proj): Linear(in_features=3584, out_features=512, bias=True)
+ (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
+ (q_norm): Qwen2RMSNorm((128,), eps=1e-06)
+ (k_norm): Qwen2RMSNorm((128,), eps=1e-06)
+ (q_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
+ (k_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
+ (q_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=True)
+ (k_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
+ (v_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
+ (o_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=False)
+ )
+ (mlp): Qwen2MLP(
+ (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
+ (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
+ (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
+ (act_fn): SiLU()
+ )
+ (mlp_moe_gen): Qwen2MLP(
+ (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
+ (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
+ (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
+ (act_fn): SiLU()
+ )
+ (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
+ (input_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+ (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
+ (post_attention_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+ )
+ )
+ )
+ )
+ (norm): Qwen2RMSNorm((3584,), eps=1e-06)
+ (norm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+ (rotary_emb): Qwen2RotaryEmbedding()
+ )
+ (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
+ )
+ (vit_model): SiglipVisionModel(
+ (vision_model): FullyShardedDataParallel(
+ (_fsdp_wrapped_module): SiglipVisionTransformer(
+ (embeddings): SiglipVisionEmbeddings(
+ (position_embedding): Embedding(4900, 1152)
+ (patch_embedding): Linear(in_features=588, out_features=1152, bias=True)
+ )
+ (encoder): SiglipEncoder(
+ (layers): ModuleList(
+ (0-25): 26 x FullyShardedDataParallel(
+ (_fsdp_wrapped_module): CheckpointWrapper(
+ (_checkpoint_wrapped_module): SiglipEncoderLayer(
+ (self_attn): SiglipFlashAttention2(
+ (k_proj): Linear(in_features=1152, out_features=1152, bias=True)
+ (v_proj): Linear(in_features=1152, out_features=1152, bias=True)
+ (q_proj): Linear(in_features=1152, out_features=1152, bias=True)
+ (out_proj): Linear(in_features=1152, out_features=1152, bias=True)
+ )
+ (layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
+ (mlp): SiglipMLP(
+ (activation_fn): PytorchGELUTanh()
+ (fc1): Linear(in_features=1152, out_features=4304, bias=True)
+ (fc2): Linear(in_features=4304, out_features=1152, bias=True)
+ )
+ (layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
+ )
+ )
+ )
+ )
+ )
+ (post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
+ )
+ )
+ )
+ (connector): FullyShardedDataParallel(
+ (_fsdp_wrapped_module): CheckpointWrapper(
+ (_checkpoint_wrapped_module): MLPconnector(
+ (activation_fn): PytorchGELUTanh()
+ (fc1): Linear(in_features=1152, out_features=3584, bias=True)
+ (fc2): Linear(in_features=3584, out_features=3584, bias=True)
+ )
+ )
+ )
+ (vit_pos_embed): FullyShardedDataParallel(
+ (_fsdp_wrapped_module): PositionEmbedding()
+ )
+ )
+ )
+ _flat_param True
+ language_model.model.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.26._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ language_model.model.layers.27._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_model.vision_model._fsdp_wrapped_module.encoder.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ connector._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+ vit_pos_embed._fsdp_wrapped_module._flat_param False
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse/vlm_gym_patch_reassembly_alt_train
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step0
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.49487826228141785, mse_avg: 0.0
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step500
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.0996677577495575, mse_avg: 0.0
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step1000
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.09581880271434784, mse_avg: 0.0
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step1500
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.11242858320474625, mse_avg: 0.0
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step2000
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.10297781229019165, mse_avg: 0.0
  [2026-01-29 23:17:56] (step=0000988) Train Loss mse: 0.0000, Train Loss ce: 0.0884, Train Steps/Sec: 0.51,
  [2026-01-29 23:17:58] (step=0000989) Train Loss mse: 0.0000, Train Loss ce: 0.1141, Train Steps/Sec: 0.47,
  [2026-01-29 23:18:00] (step=0000990) Train Loss mse: 0.0000, Train Loss ce: 0.0825, Train Steps/Sec: 0.61,

  [2026-01-29 23:22:09] (step=0001104) Train Loss mse: 0.0000, Train Loss ce: 0.0848, Train Steps/Sec: 0.43,
  [2026-01-29 23:22:11] (step=0001105) Train Loss mse: 0.0000, Train Loss ce: 0.0843, Train Steps/Sec: 0.40,
  [2026-01-29 23:22:14] (step=0001106) Train Loss mse: 0.0000, Train Loss ce: 0.1029, Train Steps/Sec: 0.41,
  [2026-01-29 23:22:16] (step=0001107) Train Loss mse: 0.0000, Train Loss ce: 0.1083, Train Steps/Sec: 0.47,
  [2026-01-29 23:22:18] (step=0001108) Train Loss mse: 0.0000, Train Loss ce: 0.0912, Train Steps/Sec: 0.45,
  [2026-01-29 23:22:20] (step=0001109) Train Loss mse: 0.0000, Train Loss ce: 0.0928, Train Steps/Sec: 0.50,

  [2026-01-30 00:11:24] (step=0002461) Train Loss mse: 0.0000, Train Loss ce: 0.0646, Train Steps/Sec: 0.45,
  [2026-01-30 00:11:26] (step=0002462) Train Loss mse: 0.0000, Train Loss ce: 0.0767, Train Steps/Sec: 0.45,
  [2026-01-30 00:11:28] (step=0002463) Train Loss mse: 0.0000, Train Loss ce: 0.0795, Train Steps/Sec: 0.48,
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step2500
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.12932956218719482, mse_avg: 0.0
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step3000
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.12472175806760788, mse_avg: 0.0
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step3500
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.13290119171142578, mse_avg: 0.0
  [2026-01-30 00:11:30] (step=0002464) Train Loss mse: 0.0000, Train Loss ce: 0.0584, Train Steps/Sec: 0.43,
  [2026-01-30 00:11:32] (step=0002465) Train Loss mse: 0.0000, Train Loss ce: 0.0864, Train Steps/Sec: 0.52,
  [2026-01-30 00:11:34] (step=0002466) Train Loss mse: 0.0000, Train Loss ce: 0.0867, Train Steps/Sec: 0.49,
 
  [2026-01-30 00:17:48] (step=0002639) Train Loss mse: 0.0000, Train Loss ce: 0.0904, Train Steps/Sec: 0.45,
  [2026-01-30 00:17:50] (step=0002640) Train Loss mse: 0.0000, Train Loss ce: 0.0875, Train Steps/Sec: 0.49,
  [2026-01-30 00:17:52] (step=0002641) Train Loss mse: 0.0000, Train Loss ce: 0.0868, Train Steps/Sec: 0.41,
  [2026-01-30 00:17:54] (step=0002642) Train Loss mse: 0.0000, Train Loss ce: 0.0942, Train Steps/Sec: 0.51,
  [2026-01-30 00:17:56] (step=0002643) Train Loss mse: 0.0000, Train Loss ce: 0.0798, Train Steps/Sec: 0.52,
  [2026-01-30 00:17:58] (step=0002644) Train Loss mse: 0.0000, Train Loss ce: 0.0646, Train Steps/Sec: 0.52,

  [2026-01-30 00:48:47] (step=0003492) Train Loss mse: 0.0000, Train Loss ce: 0.0804, Train Steps/Sec: 0.43,
  [2026-01-30 00:48:49] (step=0003493) Train Loss mse: 0.0000, Train Loss ce: 0.0695, Train Steps/Sec: 0.46,
  [2026-01-30 00:48:52] (step=0003494) Train Loss mse: 0.0000, Train Loss ce: 0.0633, Train Steps/Sec: 0.39,
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step4000
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.13901878893375397, mse_avg: 0.0
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step4500
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.1414453536272049, mse_avg: 0.0
+ ] (step=0003520) Train Loss mse: 0.0000, Train Loss ce: 0.0626, Train Steps/Sec: 0.51,
  [2026-01-30 00:49:56] (step=0003521) Train Loss mse: 0.0000, Train Loss ce: 0.0639, Train Steps/Sec: 0.41,
  [2026-01-30 00:49:57] (step=0003522) Train Loss mse: 0.0000, Train Loss ce: 0.0738, Train Steps/Sec: 0.56,
  [2026-01-30 00:50:00] (step=0003523) Train Loss mse: 0.0000, Train Loss ce: 0.0628, Train Steps/Sec: 0.44,
 
  [2026-01-30 00:57:48] (step=0003742) Train Loss mse: 0.0000, Train Loss ce: 0.0685, Train Steps/Sec: 0.43,
  [2026-01-30 00:57:50] (step=0003743) Train Loss mse: 0.0000, Train Loss ce: 0.0728, Train Steps/Sec: 0.49,
  [2026-01-30 00:57:52] (step=0003744) Train Loss mse: 0.0000, Train Loss ce: 0.0693, Train Steps/Sec: 0.46,
  [2026-01-30 00:57:55] (step=0003745) Train Loss mse: 0.0000, Train Loss ce: 0.0776, Train Steps/Sec: 0.47,
  [2026-01-30 00:57:57] (step=0003746) Train Loss mse: 0.0000, Train Loss ce: 0.0664, Train Steps/Sec: 0.47,
  [2026-01-30 00:57:59] (step=0003747) Train Loss mse: 0.0000, Train Loss ce: 0.0767, Train Steps/Sec: 0.44,
 
  [2026-01-30 01:43:22] Saving checkpoint to /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/0005000.
  /opt/conda/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
+ [2026-01-30 01:45:52] Done!
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_patch_reassembly_alt_one_image_lr2e_5_ce_no_mse_ins_step5000
+ Preparing Dataset vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce/vlm_gym_patch_reassembly_alt_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_patch_reassembly_alt_celoss_no_mse_evalonce'}]
+ ce_avg: 0.13837039470672607, mse_avg: 0.0