Junyi42 committed on
Commit b11bca0 · verified · 1 Parent(s): ced6c91

Upload checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins

checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/wandb/offline-run-20260129_223634-vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins-run0/files/config.yaml CHANGED
@@ -0,0 +1,457 @@
+ wandb_version: 1
+
+ _wandb:
+   desc: null
+   value:
+     python_version: 3.11.10
+     cli_version: 0.23.1
+     framework: huggingface
+     huggingface_version: 4.49.0
+     is_jupyter_run: false
+     is_kaggle_kernel: false
+     start_time: 1769726194
+     t:
+       1:
+       - 1
+       - 5
+       - 11
+       - 41
+       - 49
+       - 53
+       - 71
+       - 105
+       2:
+       - 1
+       - 5
+       - 11
+       - 41
+       - 49
+       - 53
+       - 71
+       - 105
+       3:
+       - 2
+       - 4
+       - 13
+       - 14
+       - 37
+       - 42
+       - 61
+       4: 3.11.10
+       5: 0.23.1
+       6: 4.49.0
+       13: linux-x86_64
+     e:
+       452u6eq6qj9cpdr80e5nue3sk0cejbit:
+         os: Linux-6.6.93+-x86_64-with-glibc2.35
+         python: CPython 3.11.10
+         started_at: '2026-01-29T22:36:34.516284Z'
+         args:
+         - --dataset_config_file
+         - ./data/configs/vlm_gym_reference_dot_train_mseloss_only.yaml
+         - --eval_dataset_config_file
+         - ./data/configs/vlm_gym_reference_dot_eval_mseloss_only.yaml
+         - --viz_dataset_config_file
+         - ./data/configs/vlm_gym_reference_dot_eval_mseloss_only.yaml
+         - --inference_hash_file
+         - /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+         - --task_name
+         - reference_dot_v5
+         - --instructions_dir
+         - ./data/instructions
+         - --train_data_dir
+         - /home/clouduser/Code/data/gym/reference_dot_v5/train/
+         - --train_jsonl_path
+         - /home/clouduser/Code/data/gym/reference_dot_v5/train/
+         - --eval_data_dir
+         - /home/clouduser/Code/data/gym/reference_dot_v5/val/
+         - --eval_jsonl_path
+         - /home/clouduser/Code/data/gym/reference_dot_v5/val/
+         - --model_path
+         - /home/clouduser/Code/Models/BAGEL-7B-MoT
+         - --layer_module
+         - Qwen2MoTDecoderLayer
+         - --max_latent_size
+         - '64'
+         - --resume-from
+         - /home/clouduser/Code/Models/BAGEL-7B-MoT
+         - --finetune_from_hf
+         - 'True'
+         - --auto_resume
+         - 'False'
+         - --resume-model-only
+         - 'True'
+         - --finetune-from-ema
+         - 'True'
+         - --log_every
+         - '1'
+         - --lr
+         - 2e-5
+         - --warmup_steps
+         - '300'
+         - --lr_scheduler
+         - cosine
+         - --num_worker
+         - '1'
+         - --expected_num_tokens
+         - '30000'
+         - --max_num_tokens
+         - '30000'
+         - --max_num_tokens_per_sample
+         - '30000'
+         - --visual_und
+         - 'True'
+         - --save_every
+         - '5000'
+         - --total_steps
+         - '5000'
+         - --text_cond_dropout_prob
+         - '0.0'
+         - --vae_cond_dropout_prob
+         - '0.0'
+         - --vit_cond_dropout_prob
+         - '0.0'
+         - --ema
+         - '0.993'
+         - --checkpoint_dir
+         - /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+         - --wandb_project
+         - bagel
+         - --wandb_name
+         - vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins
+         - --wandb_dir
+         - /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+         - --wandb_offline
+         - 'True'
+         program: /home/clouduser/Code/Github/unified_world_model/train/pretrain_unified_navit.py
+         code_path: train/pretrain_unified_navit.py
+         code_path_local: train/pretrain_unified_navit.py
+         git:
+           remote_url: https://github.com/para-lost/unified_world_model
+           commit: 8d7b26b7e552fc87b592cf3be94d93be7aeca2a9
+         root: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+         host: junyizhang-launch-new-226786220-1-0
+         executable: /opt/conda/bin/python3.11
+         cpu_count: 48
+         cpu_count_logical: 96
+         gpu_type: NVIDIA A100-SXM4-80GB
+         gpu_count: 8
+         disk:
+           /:
+             total: '1052461830144'
+             used: '164465111040'
+         memory:
+           total: '1437332606976'
+         gpu_nvidia:
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-6743ee30-8d44-3700-5053-982b7634dc72
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-efdba308-be1c-fb5c-98af-5161d08502b8
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-7a081892-5d05-b817-0e72-25e6d0f10d7a
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-9be64e16-04d7-3a0f-eac3-d2b19251109b
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-119e1d42-0ee4-815a-6fbf-d61349b517ef
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-54053246-0cf6-b199-d31d-12946eb5de68
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-3e4b216a-3bc7-064f-197e-5d85c243b0d3
+         - name: NVIDIA A100-SXM4-80GB
+           memory_total: '85899345920'
+           cuda_cores: 6912
+           architecture: Ampere
+           uuid: GPU-58fb92a9-aa05-fe07-638f-be3c40ba2436
+         cuda_version: '12.2'
+         writer_id: 452u6eq6qj9cpdr80e5nue3sk0cejbit
+ visual_gen:
+   desc: null
+   value: true
+ visual_und:
+   desc: null
+   value: true
+ results_dir:
+   desc: null
+   value: results
+ checkpoint_dir:
+   desc: null
+   value: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+ wandb_project:
+   desc: null
+   value: bagel
+ wandb_name:
+   desc: null
+   value: vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins
+ wandb_runid:
+   desc: null
+   value: '0'
+ wandb_resume:
+   desc: null
+   value: allow
+ wandb_offline:
+   desc: null
+   value: true
+ wandb_dir:
+   desc: null
+   value: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+ global_seed:
+   desc: null
+   value: 4396
+ auto_resume:
+   desc: null
+   value: false
+ resume_from:
+   desc: null
+   value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+ resume_model_only:
+   desc: null
+   value: true
+ finetune_from_ema:
+   desc: null
+   value: true
+ finetune_from_hf:
+   desc: null
+   value: true
+ log_every:
+   desc: null
+   value: 1
+ save_every:
+   desc: null
+   value: 5000
+ total_steps:
+   desc: null
+   value: 5000
+ warmup_steps:
+   desc: null
+   value: 300
+ lr_scheduler:
+   desc: null
+   value: cosine
+ lr:
+   desc: null
+   value: 2.0e-05
+ min_lr:
+   desc: null
+   value: 1.0e-07
+ beta1:
+   desc: null
+   value: 0.9
+ beta2:
+   desc: null
+   value: 0.95
+ eps:
+   desc: null
+   value: 1.0e-15
+ ema:
+   desc: null
+   value: 0.993
+ max_grad_norm:
+   desc: null
+   value: 1.0
+ timestep_shift:
+   desc: null
+   value: 1.0
+ mse_weight:
+   desc: null
+   value: 1.0
+ ce_weight:
+   desc: null
+   value: 1.0
+ ce_loss_reweighting:
+   desc: null
+   value: false
+ expected_num_tokens:
+   desc: null
+   value: 30000
+ num_replicate:
+   desc: null
+   value: 1
+ num_shard:
+   desc: null
+   value: 8
+ sharding_strategy:
+   desc: null
+   value: HYBRID_SHARD
+ backward_prefetch:
+   desc: null
+   value: BACKWARD_PRE
+ cpu_offload:
+   desc: null
+   value: false
+ freeze_llm:
+   desc: null
+   value: false
+ freeze_vit:
+   desc: null
+   value: false
+ freeze_vae:
+   desc: null
+   value: true
+ freeze_und:
+   desc: null
+   value: false
+ copy_init_moe:
+   desc: null
+   value: true
+ use_flex:
+   desc: null
+   value: false
+ eval_every:
+   desc: null
+   value: 500
+ num_eval_batches:
+   desc: null
+   value: 20
+ use_ema_for_eval:
+   desc: null
+   value: true
+ eval_log_dir:
+   desc: null
+   value: null
+ eval_run_tag:
+   desc: null
+   value: ''
+ viz_every:
+   desc: null
+   value: 500
+ viz_n:
+   desc: null
+   value: 8
+ viz_outdir:
+   desc: null
+   value: results/viz
+ eval_dataset_config_file:
+   desc: null
+   value: ./data/configs/vlm_gym_reference_dot_eval_mseloss_only.yaml
+ viz_dataset_config_file:
+   desc: null
+   value: ./data/configs/vlm_gym_reference_dot_eval_mseloss_only.yaml
+ eval_print_n:
+   desc: null
+   value: 3
+ save_ema_only:
+   desc: null
+   value: true
+ save_optimizer:
+   desc: null
+   value: false
+ model_path:
+   desc: null
+   value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+ llm_path:
+   desc: null
+   value: hf/Qwen2.5-0.5B-Instruct/
+ llm_qk_norm:
+   desc: null
+   value: true
+ tie_word_embeddings:
+   desc: null
+   value: false
+ layer_module:
+   desc: null
+   value: Qwen2MoTDecoderLayer
+ vae_path:
+   desc: null
+   value: flux/vae/ae.safetensors
+ vit_path:
+   desc: null
+   value: hf/siglip-so400m-14-980-flash-attn2-navit/
+ max_latent_size:
+   desc: null
+   value: 64
+ latent_patch_size:
+   desc: null
+   value: 2
+ vit_patch_size:
+   desc: null
+   value: 14
+ vit_max_num_patch_per_side:
+   desc: null
+   value: 70
+ connector_act:
+   desc: null
+   value: gelu_pytorch_tanh
+ interpolate_pos:
+   desc: null
+   value: false
+ vit_select_layer:
+   desc: null
+   value: -2
+ vit_rope:
+   desc: null
+   value: false
+ text_cond_dropout_prob:
+   desc: null
+   value: 0.0
+ vae_cond_dropout_prob:
+   desc: null
+   value: 0.0
+ vit_cond_dropout_prob:
+   desc: null
+   value: 0.0
+ dataset_config_file:
+   desc: null
+   value: ./data/configs/vlm_gym_reference_dot_train_mseloss_only.yaml
+ train_data_dir:
+   desc: null
+   value: /home/clouduser/Code/data/gym/reference_dot_v5/train/
+ train_jsonl_path:
+   desc: null
+   value: /home/clouduser/Code/data/gym/reference_dot_v5/train/
+ eval_data_dir:
+   desc: null
+   value: /home/clouduser/Code/data/gym/reference_dot_v5/val/
+ eval_jsonl_path:
+   desc: null
+   value: /home/clouduser/Code/data/gym/reference_dot_v5/val/
+ inference_hash_file:
+   desc: null
+   value: /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+ task_name:
+   desc: null
+   value: reference_dot_v5
+ instructions_dir:
+   desc: null
+   value: ./data/instructions
+ prefetch_factor:
+   desc: null
+   value: 2
+ num_workers:
+   desc: null
+   value: 1
+ max_num_tokens_per_sample:
+   desc: null
+   value: 30000
+ max_num_tokens:
+   desc: null
+   value: 30000
+ prefer_buffer_before:
+   desc: null
+   value: 16384
+ max_buffer_size:
+   desc: null
+   value: 50
+ data_seed:
+   desc: null
+   value: 42
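The optimizer entries above set lr: 2.0e-05, min_lr: 1.0e-07, warmup_steps: 300, lr_scheduler: cosine and total_steps: 5000. A minimal sketch of what such a schedule commonly looks like, assuming linear warmup from zero followed by cosine decay to min_lr at total_steps; the scheduler actually used by train/pretrain_unified_navit.py is not part of this diff and may differ in details such as the warmup starting value.

```python
import math

# Values copied from the config above.
LR = 2.0e-05
MIN_LR = 1.0e-07
WARMUP_STEPS = 300
TOTAL_STEPS = 5000


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup to LR, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    return MIN_LR + 0.5 * (LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


for s in (0, 150, 300, 2650, 5000):
    print(s, f"{lr_at(s):.3e}")
```

Under these assumptions the rate peaks at 2e-5 at step 300, sits at the midpoint between lr and min_lr at step 2650, and reaches min_lr exactly at step 5000.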
checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/wandb/offline-run-20260129_223634-vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins-run0/files/output.log CHANGED
@@ -928,126 +928,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
  [2026-01-30 00:05:37] (step=0000917) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
  [2026-01-30 00:05:41] (step=0000918) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
  [2026-01-30 00:05:46] (step=0000919) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:05:52] (step=0000920) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:05:59] (step=0000921) Train Loss mse: 0.0069, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:06:05] (step=0000922) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:06:10] (step=0000923) Train Loss mse: 0.0113, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
- [2026-01-30 00:06:15] (step=0000924) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:06:21] (step=0000925) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:06:27] (step=0000926) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:06:32] (step=0000927) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
- [2026-01-30 00:06:37] (step=0000928) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:06:42] (step=0000929) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:06:46] (step=0000930) Train Loss mse: 0.0122, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:06:51] (step=0000931) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:06:57] (step=0000932) Train Loss mse: 0.0069, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:07:02] (step=0000933) Train Loss mse: 0.0061, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:07:07] (step=0000934) Train Loss mse: 0.0071, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:07:13] (step=0000935) Train Loss mse: 0.0064, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:07:18] (step=0000936) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
- [2026-01-30 00:07:24] (step=0000937) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:07:31] (step=0000938) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:07:35] (step=0000939) Train Loss mse: 0.0066, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:07:40] (step=0000940) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:07:45] (step=0000941) Train Loss mse: 0.0077, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:07:51] (step=0000942) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:07:56] (step=0000943) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:08:02] (step=0000944) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
- [2026-01-30 00:08:08] (step=0000945) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:08:13] (step=0000946) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:08:19] (step=0000947) Train Loss mse: 0.0058, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:08:25] (step=0000948) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
- [2026-01-30 00:08:30] (step=0000949) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:08:35] (step=0000950) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
- [2026-01-30 00:08:41] (step=0000951) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:08:46] (step=0000952) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:08:52] (step=0000953) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:08:58] (step=0000954) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
- [2026-01-30 00:09:04] (step=0000955) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
- [2026-01-30 00:09:09] (step=0000956) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:09:13] (step=0000957) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:09:20] (step=0000958) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:09:26] (step=0000959) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:09:31] (step=0000960) Train Loss mse: 0.0068, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
- [2026-01-30 00:09:36] (step=0000961) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:09:42] (step=0000962) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:09:48] (step=0000963) Train Loss mse: 0.0068, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:09:53] (step=0000964) Train Loss mse: 0.0070, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:09:58] (step=0000965) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
- [2026-01-30 00:10:04] (step=0000966) Train Loss mse: 0.0059, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:10:09] (step=0000967) Train Loss mse: 0.0079, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
- [2026-01-30 00:10:14] (step=0000968) Train Loss mse: 0.0074, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:10:19] (step=0000969) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:10:25] (step=0000970) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:10:30] (step=0000971) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:10:35] (step=0000972) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
- [2026-01-30 00:10:41] (step=0000973) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:10:48] (step=0000974) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
- [2026-01-30 00:10:54] (step=0000975) Train Loss mse: 0.0084, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:11:00] (step=0000976) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:11:05] (step=0000977) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:11:09] (step=0000978) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:11:14] (step=0000979) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:11:20] (step=0000980) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:11:26] (step=0000981) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
- [2026-01-30 00:11:32] (step=0000982) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
- [2026-01-30 00:11:38] (step=0000983) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:11:43] (step=0000984) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:11:47] (step=0000985) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:11:52] (step=0000986) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:11:58] (step=0000987) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:12:04] (step=0000988) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
- [2026-01-30 00:12:09] (step=0000989) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:12:14] (step=0000990) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
- [2026-01-30 00:12:18] (step=0000991) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:12:25] (step=0000992) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:12:31] (step=0000993) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:12:36] (step=0000994) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:12:41] (step=0000995) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
- [2026-01-30 00:12:46] (step=0000996) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
- [2026-01-30 00:12:52] (step=0000997) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:12:57] (step=0000998) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:13:02] (step=0000999) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
- [2026-01-30 00:13:33] (step=0001000) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.03,
- [2026-01-30 00:13:37] (step=0001001) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:13:43] (step=0001002) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:13:50] (step=0001003) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:13:56] (step=0001004) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:14:02] (step=0001005) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:14:08] (step=0001006) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:14:13] (step=0001007) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
- [2026-01-30 00:14:20] (step=0001008) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:14:24] (step=0001009) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:14:29] (step=0001010) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:14:33] (step=0001011) Train Loss mse: 0.0060, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:14:37] (step=0001012) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.25,
- [2026-01-30 00:14:44] (step=0001013) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:14:50] (step=0001014) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
- [2026-01-30 00:14:55] (step=0001015) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:15:01] (step=0001016) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:15:07] (step=0001017) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:15:12] (step=0001018) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:15:18] (step=0001019) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:15:23] (step=0001020) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:15:29] (step=0001021) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
- [2026-01-30 00:15:35] (step=0001022) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
- [2026-01-30 00:15:42] (step=0001023) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
- [2026-01-30 00:15:48] (step=0001024) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:15:53] (step=0001025) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:15:58] (step=0001026) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
- [2026-01-30 00:16:03] (step=0001027) Train Loss mse: 0.0079, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:16:09] (step=0001028) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
- [2026-01-30 00:16:16] (step=0001029) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:16:21] (step=0001030) Train Loss mse: 0.0066, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
- [2026-01-30 00:16:26] (step=0001031) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:16:32] (step=0001032) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
- [2026-01-30 00:16:36] (step=0001033) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
- [2026-01-30 00:16:43] (step=0001034) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:16:48] (step=0001035) Train Loss mse: 0.0061, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
- [2026-01-30 00:16:52] (step=0001036) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.23,
- [2026-01-30 00:16:57] (step=0001037) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- [2026-01-30 00:17:03] (step=0001038) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
- [2026-01-30 00:17:09] (step=0001039) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
  FullyShardedDataParallel(
    (_fsdp_wrapped_module): Bagel(
      (language_model): Qwen2ForCausalLM(
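Every training line in this log has the same shape: a timestamp, a zero-padded step, the MSE and CE losses, and Train Steps/Sec, with the CE column at 0.0000 on every line shown (consistent with the mse-only dataset config). A minimal sketch of a parser for these lines; the regex is inferred from the lines above, not taken from the training script, and `parse_train_lines` is a hypothetical helper name.

```python
import re
from statistics import mean

# Pattern inferred from the log lines in this diff, not from the training code.
TRAIN_RE = re.compile(
    r"\[(?P<ts>[\d\- :]+)\] \(step=(?P<step>\d+)\) "
    r"Train Loss mse: (?P<mse>[\d.]+), Train Loss ce: (?P<ce>[\d.]+), "
    r"Train Steps/Sec: (?P<sps>[\d.]+),"
)


def parse_train_lines(lines):
    """Yield (step, mse, ce, steps_per_sec) for every matching log line."""
    for line in lines:
        m = TRAIN_RE.search(line)
        if m:
            yield int(m["step"]), float(m["mse"]), float(m["ce"]), float(m["sps"])


# Four of the removed lines above, used as sample input.
sample = [
    "[2026-01-30 00:05:52] (step=0000920) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,",
    "[2026-01-30 00:05:59] (step=0000921) Train Loss mse: 0.0069, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,",
    "[2026-01-30 00:06:05] (step=0000922) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,",
    "[2026-01-30 00:06:10] (step=0000923) Train Loss mse: 0.0113, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,",
]
records = list(parse_train_lines(sample))
print(len(records), round(mean(r[1] for r in records), 4))
```

Because single-step MSE values in this log range from about 0.0035 to 0.0122, a windowed mean like this is easier to compare across checkpoints than any one line.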
@@ -1234,13 +1114,6 @@ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
  ce_avg: 0.0, mse_avg: 0.0055910381488502026
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step1000
- Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- ce_avg: 0.0, mse_avg: 0.005657645873725414
  base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step1500
  Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
  [eval debug] first 3 batch fingerprints:
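The config sets ema: 0.993 together with finetune_from_ema: true, save_ema_only: true and use_ema_for_eval: true, which suggests the mse_avg values in these eval blocks are computed with exponential-moving-average weights rather than the live weights. A minimal sketch of the textbook EMA update at that decay, using plain floats as hypothetical stand-ins for parameter tensors; when and how the training script applies the update is not shown in this diff.

```python
DECAY = 0.993  # "ema" value from the config


def ema_update(ema_params, model_params, decay=DECAY):
    """Textbook EMA step: ema <- decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]


# Toy example: the EMA weight starts at 0.0 while the live weight stays at 1.0.
ema = [0.0]
for _ in range(100):
    ema = ema_update(ema, [1.0])
print(round(ema[0], 4))
```

With decay 0.993 the average has an effective horizon of roughly 1 / (1 - 0.993) ≈ 143 steps, so after 100 steps the toy weight has moved only about halfway (1 - 0.993^100 ≈ 0.505) toward the live value.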
@@ -1248,13 +1121,126 @@ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
  ce_avg: 0.0, mse_avg: 0.0055648270063102245
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step2000
- Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- ce_avg: 0.0, mse_avg: 0.0056802802719175816
  [2026-01-30 00:17:14] (step=0001040) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
  [2026-01-30 00:17:19] (step=0001041) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
  [2026-01-30 00:17:24] (step=0001042) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
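Each eval block prints a step_tag ending in `_stepN` and, a few lines later, a `ce_avg: ..., mse_avg: ...` result. A minimal sketch that pairs every result with the most recent step tag to build an eval curve; the regexes are inferred from these lines, `eval_curve` is a hypothetical helper, and the sample keeps only the relevant lines of the step 1000, 1500 and 2000 blocks in their old-file order.

```python
import re

# Patterns inferred from the eval blocks in this diff.
STEP_RE = re.compile(r"step_tag is \S+_step(\d+)")
MSE_RE = re.compile(r"ce_avg: ([\d.eE+-]+), mse_avg: ([\d.eE+-]+)")


def eval_curve(lines):
    """Map eval step -> mse_avg, pairing each result with the latest step_tag."""
    curve, step = {}, None
    for line in lines:
        if (m := STEP_RE.search(line)):
            step = int(m.group(1))
        elif (m := MSE_RE.search(line)) and step is not None:
            curve[step] = float(m.group(2))
    return curve


log = [
    # Result whose step_tag lies outside this excerpt: skipped.
    "ce_avg: 0.0, mse_avg: 0.0055910381488502026",
    "step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step1000",
    "ce_avg: 0.0, mse_avg: 0.005657645873725414",
    "step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step1500",
    "ce_avg: 0.0, mse_avg: 0.0055648270063102245",
    "step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step2000",
    "ce_avg: 0.0, mse_avg: 0.0056802802719175816",
]
curve = eval_curve(log)
print(curve, min(curve, key=curve.get))
```

On these three blocks the eval MSE stays within about 0.0056 to 0.0057, with the lowest value at step 1500.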
@@ -2518,6 +2504,27 @@ ce_avg: 0.0, mse_avg: 0.0056802802719175816
  [2026-01-30 02:13:09] (step=0002300) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
  [2026-01-30 02:13:14] (step=0002301) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
  [2026-01-30 02:13:20] (step=0002302) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
  [2026-01-30 02:13:25] (step=0002303) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
  [2026-01-30 02:13:30] (step=0002304) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
  [2026-01-30 02:13:36] (step=0002305) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
@@ -2621,20 +2628,6 @@ ce_avg: 0.0, mse_avg: 0.0056802802719175816
  [2026-01-30 02:22:40] (step=0002403) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
  [2026-01-30 02:22:45] (step=0002404) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
  [2026-01-30 02:22:50] (step=0002405) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step2500
- Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- ce_avg: 0.0, mse_avg: 0.005605650134384632
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step3000
- Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
- ce_avg: 0.0, mse_avg: 0.005620267707854509
  [2026-01-30 02:22:56] (step=0002406) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
  [2026-01-30 02:23:01] (step=0002407) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
  [2026-01-30 02:23:06] (step=0002408) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
@@ -3549,6 +3542,27 @@ ce_avg: 0.0, mse_avg: 0.005620267707854509
3549
  [2026-01-30 03:46:48] (step=0003317) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3550
  [2026-01-30 03:46:53] (step=0003318) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
3551
  [2026-01-30 03:46:59] (step=0003319) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3552
  [2026-01-30 03:47:05] (step=0003320) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3553
  [2026-01-30 03:47:11] (step=0003321) Train Loss mse: 0.0060, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3554
  [2026-01-30 03:47:16] (step=0003322) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
@@ -3595,27 +3609,53 @@ ce_avg: 0.0, mse_avg: 0.005620267707854509
3595
  [2026-01-30 03:51:02] (step=0003363) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
3596
  [2026-01-30 03:51:07] (step=0003364) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
3597
  [2026-01-30 03:51:13] (step=0003365) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
3598
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step3500
3599
- Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
3600
- [eval debug] first 3 batch fingerprints:
3601
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3602
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3603
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3604
- ce_avg: 0.0, mse_avg: 0.005567264277487993
3605
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step4000
3606
- Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
3607
- [eval debug] first 3 batch fingerprints:
3608
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3609
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3610
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3611
- ce_avg: 0.0, mse_avg: 0.005860968492925167
3612
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step4500
3613
- Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
3614
- [eval debug] first 3 batch fingerprints:
3615
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3616
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3617
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3618
- ce_avg: 0.0, mse_avg: 0.0062998272478580475
3619
  [2026-01-30 03:55:32] (step=0003413) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
3620
  [2026-01-30 03:55:37] (step=0003414) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
3621
  [2026-01-30 03:55:42] (step=0003415) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
@@ -4986,13 +5026,6 @@ ce_avg: 0.0, mse_avg: 0.0062998272478580475
4986
  [2026-01-30 06:00:41] (step=0004780) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
4987
  [2026-01-30 06:00:47] (step=0004781) Train Loss mse: 0.0059, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
4988
  [2026-01-30 06:00:53] (step=0004782) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
4989
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step5000
4990
- Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
4991
- [eval debug] first 3 batch fingerprints:
4992
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
4993
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
4994
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
4995
- ce_avg: 0.0, mse_avg: 0.005609693005681038
4996
  [2026-01-30 06:01:00] (step=0004783) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
4997
  [2026-01-30 06:01:04] (step=0004784) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
4998
  [2026-01-30 06:01:09] (step=0004785) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
@@ -5019,6 +5052,13 @@ ce_avg: 0.0, mse_avg: 0.005609693005681038
5019
  [2026-01-30 06:03:00] (step=0004806) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
5020
  [2026-01-30 06:03:05] (step=0004807) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
5021
  [2026-01-30 06:03:10] (step=0004808) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
5022
  [2026-01-30 06:03:15] (step=0004809) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
5023
  [2026-01-30 06:03:21] (step=0004810) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
5024
  [2026-01-30 06:03:27] (step=0004811) Train Loss mse: 0.0036, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
 
928
  [2026-01-30 00:05:37] (step=0000917) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
929
  [2026-01-30 00:05:41] (step=0000918) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
930
  [2026-01-30 00:05:46] (step=0000919) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
931
  FullyShardedDataParallel(
932
  (_fsdp_wrapped_module): Bagel(
933
  (language_model): Qwen2ForCausalLM(
 
1114
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
1115
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
1116
  ce_avg: 0.0, mse_avg: 0.0055910381488502026
1117
  base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step1500
1118
  Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
1119
  [eval debug] first 3 batch fingerprints:
 
1121
  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
1122
  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
1123
  ce_avg: 0.0, mse_avg: 0.0055648270063102245
1124
+ [2026-01-30 00:05:52] (step=0000920) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1125
+ [2026-01-30 00:05:59] (step=0000921) Train Loss mse: 0.0069, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1126
+ [2026-01-30 00:06:05] (step=0000922) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1127
+ [2026-01-30 00:06:10] (step=0000923) Train Loss mse: 0.0113, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
1128
+ [2026-01-30 00:06:15] (step=0000924) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1129
+ [2026-01-30 00:06:21] (step=0000925) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1130
+ [2026-01-30 00:06:27] (step=0000926) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1131
+ [2026-01-30 00:06:32] (step=0000927) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
1132
+ [2026-01-30 00:06:37] (step=0000928) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1133
+ [2026-01-30 00:06:42] (step=0000929) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1134
+ [2026-01-30 00:06:46] (step=0000930) Train Loss mse: 0.0122, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1135
+ [2026-01-30 00:06:51] (step=0000931) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1136
+ [2026-01-30 00:06:57] (step=0000932) Train Loss mse: 0.0069, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1137
+ [2026-01-30 00:07:02] (step=0000933) Train Loss mse: 0.0061, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1138
+ [2026-01-30 00:07:07] (step=0000934) Train Loss mse: 0.0071, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1139
+ [2026-01-30 00:07:13] (step=0000935) Train Loss mse: 0.0064, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1140
+ [2026-01-30 00:07:18] (step=0000936) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
1141
+ [2026-01-30 00:07:24] (step=0000937) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1142
+ [2026-01-30 00:07:31] (step=0000938) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1143
+ [2026-01-30 00:07:35] (step=0000939) Train Loss mse: 0.0066, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1144
+ [2026-01-30 00:07:40] (step=0000940) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1145
+ [2026-01-30 00:07:45] (step=0000941) Train Loss mse: 0.0077, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1146
+ [2026-01-30 00:07:51] (step=0000942) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1147
+ [2026-01-30 00:07:56] (step=0000943) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1148
+ [2026-01-30 00:08:02] (step=0000944) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
1149
+ [2026-01-30 00:08:08] (step=0000945) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1150
+ [2026-01-30 00:08:13] (step=0000946) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1151
+ [2026-01-30 00:08:19] (step=0000947) Train Loss mse: 0.0058, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1152
+ [2026-01-30 00:08:25] (step=0000948) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
1153
+ [2026-01-30 00:08:30] (step=0000949) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1154
+ [2026-01-30 00:08:35] (step=0000950) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
1155
+ [2026-01-30 00:08:41] (step=0000951) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1156
+ [2026-01-30 00:08:46] (step=0000952) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1157
+ [2026-01-30 00:08:52] (step=0000953) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1158
+ [2026-01-30 00:08:58] (step=0000954) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
1159
+ [2026-01-30 00:09:04] (step=0000955) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
1160
+ [2026-01-30 00:09:09] (step=0000956) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1161
+ [2026-01-30 00:09:13] (step=0000957) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1162
+ [2026-01-30 00:09:20] (step=0000958) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1163
+ [2026-01-30 00:09:26] (step=0000959) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1164
+ [2026-01-30 00:09:31] (step=0000960) Train Loss mse: 0.0068, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
1165
+ [2026-01-30 00:09:36] (step=0000961) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1166
+ [2026-01-30 00:09:42] (step=0000962) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1167
+ [2026-01-30 00:09:48] (step=0000963) Train Loss mse: 0.0068, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1168
+ [2026-01-30 00:09:53] (step=0000964) Train Loss mse: 0.0070, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1169
+ [2026-01-30 00:09:58] (step=0000965) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
1170
+ [2026-01-30 00:10:04] (step=0000966) Train Loss mse: 0.0059, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1171
+ [2026-01-30 00:10:09] (step=0000967) Train Loss mse: 0.0079, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
1172
+ [2026-01-30 00:10:14] (step=0000968) Train Loss mse: 0.0074, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1173
+ [2026-01-30 00:10:19] (step=0000969) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1174
+ [2026-01-30 00:10:25] (step=0000970) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1175
+ [2026-01-30 00:10:30] (step=0000971) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1176
+ [2026-01-30 00:10:35] (step=0000972) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
1177
+ [2026-01-30 00:10:41] (step=0000973) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1178
+ [2026-01-30 00:10:48] (step=0000974) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
1179
+ [2026-01-30 00:10:54] (step=0000975) Train Loss mse: 0.0084, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1180
+ [2026-01-30 00:11:00] (step=0000976) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1181
+ [2026-01-30 00:11:05] (step=0000977) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1182
+ [2026-01-30 00:11:09] (step=0000978) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1183
+ [2026-01-30 00:11:14] (step=0000979) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1184
+ [2026-01-30 00:11:20] (step=0000980) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1185
+ [2026-01-30 00:11:26] (step=0000981) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
1186
+ [2026-01-30 00:11:32] (step=0000982) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
1187
+ [2026-01-30 00:11:38] (step=0000983) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1188
+ [2026-01-30 00:11:43] (step=0000984) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1189
+ [2026-01-30 00:11:47] (step=0000985) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1190
+ [2026-01-30 00:11:52] (step=0000986) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1191
+ [2026-01-30 00:11:58] (step=0000987) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1192
+ [2026-01-30 00:12:04] (step=0000988) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
1193
+ [2026-01-30 00:12:09] (step=0000989) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1194
+ [2026-01-30 00:12:14] (step=0000990) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
1195
+ [2026-01-30 00:12:18] (step=0000991) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1196
+ [2026-01-30 00:12:25] (step=0000992) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1197
+ [2026-01-30 00:12:31] (step=0000993) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1198
+ [2026-01-30 00:12:36] (step=0000994) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1199
+ [2026-01-30 00:12:41] (step=0000995) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
1200
+ [2026-01-30 00:12:46] (step=0000996) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
1201
+ [2026-01-30 00:12:52] (step=0000997) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1202
+ [2026-01-30 00:12:57] (step=0000998) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1203
+ [2026-01-30 00:13:02] (step=0000999) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
1204
+ [2026-01-30 00:13:33] (step=0001000) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.03,
1205
+ [2026-01-30 00:13:37] (step=0001001) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1206
+ [2026-01-30 00:13:43] (step=0001002) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1207
+ [2026-01-30 00:13:50] (step=0001003) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1208
+ [2026-01-30 00:13:56] (step=0001004) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1209
+ [2026-01-30 00:14:02] (step=0001005) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1210
+ [2026-01-30 00:14:08] (step=0001006) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1211
+ [2026-01-30 00:14:13] (step=0001007) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
1212
+ [2026-01-30 00:14:20] (step=0001008) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1213
+ [2026-01-30 00:14:24] (step=0001009) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1214
+ [2026-01-30 00:14:29] (step=0001010) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1215
+ [2026-01-30 00:14:33] (step=0001011) Train Loss mse: 0.0060, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1216
+ [2026-01-30 00:14:37] (step=0001012) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.25,
1217
+ [2026-01-30 00:14:44] (step=0001013) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1218
+ [2026-01-30 00:14:50] (step=0001014) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
1219
+ [2026-01-30 00:14:55] (step=0001015) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1220
+ [2026-01-30 00:15:01] (step=0001016) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1221
+ [2026-01-30 00:15:07] (step=0001017) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1222
+ [2026-01-30 00:15:12] (step=0001018) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1223
+ [2026-01-30 00:15:18] (step=0001019) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1224
+ [2026-01-30 00:15:23] (step=0001020) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1225
+ [2026-01-30 00:15:29] (step=0001021) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
1226
+ [2026-01-30 00:15:35] (step=0001022) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
1227
+ [2026-01-30 00:15:42] (step=0001023) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
1228
+ [2026-01-30 00:15:48] (step=0001024) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1229
+ [2026-01-30 00:15:53] (step=0001025) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1230
+ [2026-01-30 00:15:58] (step=0001026) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
1231
+ [2026-01-30 00:16:03] (step=0001027) Train Loss mse: 0.0079, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1232
+ [2026-01-30 00:16:09] (step=0001028) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
1233
+ [2026-01-30 00:16:16] (step=0001029) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1234
+ [2026-01-30 00:16:21] (step=0001030) Train Loss mse: 0.0066, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
1235
+ [2026-01-30 00:16:26] (step=0001031) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1236
+ [2026-01-30 00:16:32] (step=0001032) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
1237
+ [2026-01-30 00:16:36] (step=0001033) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1238
+ [2026-01-30 00:16:43] (step=0001034) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1239
+ [2026-01-30 00:16:48] (step=0001035) Train Loss mse: 0.0061, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
1240
+ [2026-01-30 00:16:52] (step=0001036) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.23,
1241
+ [2026-01-30 00:16:57] (step=0001037) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
1242
+ [2026-01-30 00:17:03] (step=0001038) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1243
+ [2026-01-30 00:17:09] (step=0001039) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
1244
  [2026-01-30 00:17:14] (step=0001040) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1245
  [2026-01-30 00:17:19] (step=0001041) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
1246
  [2026-01-30 00:17:24] (step=0001042) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
 
2504
  [2026-01-30 02:13:09] (step=0002300) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
2505
  [2026-01-30 02:13:14] (step=0002301) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
2506
  [2026-01-30 02:13:20] (step=0002302) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
2507
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step2000
2508
+ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
2509
+ [eval debug] first 3 batch fingerprints:
2510
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
2511
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
2512
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
2513
+ ce_avg: 0.0, mse_avg: 0.0056802802719175816
2514
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step2500
2515
+ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
2516
+ [eval debug] first 3 batch fingerprints:
2517
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
2518
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
2519
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
2520
+ ce_avg: 0.0, mse_avg: 0.005605650134384632
2521
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step3000
2522
+ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
2523
+ [eval debug] first 3 batch fingerprints:
2524
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
2525
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
2526
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
2527
+ ce_avg: 0.0, mse_avg: 0.005620267707854509
2528
  [2026-01-30 02:13:25] (step=0002303) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
2529
  [2026-01-30 02:13:30] (step=0002304) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
2530
  [2026-01-30 02:13:36] (step=0002305) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
 
2628
  [2026-01-30 02:22:40] (step=0002403) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
2629
  [2026-01-30 02:22:45] (step=0002404) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
2630
  [2026-01-30 02:22:50] (step=0002405) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
2631
  [2026-01-30 02:22:56] (step=0002406) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
2632
  [2026-01-30 02:23:01] (step=0002407) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
2633
  [2026-01-30 02:23:06] (step=0002408) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
 
3542
  [2026-01-30 03:46:48] (step=0003317) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3543
  [2026-01-30 03:46:53] (step=0003318) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
3544
  [2026-01-30 03:46:59] (step=0003319) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3545
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step3500
3546
+ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
3547
+ [eval debug] first 3 batch fingerprints:
3548
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3549
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3550
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3551
+ ce_avg: 0.0, mse_avg: 0.005567264277487993
3552
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step4000
3553
+ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
3554
+ [eval debug] first 3 batch fingerprints:
3555
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3556
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3557
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3558
+ ce_avg: 0.0, mse_avg: 0.005860968492925167
3559
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step4500
3560
+ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
3561
+ [eval debug] first 3 batch fingerprints:
3562
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3563
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3564
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
3565
+ ce_avg: 0.0, mse_avg: 0.0062998272478580475
3566
  [2026-01-30 03:47:05] (step=0003320) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3567
  [2026-01-30 03:47:11] (step=0003321) Train Loss mse: 0.0060, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3568
  [2026-01-30 03:47:16] (step=0003322) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
 
3609
  [2026-01-30 03:51:02] (step=0003363) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
3610
  [2026-01-30 03:51:07] (step=0003364) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
3611
  [2026-01-30 03:51:13] (step=0003365) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
3612
+ [2026-01-30 03:51:17] (step=0003366) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
3613
+ [2026-01-30 03:51:22] (step=0003367) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
3614
+ [2026-01-30 03:51:28] (step=0003368) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3615
+ [2026-01-30 03:51:35] (step=0003369) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
3616
+ [2026-01-30 03:51:40] (step=0003370) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
3617
+ [2026-01-30 03:51:45] (step=0003371) Train Loss mse: 0.0038, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
3618
+ [2026-01-30 03:51:50] (step=0003372) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
3619
+ [2026-01-30 03:51:55] (step=0003373) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
3620
+ [2026-01-30 03:52:00] (step=0003374) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
3621
+ [2026-01-30 03:52:05] (step=0003375) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
3622
+ [2026-01-30 03:52:10] (step=0003376) Train Loss mse: 0.0033, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
3623
+ [2026-01-30 03:52:15] (step=0003377) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
3624
+ [2026-01-30 03:52:20] (step=0003378) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
3625
+ [2026-01-30 03:52:26] (step=0003379) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
3626
+ [2026-01-30 03:52:32] (step=0003380) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3627
+ [2026-01-30 03:52:38] (step=0003381) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
3628
+ [2026-01-30 03:52:43] (step=0003382) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
3629
+ [2026-01-30 03:52:48] (step=0003383) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
3630
+ [2026-01-30 03:52:54] (step=0003384) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3631
+ [2026-01-30 03:52:59] (step=0003385) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
3632
+ [2026-01-30 03:53:04] (step=0003386) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
3633
+ [2026-01-30 03:53:11] (step=0003387) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
3634
+ [2026-01-30 03:53:17] (step=0003388) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
+ [2026-01-30 03:53:23] (step=0003389) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
+ [2026-01-30 03:53:29] (step=0003390) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
+ [2026-01-30 03:53:34] (step=0003391) Train Loss mse: 0.0038, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
+ [2026-01-30 03:53:40] (step=0003392) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
+ [2026-01-30 03:53:45] (step=0003393) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
+ [2026-01-30 03:53:50] (step=0003394) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
+ [2026-01-30 03:53:55] (step=0003395) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
+ [2026-01-30 03:54:00] (step=0003396) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
+ [2026-01-30 03:54:05] (step=0003397) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
+ [2026-01-30 03:54:12] (step=0003398) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
+ [2026-01-30 03:54:18] (step=0003399) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
+ [2026-01-30 03:54:23] (step=0003400) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
+ [2026-01-30 03:54:28] (step=0003401) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
+ [2026-01-30 03:54:32] (step=0003402) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
+ [2026-01-30 03:54:37] (step=0003403) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
+ [2026-01-30 03:54:43] (step=0003404) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
+ [2026-01-30 03:54:48] (step=0003405) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
+ [2026-01-30 03:54:54] (step=0003406) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
+ [2026-01-30 03:54:59] (step=0003407) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
+ [2026-01-30 03:55:04] (step=0003408) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
+ [2026-01-30 03:55:10] (step=0003409) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
+ [2026-01-30 03:55:16] (step=0003410) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
+ [2026-01-30 03:55:23] (step=0003411) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
+ [2026-01-30 03:55:28] (step=0003412) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
  [2026-01-30 03:55:32] (step=0003413) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
  [2026-01-30 03:55:37] (step=0003414) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
  [2026-01-30 03:55:42] (step=0003415) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
 
  [2026-01-30 06:00:41] (step=0004780) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
  [2026-01-30 06:00:47] (step=0004781) Train Loss mse: 0.0059, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
  [2026-01-30 06:00:53] (step=0004782) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
  [2026-01-30 06:01:00] (step=0004783) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
  [2026-01-30 06:01:04] (step=0004784) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
  [2026-01-30 06:01:09] (step=0004785) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
 
  [2026-01-30 06:03:00] (step=0004806) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
  [2026-01-30 06:03:05] (step=0004807) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
  [2026-01-30 06:03:10] (step=0004808) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step5000
+ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
+ ce_avg: 0.0, mse_avg: 0.005609693005681038
  [2026-01-30 06:03:15] (step=0004809) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
  [2026-01-30 06:03:21] (step=0004810) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
  [2026-01-30 06:03:27] (step=0004811) Train Loss mse: 0.0036, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
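A minimal Python sketch for summarizing the per-step training lines in this log, assuming every step line follows the `[timestamp] (step=N) Train Loss mse: <value>, Train Loss ce: <value>, Train Steps/Sec: <value>,` pattern shown here. `LOG_RE`, `parse_train_lines`, and `summarize` are hypothetical helpers, not part of the training code; lines that do not match (the eval debug output, blank lines) are skipped, and leading diff markers are tolerated because the pattern is searched rather than anchored.

```python
import re
from statistics import mean

# Hypothetical pattern inferred from the log lines above, e.g.
# [2026-01-30 06:03:00] (step=0004806) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
LOG_RE = re.compile(
    r"\[(?P<ts>[\d\- :]+)\]\s+\(step=(?P<step>\d+)\)\s+"
    r"Train Loss mse: (?P<mse>[\d.]+), Train Loss ce: (?P<ce>[\d.]+), "
    r"Train Steps/Sec: (?P<sps>[\d.]+)"
)


def parse_train_lines(lines):
    """Return one dict per matching training-step line; other lines are ignored."""
    records = []
    for line in lines:
        m = LOG_RE.search(line)  # search, so a leading '+ ' or spaces do not matter
        if m:
            records.append({
                "timestamp": m["ts"],
                "step": int(m["step"]),  # '0004806' -> 4806
                "mse": float(m["mse"]),
                "ce": float(m["ce"]),
                "steps_per_sec": float(m["sps"]),
            })
    return records


def summarize(records):
    """Average the MSE loss and throughput over the parsed records."""
    if not records:
        return {"n": 0, "mean_mse": None, "mean_steps_per_sec": None}
    return {
        "n": len(records),
        "mean_mse": mean(r["mse"] for r in records),
        "mean_steps_per_sec": mean(r["steps_per_sec"] for r in records),
    }
```

For example, feeding it the lines for steps 4806 to 4808 gives a mean train MSE of about 0.0045, which can be set beside the eval `mse_avg`; the two are computed over different data, so the comparison is only indicative.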