Junyi42 committed
Commit c1a7829 · verified · 1 parent: 4f6e178

Upload checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins

checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/0005000/ema.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:504ca814489e807517cb32fa3c8b8022540650eb89f469efddd34f7029f09ed0
+ size 58429204680
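The file added above is only a Git LFS pointer (spec v1): three `key value` lines recording the object hash and size, while the ~58 GB payload lives in LFS storage. A minimal sketch of reading such a pointer — `parse_lfs_pointer` is a hypothetical helper, not part of this repo:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:504ca814489e807517cb32fa3c8b8022540650eb89f469efddd34f7029f09ed0\n"
    "size 58429204680\n"
)

info = parse_lfs_pointer(pointer)
size_gb = int(info["size"]) / 1e9  # roughly 58.4 GB for this ema checkpoint
```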
checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/wandb/offline-run-20260129_223432-checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins-run0/files/config.yaml CHANGED
@@ -0,0 +1,185 @@
+ wandb_version: 1
+
+ _wandb:
+ desc: null
+ value:
+ python_version: 3.11.10
+ cli_version: 0.23.1
+ framework: huggingface
+ huggingface_version: 4.49.0
+ is_jupyter_run: false
+ is_kaggle_kernel: false
+ start_time: 1769726072
+ t:
+ 1:
+ - 1
+ - 5
+ - 11
+ - 41
+ - 49
+ - 53
+ - 71
+ - 105
+ 2:
+ - 1
+ - 5
+ - 11
+ - 41
+ - 49
+ - 53
+ - 71
+ - 105
+ 3:
+ - 4
+ - 13
+ - 14
+ - 37
+ - 42
+ 4: 3.11.10
+ 5: 0.23.1
+ 6: 4.49.0
+ 13: linux-x86_64
+ e:
+ 6tk8s0gf5rmih37ltdvrvulzn60qwyq6:
+ os: Linux-6.6.93+-x86_64-with-glibc2.35
+ python: CPython 3.11.10
+ started_at: '2026-01-29T22:34:32.380519Z'
+ args:
+ - --dataset_config_file
+ - ./data/configs/vlm_gym_reference_dot_train_celoss.yaml
+ - --eval_dataset_config_file
+ - ./data/configs/vlm_gym_reference_dot_eval_celoss.yaml
+ - --viz_dataset_config_file
+ - ./data/configs/vlm_gym_reference_dot_eval_celoss.yaml
+ - --inference_hash_file
+ - /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+ - --task_name
+ - reference_dot_v5
+ - --instructions_dir
+ - ./data/instructions
+ - --train_data_dir
+ - /home/clouduser/Code/data/gym/reference_dot_v5/train/
+ - --train_jsonl_path
+ - /home/clouduser/Code/data/gym/reference_dot_v5/train/
+ - --eval_data_dir
+ - /home/clouduser/Code/data/gym/reference_dot_v5/val/
+ - --eval_jsonl_path
+ - /home/clouduser/Code/data/gym/reference_dot_v5/val/
+ - --model_path
+ - /home/clouduser/Code/Models/BAGEL-7B-MoT
+ - --layer_module
+ - Qwen2MoTDecoderLayer
+ - --max_latent_size
+ - '64'
+ - --resume-from
+ - /home/clouduser/Code/Models/BAGEL-7B-MoT
+ - --finetune_from_hf
+ - 'True'
+ - --auto_resume
+ - 'False'
+ - --resume-model-only
+ - 'True'
+ - --finetune-from-ema
+ - 'True'
+ - --log_every
+ - '1'
+ - --lr
+ - 2e-5
+ - --warmup_steps
+ - '300'
+ - --lr_scheduler
+ - cosine
+ - --num_worker
+ - '1'
+ - --expected_num_tokens
+ - '30000'
+ - --max_num_tokens
+ - '30000'
+ - --max_num_tokens_per_sample
+ - '30000'
+ - --visual_und
+ - 'True'
+ - --save_every
+ - '2500'
+ - --total_steps
+ - '5000'
+ - --text_cond_dropout_prob
+ - '0.0'
+ - --vae_cond_dropout_prob
+ - '0.3'
+ - --vit_cond_dropout_prob
+ - '0.0'
+ - --ema
+ - '0.993'
+ - --checkpoint_dir
+ - /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ - --wandb_project
+ - bagel
+ - --wandb_name
+ - checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ - --wandb_dir
+ - /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ - --wandb_offline
+ - 'True'
+ program: /home/clouduser/Code/Github/unified_world_model/train/pretrain_unified_navit.py
+ code_path: train/pretrain_unified_navit.py
+ code_path_local: train/pretrain_unified_navit.py
+ git:
+ remote_url: https://github.com/para-lost/unified_world_model
+ commit: 8d7b26b7e552fc87b592cf3be94d93be7aeca2a9
+ root: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ host: junyizhang-launch-new-226786192-1-0
+ executable: /opt/conda/bin/python3.11
+ cpu_count: 48
+ cpu_count_logical: 96
+ gpu_type: NVIDIA A100-SXM4-80GB
+ gpu_count: 8
+ disk:
+ /:
+ total: '1052461830144'
+ used: '164464467968'
+ memory:
+ total: '1437332606976'
+ gpu_nvidia:
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-718e070b-29b1-a9c6-6ba9-0b308e8e14b6
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-860c1016-b60a-e80f-ec59-7d7a02edcb79
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-2d63d08c-e8a0-154b-d0ef-8fda2ae1bf3f
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-6da90038-6dcb-16f8-9d1a-7512461977f7
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-1978a88e-d773-51d3-9c6e-16af2fa02d54
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-dc134b53-25d6-69ec-c7d8-a6539915eed3
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-32301ac6-38ad-f648-7d53-f81113c91cd0
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-d9e60c8f-df0c-485f-3d26-a38485ea847a
+ cuda_version: '12.2'
+ writer_id: 6tk8s0gf5rmih37ltdvrvulzn60qwyq6
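The learning-rate arguments recorded in this config (`--lr 2e-5`, `--warmup_steps 300`, `--lr_scheduler cosine`, `--total_steps 5000`, and `min_lr: 1.0e-07` in the full config) imply a linear-warmup-then-cosine-decay schedule. A minimal sketch of that shape, assuming the common formulation — the repo's actual scheduler implementation may differ in details:

```python
import math

def lr_at(step, base_lr=2e-5, min_lr=1e-7, warmup_steps=300, total_steps=5000):
    """Linear warmup from 0 to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With these run settings the peak of 2e-5 is reached at step 300 and the rate then falls monotonically toward 1e-7 at step 5000.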
checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/wandb/offline-run-20260130_175019-checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins-run0/files/config.yaml CHANGED
@@ -0,0 +1,457 @@
+ wandb_version: 1
+
+ _wandb:
+ desc: null
+ value:
+ python_version: 3.11.10
+ cli_version: 0.23.1
+ framework: huggingface
+ huggingface_version: 4.49.0
+ is_jupyter_run: false
+ is_kaggle_kernel: false
+ start_time: 1769795419
+ t:
+ 1:
+ - 1
+ - 5
+ - 11
+ - 41
+ - 49
+ - 53
+ - 71
+ - 105
+ 2:
+ - 1
+ - 5
+ - 11
+ - 41
+ - 49
+ - 53
+ - 71
+ - 105
+ 3:
+ - 2
+ - 4
+ - 13
+ - 14
+ - 37
+ - 42
+ - 61
+ 4: 3.11.10
+ 5: 0.23.1
+ 6: 4.49.0
+ 13: linux-x86_64
+ e:
+ oxr547070rpilwmqbpq2pd9ofzvd2cmh:
+ os: Linux-6.6.93+-x86_64-with-glibc2.35
+ python: CPython 3.11.10
+ started_at: '2026-01-30T17:50:19.134291Z'
+ args:
+ - --dataset_config_file
+ - ./data/configs/vlm_gym_reference_dot_train_celoss.yaml
+ - --eval_dataset_config_file
+ - ./data/configs/vlm_gym_reference_dot_eval_celoss.yaml
+ - --viz_dataset_config_file
+ - ./data/configs/vlm_gym_reference_dot_eval_celoss.yaml
+ - --inference_hash_file
+ - /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+ - --task_name
+ - reference_dot_v5
+ - --instructions_dir
+ - ./data/instructions
+ - --train_data_dir
+ - /home/clouduser/Code/data/gym/reference_dot_v5/train/
+ - --train_jsonl_path
+ - /home/clouduser/Code/data/gym/reference_dot_v5/train/
+ - --eval_data_dir
+ - /home/clouduser/Code/data/gym/reference_dot_v5/val/
+ - --eval_jsonl_path
+ - /home/clouduser/Code/data/gym/reference_dot_v5/val/
+ - --model_path
+ - /home/clouduser/Code/Models/BAGEL-7B-MoT
+ - --layer_module
+ - Qwen2MoTDecoderLayer
+ - --max_latent_size
+ - '64'
+ - --resume-from
+ - /home/clouduser/Code/Models/BAGEL-7B-MoT
+ - --finetune_from_hf
+ - 'True'
+ - --auto_resume
+ - 'False'
+ - --resume-model-only
+ - 'True'
+ - --finetune-from-ema
+ - 'True'
+ - --log_every
+ - '1'
+ - --lr
+ - 2e-5
+ - --warmup_steps
+ - '300'
+ - --lr_scheduler
+ - cosine
+ - --num_worker
+ - '1'
+ - --expected_num_tokens
+ - '40000'
+ - --max_num_tokens
+ - '40000'
+ - --max_num_tokens_per_sample
+ - '40000'
+ - --visual_und
+ - 'True'
+ - --save_every
+ - '2500'
+ - --total_steps
+ - '5000'
+ - --text_cond_dropout_prob
+ - '0.0'
+ - --vae_cond_dropout_prob
+ - '0.3'
+ - --vit_cond_dropout_prob
+ - '0.0'
+ - --ema
+ - '0.993'
+ - --checkpoint_dir
+ - /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ - --wandb_project
+ - bagel
+ - --wandb_name
+ - checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ - --wandb_dir
+ - /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ - --wandb_offline
+ - 'True'
+ program: /home/clouduser/Code/Github/unified_world_model/train/pretrain_unified_navit.py
+ code_path: train/pretrain_unified_navit.py
+ code_path_local: train/pretrain_unified_navit.py
+ git:
+ remote_url: https://github.com/para-lost/unified_world_model
+ commit: 20694bc0d8bf3c48fe817e31af63a14d6e1e3619
+ root: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ host: junyizhang-launch-new-226991772-1-0
+ executable: /opt/conda/bin/python3.11
+ cpu_count: 48
+ cpu_count_logical: 96
+ gpu_type: NVIDIA A100-SXM4-80GB
+ gpu_count: 8
+ disk:
+ /:
+ total: '1052461830144'
+ used: '164465041408'
+ memory:
+ total: '1437332606976'
+ gpu_nvidia:
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-cd28017c-3793-b451-76cc-3904ca15388e
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-e6224942-e179-ef85-abeb-afb4424bb934
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-21c6cca4-bc02-fbdc-e654-6039d84c8646
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-5f3fa3a7-ddb3-6751-9af7-4b4d38f7a411
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-4e7da01a-ba1b-e632-6334-7edc29e0de7f
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-2604cd40-2d05-63d5-0db4-27495b2a908c
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-8b72b62b-9a63-80eb-c8f8-bf16ad127f33
+ - name: NVIDIA A100-SXM4-80GB
+ memory_total: '85899345920'
+ cuda_cores: 6912
+ architecture: Ampere
+ uuid: GPU-1ff104d8-f2ed-7a12-c3e9-3b5b61be1a4f
+ cuda_version: '12.2'
+ writer_id: oxr547070rpilwmqbpq2pd9ofzvd2cmh
+ visual_gen:
+ desc: null
+ value: true
+ visual_und:
+ desc: null
+ value: true
+ results_dir:
+ desc: null
+ value: results
+ checkpoint_dir:
+ desc: null
+ value: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ wandb_project:
+ desc: null
+ value: bagel
+ wandb_name:
+ desc: null
+ value: checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ wandb_runid:
+ desc: null
+ value: '0'
+ wandb_resume:
+ desc: null
+ value: allow
+ wandb_offline:
+ desc: null
+ value: true
+ wandb_dir:
+ desc: null
+ value: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins
+ global_seed:
+ desc: null
+ value: 4396
+ auto_resume:
+ desc: null
+ value: false
+ resume_from:
+ desc: null
+ value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+ resume_model_only:
+ desc: null
+ value: true
+ finetune_from_ema:
+ desc: null
+ value: true
+ finetune_from_hf:
+ desc: null
+ value: true
+ log_every:
+ desc: null
+ value: 1
+ save_every:
+ desc: null
+ value: 2500
+ total_steps:
+ desc: null
+ value: 5000
+ warmup_steps:
+ desc: null
+ value: 300
+ lr_scheduler:
+ desc: null
+ value: cosine
+ lr:
+ desc: null
+ value: 2.0e-05
+ min_lr:
+ desc: null
+ value: 1.0e-07
+ beta1:
+ desc: null
+ value: 0.9
+ beta2:
+ desc: null
+ value: 0.95
+ eps:
+ desc: null
+ value: 1.0e-15
+ ema:
+ desc: null
+ value: 0.993
+ max_grad_norm:
+ desc: null
+ value: 1.0
+ timestep_shift:
+ desc: null
+ value: 1.0
+ mse_weight:
+ desc: null
+ value: 1.0
+ ce_weight:
+ desc: null
+ value: 1.0
+ ce_loss_reweighting:
+ desc: null
+ value: false
+ expected_num_tokens:
+ desc: null
+ value: 40000
+ num_replicate:
+ desc: null
+ value: 1
+ num_shard:
+ desc: null
+ value: 8
+ sharding_strategy:
+ desc: null
+ value: HYBRID_SHARD
+ backward_prefetch:
+ desc: null
+ value: BACKWARD_PRE
+ cpu_offload:
+ desc: null
+ value: false
+ freeze_llm:
+ desc: null
+ value: false
+ freeze_vit:
+ desc: null
+ value: false
+ freeze_vae:
+ desc: null
+ value: true
+ freeze_und:
+ desc: null
+ value: false
+ copy_init_moe:
+ desc: null
+ value: true
+ use_flex:
+ desc: null
+ value: false
+ eval_every:
+ desc: null
+ value: 500
+ num_eval_batches:
+ desc: null
+ value: 20
+ use_ema_for_eval:
+ desc: null
+ value: true
+ eval_log_dir:
+ desc: null
+ value: null
+ eval_run_tag:
+ desc: null
+ value: ''
+ viz_every:
+ desc: null
+ value: 500
+ viz_n:
+ desc: null
+ value: 8
+ viz_outdir:
+ desc: null
+ value: results/viz
+ eval_dataset_config_file:
+ desc: null
+ value: ./data/configs/vlm_gym_reference_dot_eval_celoss.yaml
+ viz_dataset_config_file:
+ desc: null
+ value: ./data/configs/vlm_gym_reference_dot_eval_celoss.yaml
+ eval_print_n:
+ desc: null
+ value: 3
+ save_ema_only:
+ desc: null
+ value: true
+ save_optimizer:
+ desc: null
+ value: false
+ model_path:
+ desc: null
+ value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+ llm_path:
+ desc: null
+ value: hf/Qwen2.5-0.5B-Instruct/
+ llm_qk_norm:
+ desc: null
+ value: true
+ tie_word_embeddings:
+ desc: null
+ value: false
+ layer_module:
+ desc: null
+ value: Qwen2MoTDecoderLayer
+ vae_path:
+ desc: null
+ value: flux/vae/ae.safetensors
+ vit_path:
+ desc: null
+ value: hf/siglip-so400m-14-980-flash-attn2-navit/
+ max_latent_size:
+ desc: null
+ value: 64
+ latent_patch_size:
+ desc: null
+ value: 2
+ vit_patch_size:
+ desc: null
+ value: 14
+ vit_max_num_patch_per_side:
+ desc: null
+ value: 70
+ connector_act:
+ desc: null
+ value: gelu_pytorch_tanh
+ interpolate_pos:
+ desc: null
+ value: false
+ vit_select_layer:
+ desc: null
+ value: -2
+ vit_rope:
+ desc: null
+ value: false
+ text_cond_dropout_prob:
+ desc: null
+ value: 0.0
+ vae_cond_dropout_prob:
+ desc: null
+ value: 0.3
+ vit_cond_dropout_prob:
+ desc: null
+ value: 0.0
+ dataset_config_file:
+ desc: null
+ value: ./data/configs/vlm_gym_reference_dot_train_celoss.yaml
+ train_data_dir:
+ desc: null
+ value: /home/clouduser/Code/data/gym/reference_dot_v5/train/
+ train_jsonl_path:
+ desc: null
+ value: /home/clouduser/Code/data/gym/reference_dot_v5/train/
+ eval_data_dir:
+ desc: null
+ value: /home/clouduser/Code/data/gym/reference_dot_v5/val/
+ eval_jsonl_path:
+ desc: null
+ value: /home/clouduser/Code/data/gym/reference_dot_v5/val/
+ inference_hash_file:
+ desc: null
+ value: /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+ task_name:
+ desc: null
+ value: reference_dot_v5
+ instructions_dir:
+ desc: null
+ value: ./data/instructions
+ prefetch_factor:
+ desc: null
+ value: 2
+ num_workers:
+ desc: null
+ value: 1
+ max_num_tokens_per_sample:
+ desc: null
+ value: 40000
+ max_num_tokens:
+ desc: null
+ value: 40000
+ prefer_buffer_before:
+ desc: null
+ value: 16384
+ max_buffer_size:
+ desc: null
+ value: 50
+ data_seed:
+ desc: null
+ value: 42
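The config above sets `ema: 0.993` and `save_ema_only: true`, which matches the `ema.safetensors` file added in this commit. A minimal sketch of the standard exponential-moving-average weight update that an `ema` decay of 0.993 implies, using plain floats in place of tensors — the repo's actual EMA code may differ:

```python
def ema_update(ema, current, decay=0.993):
    """One EMA step per parameter: ema <- decay * ema + (1 - decay) * current."""
    return {k: decay * ema[k] + (1.0 - decay) * v for k, v in current.items()}

# Starting from 0 and tracking a constant target of 1, after n steps the
# EMA value is exactly 1 - decay**n.
ema = {"w": 0.0}
for _ in range(3):
    ema = ema_update(ema, {"w": 1.0})
```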
checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/wandb/offline-run-20260130_175019-checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins-run0/files/output.log CHANGED
@@ -1153,27 +1153,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
 [2026-01-30 19:17:34] (step=0000956) Train Loss mse: 0.0260, Train Loss ce: 0.3673, Train Steps/Sec: 0.22,
 [2026-01-30 19:17:39] (step=0000957) Train Loss mse: 0.1040, Train Loss ce: 0.3833, Train Steps/Sec: 0.22,
 [2026-01-30 19:17:45] (step=0000958) Train Loss mse: 0.0684, Train Loss ce: 0.3811, Train Steps/Sec: 0.16,
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step1000
- Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- ce_avg: 0.38083067536354065, mse_avg: 0.0056973714381456375
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step1500
- Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- ce_avg: 0.5134702324867249, mse_avg: 0.00566928181797266
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step2000
- Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- ce_avg: 0.6963174939155579, mse_avg: 0.005763474386185408
 [2026-01-30 19:17:50] (step=0000959) Train Loss mse: 0.0628, Train Loss ce: 0.3660, Train Steps/Sec: 0.20,
 [2026-01-30 19:17:55] (step=0000960) Train Loss mse: 0.0377, Train Loss ce: 0.3751, Train Steps/Sec: 0.19,
 [2026-01-30 19:18:00] (step=0000961) Train Loss mse: 0.0594, Train Loss ce: 0.3755, Train Steps/Sec: 0.20,
@@ -1272,6 +1251,27 @@ ce_avg: 0.6963174939155579, mse_avg: 0.005763474386185408
 [2026-01-30 19:26:15] (step=0001054) Train Loss mse: 0.0187, Train Loss ce: 0.3781, Train Steps/Sec: 0.22,
 [2026-01-30 19:26:19] (step=0001055) Train Loss mse: 0.0218, Train Loss ce: 0.3959, Train Steps/Sec: 0.21,
 [2026-01-30 19:26:24] (step=0001056) Train Loss mse: 0.0648, Train Loss ce: 0.3802, Train Steps/Sec: 0.23,
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step1000
+ Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ ce_avg: 0.38083067536354065, mse_avg: 0.0056973714381456375
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step1500
+ Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ ce_avg: 0.5134702324867249, mse_avg: 0.00566928181797266
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step2000
+ Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ ce_avg: 0.6963174939155579, mse_avg: 0.005763474386185408
 [2026-01-30 19:26:30] (step=0001057) Train Loss mse: 0.1090, Train Loss ce: 0.3662, Train Steps/Sec: 0.17,
 [2026-01-30 19:26:35] (step=0001058) Train Loss mse: 0.0627, Train Loss ce: 0.3800, Train Steps/Sec: 0.20,
 [2026-01-30 19:26:39] (step=0001059) Train Loss mse: 0.0413, Train Loss ce: 0.3784, Train Steps/Sec: 0.24,
@@ -2565,20 +2565,6 @@ ce_avg: 0.6963174939155579, mse_avg: 0.005763474386185408
 [2026-01-30 21:16:52] (step=0002347) Train Loss mse: 0.0233, Train Loss ce: 0.3611, Train Steps/Sec: 0.20,
 [2026-01-30 21:16:57] (step=0002348) Train Loss mse: 0.0407, Train Loss ce: 0.3794, Train Steps/Sec: 0.21,
 [2026-01-30 21:17:03] (step=0002349) Train Loss mse: 0.0832, Train Loss ce: 0.3697, Train Steps/Sec: 0.16,
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step2500
- Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- ce_avg: 1.2680165767669678, mse_avg: 0.005764862522482872
- base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step3000
- Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
- [eval debug] first 3 batch fingerprints:
- fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
- ce_avg: 0.35950416326522827, mse_avg: 0.005150754004716873
 [2026-01-30 21:17:08] (step=0002350) Train Loss mse: 0.0151, Train Loss ce: 0.3606, Train Steps/Sec: 0.21,
 [2026-01-30 21:17:14] (step=0002351) Train Loss mse: 0.0258, Train Loss ce: 0.3678, Train Steps/Sec: 0.18,
 [2026-01-30 21:17:20] (step=0002352) Train Loss mse: 0.0565, Train Loss ce: 0.3763, Train Steps/Sec: 0.16,
@@ -2598,6 +2584,20 @@ ce_avg: 0.35950416326522827, mse_avg: 0.005150754004716873
 [2026-01-30 21:18:29] (step=0002366) Train Loss mse: 0.0676, Train Loss ce: 0.3779, Train Steps/Sec: 0.21,
 [2026-01-30 21:18:33] (step=0002367) Train Loss mse: 0.0689, Train Loss ce: 0.3695, Train Steps/Sec: 0.22,
 [2026-01-30 21:18:39] (step=0002368) Train Loss mse: 0.0200, Train Loss ce: 0.3593, Train Steps/Sec: 0.18,
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step2500
+ Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ ce_avg: 1.2680165767669678, mse_avg: 0.005764862522482872
+ base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step3000
+ Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
+ [eval debug] first 3 batch fingerprints:
+ fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
+ ce_avg: 0.35950416326522827, mse_avg: 0.005150754004716873
 [2026-01-30 21:18:44] (step=0002369) Train Loss mse: 0.0564, Train Loss ce: 0.3732, Train Steps/Sec: 0.21,
 [2026-01-30 21:18:48] (step=0002370) Train Loss mse: 0.0158, Train Loss ce: 0.3686, Train Steps/Sec: 0.21,
 [2026-01-30 21:18:53] (step=0002371) Train Loss mse: 0.0460, Train Loss ce: 0.3620, Train Steps/Sec: 0.21,
@@ -3577,6 +3577,14 @@ ce_avg: 0.35950416326522827, mse_avg: 0.005150754004716873
 [2026-01-30 22:44:57] (step=0003342) Train Loss mse: 0.0204, Train Loss ce: 0.3625, Train Steps/Sec: 0.21,
 [2026-01-30 22:45:02] (step=0003343) Train Loss mse: 0.0397, Train Loss ce: 0.3534, Train Steps/Sec: 0.21,
 [2026-01-30 22:45:06] (step=0003344) Train Loss mse: 0.0457, Train Loss ce: 0.3492, Train Steps/Sec: 0.24,
+ [2026-01-30 22:45:11] (step=0003345) Train Loss mse: 0.0685, Train Loss ce: 0.3638, Train Steps/Sec: 0.22,
+ [2026-01-30 22:45:15] (step=0003346) Train Loss mse: 0.0287, Train Loss ce: 0.3681, Train Steps/Sec: 0.24,
+ [2026-01-30 22:45:20] (step=0003347) Train Loss mse: 0.0241, Train Loss ce: 0.3600, Train Steps/Sec: 0.22,
+ [2026-01-30 22:45:24] (step=0003348) Train Loss mse: 0.0670, Train Loss ce: 0.3696, Train Steps/Sec: 0.22,
+ [2026-01-30 22:45:29] (step=0003349) Train Loss mse: 0.0325, Train Loss ce: 0.3693, Train Steps/Sec: 0.21,
+ [2026-01-30 22:45:36] (step=0003350) Train Loss mse: 0.0331, Train Loss ce: 0.3672, Train Steps/Sec: 0.16,
+ [2026-01-30 22:45:40] (step=0003351) Train Loss mse: 0.0047, Train Loss ce: 0.3599, Train Steps/Sec: 0.21,
+ [2026-01-30 22:45:47] (step=0003352) Train Loss mse: 0.0047, Train Loss ce: 0.3617, Train Steps/Sec: 0.16,
 base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step3500
 Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
 [eval debug] first 3 batch fingerprints:
@@ -3598,14 +3606,6 @@ Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_va
 fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
 fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
 ce_avg: 0.8022381663322449, mse_avg: 0.006051613483577967
- [2026-01-30 22:45:11] (step=0003345) Train Loss mse: 0.0685, Train Loss ce: 0.3638, Train Steps/Sec: 0.22,
- [2026-01-30 22:45:15] (step=0003346) Train Loss mse: 0.0287, Train Loss ce: 0.3681, Train Steps/Sec: 0.24,
- [2026-01-30 22:45:20] (step=0003347) Train Loss mse: 0.0241, Train Loss ce: 0.3600, Train Steps/Sec: 0.22,
- [2026-01-30 22:45:24] (step=0003348) Train Loss mse: 0.0670, Train Loss ce: 0.3696, Train Steps/Sec: 0.22,
- [2026-01-30 22:45:29] (step=0003349) Train Loss mse: 0.0325, Train Loss ce: 0.3693, Train Steps/Sec: 0.21,
- [2026-01-30 22:45:36] (step=0003350) Train Loss mse: 0.0331, Train Loss ce: 0.3672, Train Steps/Sec: 0.16,
- [2026-01-30 22:45:40] (step=0003351) Train Loss mse: 0.0047, Train Loss ce: 0.3599, Train Steps/Sec: 0.21,
- [2026-01-30 22:45:47] (step=0003352) Train Loss mse: 0.0047, Train Loss ce: 0.3617, Train Steps/Sec: 0.16,
 [2026-01-30 22:45:52] (step=0003353) Train Loss mse: 0.0185, Train Loss ce: 0.3710, Train Steps/Sec: 0.19,
 [2026-01-30 22:45:57] (step=0003354) Train Loss mse: 0.0042, Train Loss ce: 0.3512, Train Steps/Sec: 0.21,
 [2026-01-30 22:46:02] (step=0003355) Train Loss mse: 0.0047, Train Loss ce: 0.3710, Train Steps/Sec: 0.18,
@@ -5079,11 +5079,6 @@ ce_avg: 0.8022381663322449, mse_avg: 0.006051613483577967
 [2026-01-31 00:51:59] (step=0004823) Train Loss mse: 0.0137, Train Loss ce: 0.3659, Train Steps/Sec: 0.21,
 [2026-01-31 00:52:04] (step=0004824) Train Loss mse: 0.0307, Train Loss ce: 0.3498, Train Steps/Sec: 0.20,
 [2026-01-31 00:52:09] (step=0004825) Train Loss mse: 0.0056, Train Loss ce: 0.3591, Train Steps/Sec: 0.22,
- [2026-01-31 00:52:13] (step=0004826) Train Loss mse: 0.0402, Train Loss ce: 0.3701, Train Steps/Sec: 0.23,
- [2026-01-31 00:52:18] (step=0004827) Train Loss mse: 0.0730, Train Loss ce: 0.3659, Train Steps/Sec: 0.20,
- [2026-01-31 00:52:22] (step=0004828) Train Loss mse: 0.0319, Train Loss ce: 0.3802, Train Steps/Sec: 0.22,
- [2026-01-31 00:52:28] (step=0004829) Train Loss mse: 0.0325, Train Loss ce: 0.3511, Train Steps/Sec: 0.18,
- [2026-01-31 00:52:34] (step=0004830) Train Loss mse: 0.0518, Train Loss ce: 0.3576, Train Steps/Sec: 0.18,
 base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step5000
 Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
 [eval debug] first 3 batch fingerprints:
@@ -5091,6 +5086,11 @@ Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_va
 fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
 fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
 ce_avg: 1.4548426866531372, mse_avg: 0.005192228127270937
+ [2026-01-31 00:52:13] (step=0004826) Train Loss mse: 0.0402, Train Loss ce: 0.3701, Train Steps/Sec: 0.23,
+ [2026-01-31 00:52:18] (step=0004827) Train Loss mse: 0.0730, Train Loss ce: 0.3659, Train Steps/Sec: 0.20,
+ [2026-01-31 00:52:22] (step=0004828) Train Loss mse: 0.0319, Train Loss ce: 0.3802, Train Steps/Sec: 0.22,
+ [2026-01-31 00:52:28] (step=0004829) Train Loss mse: 0.0325, Train Loss ce: 0.3511, Train Steps/Sec: 0.18,
+ [2026-01-31 00:52:34] (step=0004830) Train Loss mse: 0.0518, Train Loss ce: 0.3576, Train Steps/Sec: 0.18,
 [2026-01-31 00:52:40] (step=0004831) Train Loss mse: 0.0038, Train Loss ce: 0.3513, Train Steps/Sec: 0.16,
 [2026-01-31 00:52:45] (step=0004832) Train Loss mse: 0.0511, Train Loss ce: 0.3626, Train Steps/Sec: 0.19,
 [2026-01-31 00:52:50] (step=0004833) Train Loss mse: 0.0457, Train Loss ce: 0.3571, Train Steps/Sec: 0.20,
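The `output.log` training lines above follow one fixed format per step. A small sketch of pulling the step index and loss values out of such lines, for instance to re-plot a run from the raw log — `parse_train_line` is a hypothetical helper, not part of the repo:

```python
import re

# Matches lines like:
# [2026-01-31 00:52:50] (step=0004833) Train Loss mse: 0.0457, Train Loss ce: 0.3571, Train Steps/Sec: 0.20,
LINE_RE = re.compile(
    r"\(step=(\d+)\) Train Loss mse: ([0-9.]+), "
    r"Train Loss ce: ([0-9.]+), Train Steps/Sec: ([0-9.]+)"
)

def parse_train_line(line):
    """Return (step, mse, ce, steps_per_sec), or None for non-training lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    step, mse, ce, sps = m.groups()
    return int(step), float(mse), float(ce), float(sps)

sample = ("[2026-01-31 00:52:50] (step=0004833) Train Loss mse: 0.0457, "
          "Train Loss ce: 0.3571, Train Steps/Sec: 0.20,")
```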
 
1153
  [2026-01-30 19:17:34] (step=0000956) Train Loss mse: 0.0260, Train Loss ce: 0.3673, Train Steps/Sec: 0.22,
1154
  [2026-01-30 19:17:39] (step=0000957) Train Loss mse: 0.1040, Train Loss ce: 0.3833, Train Steps/Sec: 0.22,
1155
  [2026-01-30 19:17:45] (step=0000958) Train Loss mse: 0.0684, Train Loss ce: 0.3811, Train Steps/Sec: 0.16,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1156
  [2026-01-30 19:17:50] (step=0000959) Train Loss mse: 0.0628, Train Loss ce: 0.3660, Train Steps/Sec: 0.20,
1157
  [2026-01-30 19:17:55] (step=0000960) Train Loss mse: 0.0377, Train Loss ce: 0.3751, Train Steps/Sec: 0.19,
1158
  [2026-01-30 19:18:00] (step=0000961) Train Loss mse: 0.0594, Train Loss ce: 0.3755, Train Steps/Sec: 0.20,
 
[2026-01-30 19:26:15] (step=0001054) Train Loss mse: 0.0187, Train Loss ce: 0.3781, Train Steps/Sec: 0.22,
[2026-01-30 19:26:19] (step=0001055) Train Loss mse: 0.0218, Train Loss ce: 0.3959, Train Steps/Sec: 0.21,
[2026-01-30 19:26:24] (step=0001056) Train Loss mse: 0.0648, Train Loss ce: 0.3802, Train Steps/Sec: 0.23,
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step1000
Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
ce_avg: 0.38083067536354065, mse_avg: 0.0056973714381456375
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step1500
Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
ce_avg: 0.5134702324867249, mse_avg: 0.00566928181797266
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step2000
Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
ce_avg: 0.6963174939155579, mse_avg: 0.005763474386185408
[2026-01-30 19:26:30] (step=0001057) Train Loss mse: 0.1090, Train Loss ce: 0.3662, Train Steps/Sec: 0.17,
[2026-01-30 19:26:35] (step=0001058) Train Loss mse: 0.0627, Train Loss ce: 0.3800, Train Steps/Sec: 0.20,
[2026-01-30 19:26:39] (step=0001059) Train Loss mse: 0.0413, Train Loss ce: 0.3784, Train Steps/Sec: 0.24,
 
[2026-01-30 21:16:52] (step=0002347) Train Loss mse: 0.0233, Train Loss ce: 0.3611, Train Steps/Sec: 0.20,
[2026-01-30 21:16:57] (step=0002348) Train Loss mse: 0.0407, Train Loss ce: 0.3794, Train Steps/Sec: 0.21,
[2026-01-30 21:17:03] (step=0002349) Train Loss mse: 0.0832, Train Loss ce: 0.3697, Train Steps/Sec: 0.16,
[2026-01-30 21:17:08] (step=0002350) Train Loss mse: 0.0151, Train Loss ce: 0.3606, Train Steps/Sec: 0.21,
[2026-01-30 21:17:14] (step=0002351) Train Loss mse: 0.0258, Train Loss ce: 0.3678, Train Steps/Sec: 0.18,
[2026-01-30 21:17:20] (step=0002352) Train Loss mse: 0.0565, Train Loss ce: 0.3763, Train Steps/Sec: 0.16,

[2026-01-30 21:18:29] (step=0002366) Train Loss mse: 0.0676, Train Loss ce: 0.3779, Train Steps/Sec: 0.21,
[2026-01-30 21:18:33] (step=0002367) Train Loss mse: 0.0689, Train Loss ce: 0.3695, Train Steps/Sec: 0.22,
[2026-01-30 21:18:39] (step=0002368) Train Loss mse: 0.0200, Train Loss ce: 0.3593, Train Steps/Sec: 0.18,
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step2500
Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
ce_avg: 1.2680165767669678, mse_avg: 0.005764862522482872
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step3000
Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
ce_avg: 0.35950416326522827, mse_avg: 0.005150754004716873
[2026-01-30 21:18:44] (step=0002369) Train Loss mse: 0.0564, Train Loss ce: 0.3732, Train Steps/Sec: 0.21,
[2026-01-30 21:18:48] (step=0002370) Train Loss mse: 0.0158, Train Loss ce: 0.3686, Train Steps/Sec: 0.21,
[2026-01-30 21:18:53] (step=0002371) Train Loss mse: 0.0460, Train Loss ce: 0.3620, Train Steps/Sec: 0.21,
 
[2026-01-30 22:44:57] (step=0003342) Train Loss mse: 0.0204, Train Loss ce: 0.3625, Train Steps/Sec: 0.21,
[2026-01-30 22:45:02] (step=0003343) Train Loss mse: 0.0397, Train Loss ce: 0.3534, Train Steps/Sec: 0.21,
[2026-01-30 22:45:06] (step=0003344) Train Loss mse: 0.0457, Train Loss ce: 0.3492, Train Steps/Sec: 0.24,
[2026-01-30 22:45:11] (step=0003345) Train Loss mse: 0.0685, Train Loss ce: 0.3638, Train Steps/Sec: 0.22,
[2026-01-30 22:45:15] (step=0003346) Train Loss mse: 0.0287, Train Loss ce: 0.3681, Train Steps/Sec: 0.24,
[2026-01-30 22:45:20] (step=0003347) Train Loss mse: 0.0241, Train Loss ce: 0.3600, Train Steps/Sec: 0.22,
[2026-01-30 22:45:24] (step=0003348) Train Loss mse: 0.0670, Train Loss ce: 0.3696, Train Steps/Sec: 0.22,
[2026-01-30 22:45:29] (step=0003349) Train Loss mse: 0.0325, Train Loss ce: 0.3693, Train Steps/Sec: 0.21,
[2026-01-30 22:45:36] (step=0003350) Train Loss mse: 0.0331, Train Loss ce: 0.3672, Train Steps/Sec: 0.16,
[2026-01-30 22:45:40] (step=0003351) Train Loss mse: 0.0047, Train Loss ce: 0.3599, Train Steps/Sec: 0.21,
[2026-01-30 22:45:47] (step=0003352) Train Loss mse: 0.0047, Train Loss ce: 0.3617, Train Steps/Sec: 0.16,
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step3500
Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:

fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
ce_avg: 0.8022381663322449, mse_avg: 0.006051613483577967
[2026-01-30 22:45:52] (step=0003353) Train Loss mse: 0.0185, Train Loss ce: 0.3710, Train Steps/Sec: 0.19,
[2026-01-30 22:45:57] (step=0003354) Train Loss mse: 0.0042, Train Loss ce: 0.3512, Train Steps/Sec: 0.21,
[2026-01-30 22:46:02] (step=0003355) Train Loss mse: 0.0047, Train Loss ce: 0.3710, Train Steps/Sec: 0.18,
 
[2026-01-31 00:51:59] (step=0004823) Train Loss mse: 0.0137, Train Loss ce: 0.3659, Train Steps/Sec: 0.21,
[2026-01-31 00:52:04] (step=0004824) Train Loss mse: 0.0307, Train Loss ce: 0.3498, Train Steps/Sec: 0.20,
[2026-01-31 00:52:09] (step=0004825) Train Loss mse: 0.0056, Train Loss ce: 0.3591, Train Steps/Sec: 0.22,
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_ce_ins_step5000
Preparing Dataset vlm_gym_reference_dot_celoss_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:

fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_celoss_evalonce'}]
ce_avg: 1.4548426866531372, mse_avg: 0.005192228127270937
[2026-01-31 00:52:13] (step=0004826) Train Loss mse: 0.0402, Train Loss ce: 0.3701, Train Steps/Sec: 0.23,
[2026-01-31 00:52:18] (step=0004827) Train Loss mse: 0.0730, Train Loss ce: 0.3659, Train Steps/Sec: 0.20,
[2026-01-31 00:52:22] (step=0004828) Train Loss mse: 0.0319, Train Loss ce: 0.3802, Train Steps/Sec: 0.22,
[2026-01-31 00:52:28] (step=0004829) Train Loss mse: 0.0325, Train Loss ce: 0.3511, Train Steps/Sec: 0.18,
[2026-01-31 00:52:34] (step=0004830) Train Loss mse: 0.0518, Train Loss ce: 0.3576, Train Steps/Sec: 0.18,
[2026-01-31 00:52:40] (step=0004831) Train Loss mse: 0.0038, Train Loss ce: 0.3513, Train Steps/Sec: 0.16,
[2026-01-31 00:52:45] (step=0004832) Train Loss mse: 0.0511, Train Loss ce: 0.3626, Train Steps/Sec: 0.19,
[2026-01-31 00:52:50] (step=0004833) Train Loss mse: 0.0457, Train Loss ce: 0.3571, Train Steps/Sec: 0.20,