Upload checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
- checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/wandb/offline-run-20260129_223634-vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins-run0/files/config.yaml +457 -0
- checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/wandb/offline-run-20260129_223634-vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins-run0/files/output.log +216 -176
checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/wandb/offline-run-20260129_223634-vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins-run0/files/config.yaml
CHANGED
@@ -0,0 +1,457 @@
+wandb_version: 1
+
+_wandb:
+  desc: null
+  value:
+    python_version: 3.11.10
+    cli_version: 0.23.1
+    framework: huggingface
+    huggingface_version: 4.49.0
+    is_jupyter_run: false
+    is_kaggle_kernel: false
+    start_time: 1769726194
+    t:
+      1:
+      - 1
+      - 5
+      - 11
+      - 41
+      - 49
+      - 53
+      - 71
+      - 105
+      2:
+      - 1
+      - 5
+      - 11
+      - 41
+      - 49
+      - 53
+      - 71
+      - 105
+      3:
+      - 2
+      - 4
+      - 13
+      - 14
+      - 37
+      - 42
+      - 61
+      4: 3.11.10
+      5: 0.23.1
+      6: 4.49.0
+      13: linux-x86_64
+    e:
+      452u6eq6qj9cpdr80e5nue3sk0cejbit:
+        os: Linux-6.6.93+-x86_64-with-glibc2.35
+        python: CPython 3.11.10
+        started_at: '2026-01-29T22:36:34.516284Z'
+        args:
+        - --dataset_config_file
+        - ./data/configs/vlm_gym_reference_dot_train_mseloss_only.yaml
+        - --eval_dataset_config_file
+        - ./data/configs/vlm_gym_reference_dot_eval_mseloss_only.yaml
+        - --viz_dataset_config_file
+        - ./data/configs/vlm_gym_reference_dot_eval_mseloss_only.yaml
+        - --inference_hash_file
+        - /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+        - --task_name
+        - reference_dot_v5
+        - --instructions_dir
+        - ./data/instructions
+        - --train_data_dir
+        - /home/clouduser/Code/data/gym/reference_dot_v5/train/
+        - --train_jsonl_path
+        - /home/clouduser/Code/data/gym/reference_dot_v5/train/
+        - --eval_data_dir
+        - /home/clouduser/Code/data/gym/reference_dot_v5/val/
+        - --eval_jsonl_path
+        - /home/clouduser/Code/data/gym/reference_dot_v5/val/
+        - --model_path
+        - /home/clouduser/Code/Models/BAGEL-7B-MoT
+        - --layer_module
+        - Qwen2MoTDecoderLayer
+        - --max_latent_size
+        - '64'
+        - --resume-from
+        - /home/clouduser/Code/Models/BAGEL-7B-MoT
+        - --finetune_from_hf
+        - 'True'
+        - --auto_resume
+        - 'False'
+        - --resume-model-only
+        - 'True'
+        - --finetune-from-ema
+        - 'True'
+        - --log_every
+        - '1'
+        - --lr
+        - 2e-5
+        - --warmup_steps
+        - '300'
+        - --lr_scheduler
+        - cosine
+        - --num_worker
+        - '1'
+        - --expected_num_tokens
+        - '30000'
+        - --max_num_tokens
+        - '30000'
+        - --max_num_tokens_per_sample
+        - '30000'
+        - --visual_und
+        - 'True'
+        - --save_every
+        - '5000'
+        - --total_steps
+        - '5000'
+        - --text_cond_dropout_prob
+        - '0.0'
+        - --vae_cond_dropout_prob
+        - '0.0'
+        - --vit_cond_dropout_prob
+        - '0.0'
+        - --ema
+        - '0.993'
+        - --checkpoint_dir
+        - /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+        - --wandb_project
+        - bagel
+        - --wandb_name
+        - vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins
+        - --wandb_dir
+        - /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+        - --wandb_offline
+        - 'True'
+        program: /home/clouduser/Code/Github/unified_world_model/train/pretrain_unified_navit.py
+        code_path: train/pretrain_unified_navit.py
+        code_path_local: train/pretrain_unified_navit.py
+        git:
+          remote_url: https://github.com/para-lost/unified_world_model
+          commit: 8d7b26b7e552fc87b592cf3be94d93be7aeca2a9
+        root: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+        host: junyizhang-launch-new-226786220-1-0
+        executable: /opt/conda/bin/python3.11
+        cpu_count: 48
+        cpu_count_logical: 96
+        gpu_type: NVIDIA A100-SXM4-80GB
+        gpu_count: 8
+        disk:
+          /:
+            total: '1052461830144'
+            used: '164465111040'
+        memory:
+          total: '1437332606976'
+        gpu_nvidia:
+        - name: NVIDIA A100-SXM4-80GB
+          memory_total: '85899345920'
+          cuda_cores: 6912
+          architecture: Ampere
+          uuid: GPU-6743ee30-8d44-3700-5053-982b7634dc72
+        - name: NVIDIA A100-SXM4-80GB
+          memory_total: '85899345920'
+          cuda_cores: 6912
+          architecture: Ampere
+          uuid: GPU-efdba308-be1c-fb5c-98af-5161d08502b8
+        - name: NVIDIA A100-SXM4-80GB
+          memory_total: '85899345920'
+          cuda_cores: 6912
+          architecture: Ampere
+          uuid: GPU-7a081892-5d05-b817-0e72-25e6d0f10d7a
+        - name: NVIDIA A100-SXM4-80GB
+          memory_total: '85899345920'
+          cuda_cores: 6912
+          architecture: Ampere
+          uuid: GPU-9be64e16-04d7-3a0f-eac3-d2b19251109b
+        - name: NVIDIA A100-SXM4-80GB
+          memory_total: '85899345920'
+          cuda_cores: 6912
+          architecture: Ampere
+          uuid: GPU-119e1d42-0ee4-815a-6fbf-d61349b517ef
+        - name: NVIDIA A100-SXM4-80GB
+          memory_total: '85899345920'
+          cuda_cores: 6912
+          architecture: Ampere
+          uuid: GPU-54053246-0cf6-b199-d31d-12946eb5de68
+        - name: NVIDIA A100-SXM4-80GB
+          memory_total: '85899345920'
+          cuda_cores: 6912
+          architecture: Ampere
+          uuid: GPU-3e4b216a-3bc7-064f-197e-5d85c243b0d3
+        - name: NVIDIA A100-SXM4-80GB
+          memory_total: '85899345920'
+          cuda_cores: 6912
+          architecture: Ampere
+          uuid: GPU-58fb92a9-aa05-fe07-638f-be3c40ba2436
+        cuda_version: '12.2'
+        writer_id: 452u6eq6qj9cpdr80e5nue3sk0cejbit
+visual_gen:
+  desc: null
+  value: true
+visual_und:
+  desc: null
+  value: true
+results_dir:
+  desc: null
+  value: results
+checkpoint_dir:
+  desc: null
+  value: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+wandb_project:
+  desc: null
+  value: bagel
+wandb_name:
+  desc: null
+  value: vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins
+wandb_runid:
+  desc: null
+  value: '0'
+wandb_resume:
+  desc: null
+  value: allow
+wandb_offline:
+  desc: null
+  value: true
+wandb_dir:
+  desc: null
+  value: /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins
+global_seed:
+  desc: null
+  value: 4396
+auto_resume:
+  desc: null
+  value: false
+resume_from:
+  desc: null
+  value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+resume_model_only:
+  desc: null
+  value: true
+finetune_from_ema:
+  desc: null
+  value: true
+finetune_from_hf:
+  desc: null
+  value: true
+log_every:
+  desc: null
+  value: 1
+save_every:
+  desc: null
+  value: 5000
+total_steps:
+  desc: null
+  value: 5000
+warmup_steps:
+  desc: null
+  value: 300
+lr_scheduler:
+  desc: null
+  value: cosine
+lr:
+  desc: null
+  value: 2.0e-05
+min_lr:
+  desc: null
+  value: 1.0e-07
+beta1:
+  desc: null
+  value: 0.9
+beta2:
+  desc: null
+  value: 0.95
+eps:
+  desc: null
+  value: 1.0e-15
+ema:
+  desc: null
+  value: 0.993
+max_grad_norm:
+  desc: null
+  value: 1.0
+timestep_shift:
+  desc: null
+  value: 1.0
+mse_weight:
+  desc: null
+  value: 1.0
+ce_weight:
+  desc: null
+  value: 1.0
+ce_loss_reweighting:
+  desc: null
+  value: false
+expected_num_tokens:
+  desc: null
+  value: 30000
+num_replicate:
+  desc: null
+  value: 1
+num_shard:
+  desc: null
+  value: 8
+sharding_strategy:
+  desc: null
+  value: HYBRID_SHARD
+backward_prefetch:
+  desc: null
+  value: BACKWARD_PRE
+cpu_offload:
+  desc: null
+  value: false
+freeze_llm:
+  desc: null
+  value: false
+freeze_vit:
+  desc: null
+  value: false
+freeze_vae:
+  desc: null
+  value: true
+freeze_und:
+  desc: null
+  value: false
+copy_init_moe:
+  desc: null
+  value: true
+use_flex:
+  desc: null
+  value: false
+eval_every:
+  desc: null
+  value: 500
+num_eval_batches:
+  desc: null
+  value: 20
+use_ema_for_eval:
+  desc: null
+  value: true
+eval_log_dir:
+  desc: null
+  value: null
+eval_run_tag:
+  desc: null
+  value: ''
+viz_every:
+  desc: null
+  value: 500
+viz_n:
+  desc: null
+  value: 8
+viz_outdir:
+  desc: null
+  value: results/viz
+eval_dataset_config_file:
+  desc: null
+  value: ./data/configs/vlm_gym_reference_dot_eval_mseloss_only.yaml
+viz_dataset_config_file:
+  desc: null
+  value: ./data/configs/vlm_gym_reference_dot_eval_mseloss_only.yaml
+eval_print_n:
+  desc: null
+  value: 3
+save_ema_only:
+  desc: null
+  value: true
+save_optimizer:
+  desc: null
+  value: false
+model_path:
+  desc: null
+  value: /home/clouduser/Code/Models/BAGEL-7B-MoT
+llm_path:
+  desc: null
+  value: hf/Qwen2.5-0.5B-Instruct/
+llm_qk_norm:
+  desc: null
+  value: true
+tie_word_embeddings:
+  desc: null
+  value: false
+layer_module:
+  desc: null
+  value: Qwen2MoTDecoderLayer
+vae_path:
+  desc: null
+  value: flux/vae/ae.safetensors
+vit_path:
+  desc: null
+  value: hf/siglip-so400m-14-980-flash-attn2-navit/
+max_latent_size:
+  desc: null
+  value: 64
+latent_patch_size:
+  desc: null
+  value: 2
+vit_patch_size:
+  desc: null
+  value: 14
+vit_max_num_patch_per_side:
+  desc: null
+  value: 70
+connector_act:
+  desc: null
+  value: gelu_pytorch_tanh
+interpolate_pos:
+  desc: null
+  value: false
+vit_select_layer:
+  desc: null
+  value: -2
+vit_rope:
+  desc: null
+  value: false
+text_cond_dropout_prob:
+  desc: null
+  value: 0.0
+vae_cond_dropout_prob:
+  desc: null
+  value: 0.0
+vit_cond_dropout_prob:
+  desc: null
+  value: 0.0
+dataset_config_file:
+  desc: null
+  value: ./data/configs/vlm_gym_reference_dot_train_mseloss_only.yaml
+train_data_dir:
+  desc: null
+  value: /home/clouduser/Code/data/gym/reference_dot_v5/train/
+train_jsonl_path:
+  desc: null
+  value: /home/clouduser/Code/data/gym/reference_dot_v5/train/
+eval_data_dir:
+  desc: null
+  value: /home/clouduser/Code/data/gym/reference_dot_v5/val/
+eval_jsonl_path:
+  desc: null
+  value: /home/clouduser/Code/data/gym/reference_dot_v5/val/
+inference_hash_file:
+  desc: null
+  value: /home/clouduser/Code/Github/launch_new/hashes_test_set_v10.json
+task_name:
+  desc: null
+  value: reference_dot_v5
+instructions_dir:
+  desc: null
+  value: ./data/instructions
+prefetch_factor:
+  desc: null
+  value: 2
+num_workers:
+  desc: null
+  value: 1
+max_num_tokens_per_sample:
+  desc: null
+  value: 30000
+max_num_tokens:
+  desc: null
+  value: 30000
+prefer_buffer_before:
+  desc: null
+  value: 16384
+max_buffer_size:
+  desc: null
+  value: 50
+data_seed:
+  desc: null
+  value: 42
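The `lr_scheduler: cosine`, `lr: 2.0e-05`, `min_lr: 1.0e-07`, `warmup_steps: 300` and `total_steps: 5000` entries above determine the shape of this run's learning-rate curve. The sketch below shows one common reading of those values, assuming linear warmup from zero followed by cosine decay to `min_lr`. It is not taken from `train/pretrain_unified_navit.py`, whose scheduler may differ in detail, and the helper `lr_at` is hypothetical.

```python
import math


def lr_at(step: int, lr: float = 2e-5, min_lr: float = 1e-7,
          warmup_steps: int = 300, total_steps: int = 5000) -> float:
    """Hypothetical warmup + cosine schedule using the values from config.yaml.

    Assumption: linear warmup from 0 to `lr`, then cosine decay to `min_lr`
    at `total_steps`. The real training script may implement this differently.
    """
    if step < warmup_steps:
        # Linear warmup: ramp from 0 up to the peak learning rate.
        return lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine decay from `lr` (progress = 0) down to `min_lr` (progress = 1).
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 150, 300, 1000, 2650, 5000):
        print(f"step {s:>4}: lr = {lr_at(s):.3e}")
```

Under these assumptions the peak of 2e-5 is reached at step 300, the rate is about halfway (roughly 1.0e-5) at step 2650, and it reaches 1e-7 at step 5000.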
checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/wandb/offline-run-20260129_223634-vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins-run0/files/output.log
CHANGED
@@ -928,126 +928,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
 [2026-01-30 00:05:37] (step=0000917) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
 [2026-01-30 00:05:41] (step=0000918) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
 [2026-01-30 00:05:46] (step=0000919) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:05:52] (step=0000920) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:05:59] (step=0000921) Train Loss mse: 0.0069, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:06:05] (step=0000922) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:06:10] (step=0000923) Train Loss mse: 0.0113, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
-[2026-01-30 00:06:15] (step=0000924) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:06:21] (step=0000925) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:06:27] (step=0000926) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:06:32] (step=0000927) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
-[2026-01-30 00:06:37] (step=0000928) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:06:42] (step=0000929) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:06:46] (step=0000930) Train Loss mse: 0.0122, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:06:51] (step=0000931) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:06:57] (step=0000932) Train Loss mse: 0.0069, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:07:02] (step=0000933) Train Loss mse: 0.0061, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:07:07] (step=0000934) Train Loss mse: 0.0071, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:07:13] (step=0000935) Train Loss mse: 0.0064, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:07:18] (step=0000936) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
-[2026-01-30 00:07:24] (step=0000937) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:07:31] (step=0000938) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:07:35] (step=0000939) Train Loss mse: 0.0066, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:07:40] (step=0000940) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:07:45] (step=0000941) Train Loss mse: 0.0077, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:07:51] (step=0000942) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:07:56] (step=0000943) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:08:02] (step=0000944) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
-[2026-01-30 00:08:08] (step=0000945) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:08:13] (step=0000946) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:08:19] (step=0000947) Train Loss mse: 0.0058, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:08:25] (step=0000948) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
-[2026-01-30 00:08:30] (step=0000949) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:08:35] (step=0000950) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
-[2026-01-30 00:08:41] (step=0000951) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:08:46] (step=0000952) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:08:52] (step=0000953) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:08:58] (step=0000954) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
-[2026-01-30 00:09:04] (step=0000955) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
-[2026-01-30 00:09:09] (step=0000956) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:09:13] (step=0000957) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:09:20] (step=0000958) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:09:26] (step=0000959) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:09:31] (step=0000960) Train Loss mse: 0.0068, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
-[2026-01-30 00:09:36] (step=0000961) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:09:42] (step=0000962) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:09:48] (step=0000963) Train Loss mse: 0.0068, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:09:53] (step=0000964) Train Loss mse: 0.0070, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:09:58] (step=0000965) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
-[2026-01-30 00:10:04] (step=0000966) Train Loss mse: 0.0059, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:10:09] (step=0000967) Train Loss mse: 0.0079, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
-[2026-01-30 00:10:14] (step=0000968) Train Loss mse: 0.0074, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:10:19] (step=0000969) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:10:25] (step=0000970) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:10:30] (step=0000971) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:10:35] (step=0000972) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
-[2026-01-30 00:10:41] (step=0000973) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:10:48] (step=0000974) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
-[2026-01-30 00:10:54] (step=0000975) Train Loss mse: 0.0084, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:11:00] (step=0000976) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:11:05] (step=0000977) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:11:09] (step=0000978) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:11:14] (step=0000979) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:11:20] (step=0000980) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:11:26] (step=0000981) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
-[2026-01-30 00:11:32] (step=0000982) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
-[2026-01-30 00:11:38] (step=0000983) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:11:43] (step=0000984) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:11:47] (step=0000985) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:11:52] (step=0000986) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:11:58] (step=0000987) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:12:04] (step=0000988) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
-[2026-01-30 00:12:09] (step=0000989) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:12:14] (step=0000990) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
-[2026-01-30 00:12:18] (step=0000991) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:12:25] (step=0000992) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:12:31] (step=0000993) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:12:36] (step=0000994) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:12:41] (step=0000995) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
-[2026-01-30 00:12:46] (step=0000996) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
-[2026-01-30 00:12:52] (step=0000997) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:12:57] (step=0000998) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:13:02] (step=0000999) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
-[2026-01-30 00:13:33] (step=0001000) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.03,
-[2026-01-30 00:13:37] (step=0001001) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:13:43] (step=0001002) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:13:50] (step=0001003) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:13:56] (step=0001004) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:14:02] (step=0001005) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:14:08] (step=0001006) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:14:13] (step=0001007) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
-[2026-01-30 00:14:20] (step=0001008) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:14:24] (step=0001009) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:14:29] (step=0001010) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:14:33] (step=0001011) Train Loss mse: 0.0060, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:14:37] (step=0001012) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.25,
-[2026-01-30 00:14:44] (step=0001013) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:14:50] (step=0001014) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
-[2026-01-30 00:14:55] (step=0001015) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:15:01] (step=0001016) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:15:07] (step=0001017) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:15:12] (step=0001018) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:15:18] (step=0001019) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:15:23] (step=0001020) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:15:29] (step=0001021) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
-[2026-01-30 00:15:35] (step=0001022) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
-[2026-01-30 00:15:42] (step=0001023) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
-[2026-01-30 00:15:48] (step=0001024) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:15:53] (step=0001025) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:15:58] (step=0001026) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
-[2026-01-30 00:16:03] (step=0001027) Train Loss mse: 0.0079, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:16:09] (step=0001028) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
-[2026-01-30 00:16:16] (step=0001029) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:16:21] (step=0001030) Train Loss mse: 0.0066, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
-[2026-01-30 00:16:26] (step=0001031) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:16:32] (step=0001032) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
-[2026-01-30 00:16:36] (step=0001033) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
-[2026-01-30 00:16:43] (step=0001034) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:16:48] (step=0001035) Train Loss mse: 0.0061, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
-[2026-01-30 00:16:52] (step=0001036) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.23,
-[2026-01-30 00:16:57] (step=0001037) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
-[2026-01-30 00:17:03] (step=0001038) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
-[2026-01-30 00:17:09] (step=0001039) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
 FullyShardedDataParallel(
   (_fsdp_wrapped_module): Bagel(
     (language_model): Qwen2ForCausalLM(
@@ -1234,13 +1114,6 @@ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference
|
|
| 1234 |
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1235 |
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1236 |
ce_avg: 0.0, mse_avg: 0.0055910381488502026
|
| 1237 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step1000
|
| 1238 |
-
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
|
| 1239 |
-
[eval debug] first 3 batch fingerprints:
|
| 1240 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1241 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1242 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1243 |
-
ce_avg: 0.0, mse_avg: 0.005657645873725414
|
| 1244 |
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step1500
|
| 1245 |
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
|
| 1246 |
[eval debug] first 3 batch fingerprints:
|
|
@@ -1248,13 +1121,126 @@ Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference
|
|
| 1248 |
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1249 |
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1250 |
ce_avg: 0.0, mse_avg: 0.0055648270063102245
|
| 1251 |
-
|
| 1252 |
-
|
| 1253 |
-
[
|
| 1254 |
-
|
| 1255 |
-
|
| 1256 |
-
|
| 1257 |
-
|
| 1258 |
[2026-01-30 00:17:14] (step=0001040) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1259 |
[2026-01-30 00:17:19] (step=0001041) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1260 |
[2026-01-30 00:17:24] (step=0001042) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
|
@@ -2518,6 +2504,27 @@ ce_avg: 0.0, mse_avg: 0.0056802802719175816
|
|
| 2518 |
[2026-01-30 02:13:09] (step=0002300) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 2519 |
[2026-01-30 02:13:14] (step=0002301) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 2520 |
[2026-01-30 02:13:20] (step=0002302) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 2521 |
[2026-01-30 02:13:25] (step=0002303) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 2522 |
[2026-01-30 02:13:30] (step=0002304) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 2523 |
[2026-01-30 02:13:36] (step=0002305) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
|
|
@@ -2621,20 +2628,6 @@ ce_avg: 0.0, mse_avg: 0.0056802802719175816
|
|
| 2621 |
[2026-01-30 02:22:40] (step=0002403) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 2622 |
[2026-01-30 02:22:45] (step=0002404) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 2623 |
[2026-01-30 02:22:50] (step=0002405) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 2624 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step2500
|
| 2625 |
-
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
|
| 2626 |
-
[eval debug] first 3 batch fingerprints:
|
| 2627 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2628 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2629 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2630 |
-
ce_avg: 0.0, mse_avg: 0.005605650134384632
|
| 2631 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step3000
|
| 2632 |
-
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
|
| 2633 |
-
[eval debug] first 3 batch fingerprints:
|
| 2634 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2635 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2636 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2637 |
-
ce_avg: 0.0, mse_avg: 0.005620267707854509
|
| 2638 |
[2026-01-30 02:22:56] (step=0002406) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 2639 |
[2026-01-30 02:23:01] (step=0002407) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 2640 |
[2026-01-30 02:23:06] (step=0002408) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
|
@@ -3549,6 +3542,27 @@ ce_avg: 0.0, mse_avg: 0.005620267707854509
|
|
| 3549 |
[2026-01-30 03:46:48] (step=0003317) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 3550 |
[2026-01-30 03:46:53] (step=0003318) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 3551 |
[2026-01-30 03:46:59] (step=0003319) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 3552 |
[2026-01-30 03:47:05] (step=0003320) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 3553 |
[2026-01-30 03:47:11] (step=0003321) Train Loss mse: 0.0060, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 3554 |
[2026-01-30 03:47:16] (step=0003322) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
|
@@ -3595,27 +3609,53 @@ ce_avg: 0.0, mse_avg: 0.005620267707854509
|
|
| 3595 |
[2026-01-30 03:51:02] (step=0003363) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 3596 |
[2026-01-30 03:51:07] (step=0003364) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 3597 |
[2026-01-30 03:51:13] (step=0003365) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 3598 |
-
|
| 3599 |
-
|
| 3600 |
-
[
|
| 3601 |
-
|
| 3602 |
-
|
| 3603 |
-
|
| 3604 |
-
|
| 3605 |
-
|
| 3606 |
-
|
| 3607 |
-
[
|
| 3608 |
-
|
| 3609 |
-
|
| 3610 |
-
|
| 3611 |
-
|
| 3612 |
-
|
| 3613 |
-
|
| 3614 |
-
[
|
| 3615 |
-
|
| 3616 |
-
|
| 3617 |
-
|
| 3618 |
-
|
| 3619 |
[2026-01-30 03:55:32] (step=0003413) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 3620 |
[2026-01-30 03:55:37] (step=0003414) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 3621 |
[2026-01-30 03:55:42] (step=0003415) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
|
@@ -4986,13 +5026,6 @@ ce_avg: 0.0, mse_avg: 0.0062998272478580475
|
|
| 4986 |
[2026-01-30 06:00:41] (step=0004780) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 4987 |
[2026-01-30 06:00:47] (step=0004781) Train Loss mse: 0.0059, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 4988 |
[2026-01-30 06:00:53] (step=0004782) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 4989 |
-
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step5000
|
| 4990 |
-
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
|
| 4991 |
-
[eval debug] first 3 batch fingerprints:
|
| 4992 |
-
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 4993 |
-
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 4994 |
-
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 4995 |
-
ce_avg: 0.0, mse_avg: 0.005609693005681038
|
| 4996 |
[2026-01-30 06:01:00] (step=0004783) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 4997 |
[2026-01-30 06:01:04] (step=0004784) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 4998 |
[2026-01-30 06:01:09] (step=0004785) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
|
@@ -5019,6 +5052,13 @@ ce_avg: 0.0, mse_avg: 0.005609693005681038
|
|
| 5019 |
[2026-01-30 06:03:00] (step=0004806) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 5020 |
[2026-01-30 06:03:05] (step=0004807) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 5021 |
[2026-01-30 06:03:10] (step=0004808) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 5022 |
[2026-01-30 06:03:15] (step=0004809) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 5023 |
[2026-01-30 06:03:21] (step=0004810) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
|
| 5024 |
[2026-01-30 06:03:27] (step=0004811) Train Loss mse: 0.0036, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 928 |
[2026-01-30 00:05:37] (step=0000917) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 929 |
[2026-01-30 00:05:41] (step=0000918) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 930 |
[2026-01-30 00:05:46] (step=0000919) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 931 |
FullyShardedDataParallel(
|
| 932 |
(_fsdp_wrapped_module): Bagel(
|
| 933 |
(language_model): Qwen2ForCausalLM(
|
| 1114 |
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1115 |
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1116 |
ce_avg: 0.0, mse_avg: 0.0055910381488502026
|
| 1117 |
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step1500
|
| 1118 |
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
|
| 1119 |
[eval debug] first 3 batch fingerprints:
|
| 1121 |
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1122 |
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 1123 |
ce_avg: 0.0, mse_avg: 0.0055648270063102245
|
| 1124 |
+
[2026-01-30 00:05:52] (step=0000920) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1125 |
+
[2026-01-30 00:05:59] (step=0000921) Train Loss mse: 0.0069, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1126 |
+
[2026-01-30 00:06:05] (step=0000922) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1127 |
+
[2026-01-30 00:06:10] (step=0000923) Train Loss mse: 0.0113, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 1128 |
+
[2026-01-30 00:06:15] (step=0000924) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1129 |
+
[2026-01-30 00:06:21] (step=0000925) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1130 |
+
[2026-01-30 00:06:27] (step=0000926) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1131 |
+
[2026-01-30 00:06:32] (step=0000927) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 1132 |
+
[2026-01-30 00:06:37] (step=0000928) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1133 |
+
[2026-01-30 00:06:42] (step=0000929) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1134 |
+
[2026-01-30 00:06:46] (step=0000930) Train Loss mse: 0.0122, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1135 |
+
[2026-01-30 00:06:51] (step=0000931) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1136 |
+
[2026-01-30 00:06:57] (step=0000932) Train Loss mse: 0.0069, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1137 |
+
[2026-01-30 00:07:02] (step=0000933) Train Loss mse: 0.0061, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1138 |
+
[2026-01-30 00:07:07] (step=0000934) Train Loss mse: 0.0071, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1139 |
+
[2026-01-30 00:07:13] (step=0000935) Train Loss mse: 0.0064, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1140 |
+
[2026-01-30 00:07:18] (step=0000936) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 1141 |
+
[2026-01-30 00:07:24] (step=0000937) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1142 |
+
[2026-01-30 00:07:31] (step=0000938) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1143 |
+
[2026-01-30 00:07:35] (step=0000939) Train Loss mse: 0.0066, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1144 |
+
[2026-01-30 00:07:40] (step=0000940) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1145 |
+
[2026-01-30 00:07:45] (step=0000941) Train Loss mse: 0.0077, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1146 |
+
[2026-01-30 00:07:51] (step=0000942) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1147 |
+
[2026-01-30 00:07:56] (step=0000943) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1148 |
+
[2026-01-30 00:08:02] (step=0000944) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
|
| 1149 |
+
[2026-01-30 00:08:08] (step=0000945) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1150 |
+
[2026-01-30 00:08:13] (step=0000946) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1151 |
+
[2026-01-30 00:08:19] (step=0000947) Train Loss mse: 0.0058, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1152 |
+
[2026-01-30 00:08:25] (step=0000948) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
|
| 1153 |
+
[2026-01-30 00:08:30] (step=0000949) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1154 |
+
[2026-01-30 00:08:35] (step=0000950) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 1155 |
+
[2026-01-30 00:08:41] (step=0000951) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1156 |
+
[2026-01-30 00:08:46] (step=0000952) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1157 |
+
[2026-01-30 00:08:52] (step=0000953) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1158 |
+
[2026-01-30 00:08:58] (step=0000954) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 1159 |
+
[2026-01-30 00:09:04] (step=0000955) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
|
| 1160 |
+
[2026-01-30 00:09:09] (step=0000956) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1161 |
+
[2026-01-30 00:09:13] (step=0000957) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1162 |
+
[2026-01-30 00:09:20] (step=0000958) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1163 |
+
[2026-01-30 00:09:26] (step=0000959) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1164 |
+
[2026-01-30 00:09:31] (step=0000960) Train Loss mse: 0.0068, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 1165 |
+
[2026-01-30 00:09:36] (step=0000961) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1166 |
+
[2026-01-30 00:09:42] (step=0000962) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1167 |
+
[2026-01-30 00:09:48] (step=0000963) Train Loss mse: 0.0068, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1168 |
+
[2026-01-30 00:09:53] (step=0000964) Train Loss mse: 0.0070, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1169 |
+
[2026-01-30 00:09:58] (step=0000965) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 1170 |
+
[2026-01-30 00:10:04] (step=0000966) Train Loss mse: 0.0059, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1171 |
+
[2026-01-30 00:10:09] (step=0000967) Train Loss mse: 0.0079, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 1172 |
+
[2026-01-30 00:10:14] (step=0000968) Train Loss mse: 0.0074, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1173 |
+
[2026-01-30 00:10:19] (step=0000969) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1174 |
+
[2026-01-30 00:10:25] (step=0000970) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1175 |
+
[2026-01-30 00:10:30] (step=0000971) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1176 |
+
[2026-01-30 00:10:35] (step=0000972) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 1177 |
+
[2026-01-30 00:10:41] (step=0000973) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1178 |
+
[2026-01-30 00:10:48] (step=0000974) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
|
| 1179 |
+
[2026-01-30 00:10:54] (step=0000975) Train Loss mse: 0.0084, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1180 |
+
[2026-01-30 00:11:00] (step=0000976) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1181 |
+
[2026-01-30 00:11:05] (step=0000977) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1182 |
+
[2026-01-30 00:11:09] (step=0000978) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1183 |
+
[2026-01-30 00:11:14] (step=0000979) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1184 |
+
[2026-01-30 00:11:20] (step=0000980) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1185 |
+
[2026-01-30 00:11:26] (step=0000981) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 1186 |
+
[2026-01-30 00:11:32] (step=0000982) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 1187 |
+
[2026-01-30 00:11:38] (step=0000983) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1188 |
+
[2026-01-30 00:11:43] (step=0000984) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1189 |
+
[2026-01-30 00:11:47] (step=0000985) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1190 |
+
[2026-01-30 00:11:52] (step=0000986) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1191 |
+
[2026-01-30 00:11:58] (step=0000987) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1192 |
+
[2026-01-30 00:12:04] (step=0000988) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 1193 |
+
[2026-01-30 00:12:09] (step=0000989) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1194 |
+
[2026-01-30 00:12:14] (step=0000990) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 1195 |
+
[2026-01-30 00:12:18] (step=0000991) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1196 |
+
[2026-01-30 00:12:25] (step=0000992) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1197 |
+
[2026-01-30 00:12:31] (step=0000993) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1198 |
+
[2026-01-30 00:12:36] (step=0000994) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1199 |
+
[2026-01-30 00:12:41] (step=0000995) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 1200 |
+
[2026-01-30 00:12:46] (step=0000996) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 1201 |
+
[2026-01-30 00:12:52] (step=0000997) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1202 |
+
[2026-01-30 00:12:57] (step=0000998) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1203 |
+
[2026-01-30 00:13:02] (step=0000999) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 1204 |
+
[2026-01-30 00:13:33] (step=0001000) Train Loss mse: 0.0065, Train Loss ce: 0.0000, Train Steps/Sec: 0.03,
|
| 1205 |
+
[2026-01-30 00:13:37] (step=0001001) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1206 |
+
[2026-01-30 00:13:43] (step=0001002) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1207 |
+
[2026-01-30 00:13:50] (step=0001003) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1208 |
+
[2026-01-30 00:13:56] (step=0001004) Train Loss mse: 0.0052, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1209 |
+
[2026-01-30 00:14:02] (step=0001005) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1210 |
+
[2026-01-30 00:14:08] (step=0001006) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1211 |
+
[2026-01-30 00:14:13] (step=0001007) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 1212 |
+
[2026-01-30 00:14:20] (step=0001008) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1213 |
+
[2026-01-30 00:14:24] (step=0001009) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1214 |
+
[2026-01-30 00:14:29] (step=0001010) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1215 |
+
[2026-01-30 00:14:33] (step=0001011) Train Loss mse: 0.0060, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1216 |
+
[2026-01-30 00:14:37] (step=0001012) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.25,
|
| 1217 |
+
[2026-01-30 00:14:44] (step=0001013) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1218 |
+
[2026-01-30 00:14:50] (step=0001014) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
|
| 1219 |
+
[2026-01-30 00:14:55] (step=0001015) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1220 |
+
[2026-01-30 00:15:01] (step=0001016) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1221 |
+
[2026-01-30 00:15:07] (step=0001017) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1222 |
+
[2026-01-30 00:15:12] (step=0001018) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1223 |
+
[2026-01-30 00:15:18] (step=0001019) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1224 |
+
[2026-01-30 00:15:23] (step=0001020) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1225 |
+
[2026-01-30 00:15:29] (step=0001021) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
|
| 1226 |
+
[2026-01-30 00:15:35] (step=0001022) Train Loss mse: 0.0056, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 1227 |
+
[2026-01-30 00:15:42] (step=0001023) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
|
| 1228 |
+
[2026-01-30 00:15:48] (step=0001024) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1229 |
+
[2026-01-30 00:15:53] (step=0001025) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1230 |
+
[2026-01-30 00:15:58] (step=0001026) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
|
| 1231 |
+
[2026-01-30 00:16:03] (step=0001027) Train Loss mse: 0.0079, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1232 |
+
[2026-01-30 00:16:09] (step=0001028) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
|
| 1233 |
+
[2026-01-30 00:16:16] (step=0001029) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1234 |
+
[2026-01-30 00:16:21] (step=0001030) Train Loss mse: 0.0066, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
|
| 1235 |
+
[2026-01-30 00:16:26] (step=0001031) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1236 |
+
[2026-01-30 00:16:32] (step=0001032) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 1237 |
+
[2026-01-30 00:16:36] (step=0001033) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1238 |
+
[2026-01-30 00:16:43] (step=0001034) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1239 |
+
[2026-01-30 00:16:48] (step=0001035) Train Loss mse: 0.0061, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 1240 |
+
[2026-01-30 00:16:52] (step=0001036) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.23,
|
| 1241 |
+
[2026-01-30 00:16:57] (step=0001037) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 1242 |
+
[2026-01-30 00:17:03] (step=0001038) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1243 |
+
[2026-01-30 00:17:09] (step=0001039) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 1244 |
[2026-01-30 00:17:14] (step=0001040) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1245 |
[2026-01-30 00:17:19] (step=0001041) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 1246 |
[2026-01-30 00:17:24] (step=0001042) Train Loss mse: 0.0063, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
|
| 2504 |
[2026-01-30 02:13:09] (step=0002300) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
|
| 2505 |
[2026-01-30 02:13:14] (step=0002301) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
|
| 2506 |
[2026-01-30 02:13:20] (step=0002302) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 2507 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step2000
|
| 2508 |
+
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
|
| 2509 |
+
[eval debug] first 3 batch fingerprints:
|
| 2510 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2511 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2512 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2513 |
+
ce_avg: 0.0, mse_avg: 0.0056802802719175816
|
| 2514 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step2500
|
| 2515 |
+
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
|
| 2516 |
+
[eval debug] first 3 batch fingerprints:
|
| 2517 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2518 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2519 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2520 |
+
ce_avg: 0.0, mse_avg: 0.005605650134384632
|
| 2521 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step3000
|
| 2522 |
+
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
|
| 2523 |
+
[eval debug] first 3 batch fingerprints:
|
| 2524 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2525 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2526 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
|
| 2527 |
+
ce_avg: 0.0, mse_avg: 0.005620267707854509
|
| 2528 |
[[34m2026-01-30 02:13:25[39m] (step=0002303) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
|
| 2529 |
[[34m2026-01-30 02:13:30[39m] (step=0002304) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
|
| 2530 |
[[34m2026-01-30 02:13:36[39m] (step=0002305) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
...
[2026-01-30 02:22:40] (step=0002403) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 02:22:45] (step=0002404) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 02:22:50] (step=0002405) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 02:22:56] (step=0002406) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 02:23:01] (step=0002407) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
[2026-01-30 02:23:06] (step=0002408) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
...
[2026-01-30 03:46:48] (step=0003317) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:46:53] (step=0003318) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:46:59] (step=0003319) Train Loss mse: 0.0049, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step3500
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
ce_avg: 0.0, mse_avg: 0.005567264277487993
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step4000
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
ce_avg: 0.0, mse_avg: 0.005860968492925167
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step4500
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
ce_avg: 0.0, mse_avg: 0.0062998272478580475
[2026-01-30 03:47:05] (step=0003320) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:47:11] (step=0003321) Train Loss mse: 0.0060, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:47:16] (step=0003322) Train Loss mse: 0.0057, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
...
[2026-01-30 03:51:02] (step=0003363) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
[2026-01-30 03:51:07] (step=0003364) Train Loss mse: 0.0062, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:51:13] (step=0003365) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
[2026-01-30 03:51:17] (step=0003366) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
[2026-01-30 03:51:22] (step=0003367) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
[2026-01-30 03:51:28] (step=0003368) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:51:35] (step=0003369) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
[2026-01-30 03:51:40] (step=0003370) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:51:45] (step=0003371) Train Loss mse: 0.0038, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:51:50] (step=0003372) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:51:55] (step=0003373) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
[2026-01-30 03:52:00] (step=0003374) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:52:05] (step=0003375) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:52:10] (step=0003376) Train Loss mse: 0.0033, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
[2026-01-30 03:52:15] (step=0003377) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
[2026-01-30 03:52:20] (step=0003378) Train Loss mse: 0.0035, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:52:26] (step=0003379) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
[2026-01-30 03:52:32] (step=0003380) Train Loss mse: 0.0055, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:52:38] (step=0003381) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
[2026-01-30 03:52:43] (step=0003382) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
[2026-01-30 03:52:48] (step=0003383) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:52:54] (step=0003384) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:52:59] (step=0003385) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:53:04] (step=0003386) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
[2026-01-30 03:53:11] (step=0003387) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:53:17] (step=0003388) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:53:23] (step=0003389) Train Loss mse: 0.0045, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:53:29] (step=0003390) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:53:34] (step=0003391) Train Loss mse: 0.0038, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:53:40] (step=0003392) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:53:45] (step=0003393) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:53:50] (step=0003394) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:53:55] (step=0003395) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
[2026-01-30 03:54:00] (step=0003396) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:54:05] (step=0003397) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:54:12] (step=0003398) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.15,
[2026-01-30 03:54:18] (step=0003399) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:54:23] (step=0003400) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:54:28] (step=0003401) Train Loss mse: 0.0067, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:54:32] (step=0003402) Train Loss mse: 0.0051, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:54:37] (step=0003403) Train Loss mse: 0.0043, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:54:43] (step=0003404) Train Loss mse: 0.0034, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,
[2026-01-30 03:54:48] (step=0003405) Train Loss mse: 0.0046, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:54:54] (step=0003406) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:54:59] (step=0003407) Train Loss mse: 0.0044, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
[2026-01-30 03:55:04] (step=0003408) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.19,
[2026-01-30 03:55:10] (step=0003409) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:55:16] (step=0003410) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:55:23] (step=0003411) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 03:55:28] (step=0003412) Train Loss mse: 0.0054, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 03:55:32] (step=0003413) Train Loss mse: 0.0039, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
[2026-01-30 03:55:37] (step=0003414) Train Loss mse: 0.0050, Train Loss ce: 0.0000, Train Steps/Sec: 0.22,
[2026-01-30 03:55:42] (step=0003415) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
...
[2026-01-30 06:00:41] (step=0004780) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 06:00:47] (step=0004781) Train Loss mse: 0.0059, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 06:00:53] (step=0004782) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 06:01:00] (step=0004783) Train Loss mse: 0.0048, Train Loss ce: 0.0000, Train Steps/Sec: 0.16,
[2026-01-30 06:01:04] (step=0004784) Train Loss mse: 0.0040, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
[2026-01-30 06:01:09] (step=0004785) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.21,
...
[2026-01-30 06:03:00] (step=0004806) Train Loss mse: 0.0042, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 06:03:05] (step=0004807) Train Loss mse: 0.0041, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 06:03:10] (step=0004808) Train Loss mse: 0.0053, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
base_dir is /dev/shm/models/checkpoints_vlm_gym_reference_dot_one_image_lr2e_5_mse_only_ins/eval_used_rows, step_tag is vlm_gym_reference_dot_one_img_lr2e_5_mse_only_ins_step5000
Preparing Dataset vlm_gym_reference_dot_mse_loss_only_evalonce/vlm_gym_reference_dot_val
[eval debug] first 3 batch fingerprints:
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_reference_dot_mse_loss_only_evalonce'}]
ce_avg: 0.0, mse_avg: 0.005609693005681038
[2026-01-30 06:03:15] (step=0004809) Train Loss mse: 0.0037, Train Loss ce: 0.0000, Train Steps/Sec: 0.20,
[2026-01-30 06:03:21] (step=0004810) Train Loss mse: 0.0047, Train Loss ce: 0.0000, Train Steps/Sec: 0.18,
[2026-01-30 06:03:27] (step=0004811) Train Loss mse: 0.0036, Train Loss ce: 0.0000, Train Steps/Sec: 0.17,