Upload checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins

Files changed (1)

checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/wandb/offline-run-20260127_054730-checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins-run0/files/output.log +57 -57

checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/wandb/offline-run-20260127_054730-checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins-run0/files/output.log

@@ -1240,6 +1240,27 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
 [[34m2026-01-27 06:18:24[39m] (step=0001052) Train Loss mse: 0.0000, Train Loss ce: 0.2349, Train Steps/Sec: 0.62,
 [[34m2026-01-27 06:18:25[39m] (step=0001053) Train Loss mse: 0.0000, Train Loss ce: 0.2006, Train Steps/Sec: 0.75,
 [[34m2026-01-27 06:18:27[39m] (step=0001054) Train Loss mse: 0.0000, Train Loss ce: 0.2222, Train Steps/Sec: 0.77,
 [[34m2026-01-27 06:18:28[39m] (step=0001055) Train Loss mse: 0.0000, Train Loss ce: 0.2943, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:18:29[39m] (step=0001056) Train Loss mse: 0.0000, Train Loss ce: 0.2244, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:18:31[39m] (step=0001057) Train Loss mse: 0.0000, Train Loss ce: 0.1987, Train Steps/Sec: 0.67,
@@ -1318,20 +1339,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
 [[34m2026-01-27 06:20:13[39m] (step=0001130) Train Loss mse: 0.0000, Train Loss ce: 0.2712, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:20:14[39m] (step=0001131) Train Loss mse: 0.0000, Train Loss ce: 0.2670, Train Steps/Sec: 0.74,
 [[34m2026-01-27 06:20:16[39m] (step=0001132) Train Loss mse: 0.0000, Train Loss ce: 0.2389, Train Steps/Sec: 0.63,
-base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step1500
-Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
-[eval debug] first 3 batch fingerprints:
-  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-ce_avg: 0.482469767332077, mse_avg: 0.0
-base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step2000
-Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
-[eval debug] first 3 batch fingerprints:
-  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-ce_avg: 0.5253902673721313, mse_avg: 0.0
 [[34m2026-01-27 06:20:17[39m] (step=0001133) Train Loss mse: 0.0000, Train Loss ce: 0.2480, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:20:18[39m] (step=0001134) Train Loss mse: 0.0000, Train Loss ce: 0.2425, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:20:20[39m] (step=0001135) Train Loss mse: 0.0000, Train Loss ce: 0.2217, Train Steps/Sec: 0.77,
@@ -2754,6 +2761,20 @@ ce_avg: 0.5253902673721313, mse_avg: 0.0
 [[34m2026-01-27 06:53:27[39m] (step=0002552) Train Loss mse: 0.0000, Train Loss ce: 0.2395, Train Steps/Sec: 0.65,
 [[34m2026-01-27 06:53:29[39m] (step=0002553) Train Loss mse: 0.0000, Train Loss ce: 0.2514, Train Steps/Sec: 0.77,
 [[34m2026-01-27 06:53:30[39m] (step=0002554) Train Loss mse: 0.0000, Train Loss ce: 0.2746, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:53:31[39m] (step=0002555) Train Loss mse: 0.0000, Train Loss ce: 0.2420, Train Steps/Sec: 0.77,
 [[34m2026-01-27 06:53:33[39m] (step=0002556) Train Loss mse: 0.0000, Train Loss ce: 0.2485, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:53:34[39m] (step=0002557) Train Loss mse: 0.0000, Train Loss ce: 0.2549, Train Steps/Sec: 0.74,
@@ -2843,27 +2864,6 @@ ce_avg: 0.5253902673721313, mse_avg: 0.0
 [[34m2026-01-27 06:55:28[39m] (step=0002641) Train Loss mse: 0.0000, Train Loss ce: 0.2655, Train Steps/Sec: 0.61,
 [[34m2026-01-27 06:55:30[39m] (step=0002642) Train Loss mse: 0.0000, Train Loss ce: 0.2336, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:55:31[39m] (step=0002643) Train Loss mse: 0.0000, Train Loss ce: 0.2325, Train Steps/Sec: 0.76,
-base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step2500
-Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
-[eval debug] first 3 batch fingerprints:
-  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-ce_avg: 0.563564658164978, mse_avg: 0.0
-base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step3000
-Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
-[eval debug] first 3 batch fingerprints:
-  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-ce_avg: 0.565230667591095, mse_avg: 0.0
-base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step3500
-Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
-[eval debug] first 3 batch fingerprints:
-  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-ce_avg: 0.5503921508789062, mse_avg: 0.0
 [[34m2026-01-27 06:55:32[39m] (step=0002644) Train Loss mse: 0.0000, Train Loss ce: 0.2420, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:55:33[39m] (step=0002645) Train Loss mse: 0.0000, Train Loss ce: 0.2868, Train Steps/Sec: 0.84,
 [[34m2026-01-27 06:55:35[39m] (step=0002646) Train Loss mse: 0.0000, Train Loss ce: 0.2387, Train Steps/Sec: 0.66,
@@ -3827,6 +3827,20 @@ ce_avg: 0.5503921508789062, mse_avg: 0.0
 [[34m2026-01-27 07:17:46[39m] (step=0003604) Train Loss mse: 0.0000, Train Loss ce: 0.2357, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:17:48[39m] (step=0003605) Train Loss mse: 0.0000, Train Loss ce: 0.2432, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:17:49[39m] (step=0003606) Train Loss mse: 0.0000, Train Loss ce: 0.2302, Train Steps/Sec: 0.61,
 [[34m2026-01-27 07:17:51[39m] (step=0003607) Train Loss mse: 0.0000, Train Loss ce: 0.2441, Train Steps/Sec: 0.90,
 [[34m2026-01-27 07:17:52[39m] (step=0003608) Train Loss mse: 0.0000, Train Loss ce: 0.2167, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:17:53[39m] (step=0003609) Train Loss mse: 0.0000, Train Loss ce: 0.2331, Train Steps/Sec: 0.65,
@@ -3941,27 +3955,6 @@ ce_avg: 0.5503921508789062, mse_avg: 0.0
 [[34m2026-01-27 07:20:23[39m] (step=0003718) Train Loss mse: 0.0000, Train Loss ce: 0.2512, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:20:24[39m] (step=0003719) Train Loss mse: 0.0000, Train Loss ce: 0.2666, Train Steps/Sec: 0.66,
 [[34m2026-01-27 07:20:26[39m] (step=0003720) Train Loss mse: 0.0000, Train Loss ce: 0.2608, Train Steps/Sec: 0.77,
-base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step4000
-Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
-[eval debug] first 3 batch fingerprints:
-  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-ce_avg: 0.5391465425491333, mse_avg: 0.0
-base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step4500
-Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
-[eval debug] first 3 batch fingerprints:
-  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-ce_avg: 0.5354103446006775, mse_avg: 0.0
-base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step5000
-Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
-[eval debug] first 3 batch fingerprints:
-  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
-ce_avg: 0.5331812500953674, mse_avg: 0.0
 [[34m2026-01-27 07:20:27[39m] (step=0003721) Train Loss mse: 0.0000, Train Loss ce: 0.2387, Train Steps/Sec: 0.62,
 [[34m2026-01-27 07:20:28[39m] (step=0003722) Train Loss mse: 0.0000, Train Loss ce: 0.2330, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:20:30[39m] (step=0003723) Train Loss mse: 0.0000, Train Loss ce: 0.1913, Train Steps/Sec: 0.80,
@@ -5245,4 +5238,11 @@ ce_avg: 0.5331812500953674, mse_avg: 0.0
 [[34m2026-01-27 07:50:40[39m] Saving checkpoint to /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/0005000.
 /opt/conda/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
   warnings.warn(
-[[34m2026-01-27 07:53:17[39m] Done!

 [[34m2026-01-27 06:18:24[39m] (step=0001052) Train Loss mse: 0.0000, Train Loss ce: 0.2349, Train Steps/Sec: 0.62,
 [[34m2026-01-27 06:18:25[39m] (step=0001053) Train Loss mse: 0.0000, Train Loss ce: 0.2006, Train Steps/Sec: 0.75,
 [[34m2026-01-27 06:18:27[39m] (step=0001054) Train Loss mse: 0.0000, Train Loss ce: 0.2222, Train Steps/Sec: 0.77,
+base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step1500
+Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
+[eval debug] first 3 batch fingerprints:
+  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+ce_avg: 0.482469767332077, mse_avg: 0.0
+base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step2000
+Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
+[eval debug] first 3 batch fingerprints:
+  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+ce_avg: 0.5253902673721313, mse_avg: 0.0
+base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step2500
+Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
+[eval debug] first 3 batch fingerprints:
+  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+ce_avg: 0.563564658164978, mse_avg: 0.0
 [[34m2026-01-27 06:18:28[39m] (step=0001055) Train Loss mse: 0.0000, Train Loss ce: 0.2943, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:18:29[39m] (step=0001056) Train Loss mse: 0.0000, Train Loss ce: 0.2244, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:18:31[39m] (step=0001057) Train Loss mse: 0.0000, Train Loss ce: 0.1987, Train Steps/Sec: 0.67,
 [[34m2026-01-27 06:20:13[39m] (step=0001130) Train Loss mse: 0.0000, Train Loss ce: 0.2712, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:20:14[39m] (step=0001131) Train Loss mse: 0.0000, Train Loss ce: 0.2670, Train Steps/Sec: 0.74,
 [[34m2026-01-27 06:20:16[39m] (step=0001132) Train Loss mse: 0.0000, Train Loss ce: 0.2389, Train Steps/Sec: 0.63,
 [[34m2026-01-27 06:20:17[39m] (step=0001133) Train Loss mse: 0.0000, Train Loss ce: 0.2480, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:20:18[39m] (step=0001134) Train Loss mse: 0.0000, Train Loss ce: 0.2425, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:20:20[39m] (step=0001135) Train Loss mse: 0.0000, Train Loss ce: 0.2217, Train Steps/Sec: 0.77,
 [[34m2026-01-27 06:53:27[39m] (step=0002552) Train Loss mse: 0.0000, Train Loss ce: 0.2395, Train Steps/Sec: 0.65,
 [[34m2026-01-27 06:53:29[39m] (step=0002553) Train Loss mse: 0.0000, Train Loss ce: 0.2514, Train Steps/Sec: 0.77,
 [[34m2026-01-27 06:53:30[39m] (step=0002554) Train Loss mse: 0.0000, Train Loss ce: 0.2746, Train Steps/Sec: 0.76,
+base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step3000
+Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
+[eval debug] first 3 batch fingerprints:
+  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+ce_avg: 0.565230667591095, mse_avg: 0.0
+base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step3500
+Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
+[eval debug] first 3 batch fingerprints:
+  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+ce_avg: 0.5503921508789062, mse_avg: 0.0
 [[34m2026-01-27 06:53:31[39m] (step=0002555) Train Loss mse: 0.0000, Train Loss ce: 0.2420, Train Steps/Sec: 0.77,
 [[34m2026-01-27 06:53:33[39m] (step=0002556) Train Loss mse: 0.0000, Train Loss ce: 0.2485, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:53:34[39m] (step=0002557) Train Loss mse: 0.0000, Train Loss ce: 0.2549, Train Steps/Sec: 0.74,
 [[34m2026-01-27 06:55:28[39m] (step=0002641) Train Loss mse: 0.0000, Train Loss ce: 0.2655, Train Steps/Sec: 0.61,
 [[34m2026-01-27 06:55:30[39m] (step=0002642) Train Loss mse: 0.0000, Train Loss ce: 0.2336, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:55:31[39m] (step=0002643) Train Loss mse: 0.0000, Train Loss ce: 0.2325, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:55:32[39m] (step=0002644) Train Loss mse: 0.0000, Train Loss ce: 0.2420, Train Steps/Sec: 0.76,
 [[34m2026-01-27 06:55:33[39m] (step=0002645) Train Loss mse: 0.0000, Train Loss ce: 0.2868, Train Steps/Sec: 0.84,
 [[34m2026-01-27 06:55:35[39m] (step=0002646) Train Loss mse: 0.0000, Train Loss ce: 0.2387, Train Steps/Sec: 0.66,
 [[34m2026-01-27 07:17:46[39m] (step=0003604) Train Loss mse: 0.0000, Train Loss ce: 0.2357, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:17:48[39m] (step=0003605) Train Loss mse: 0.0000, Train Loss ce: 0.2432, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:17:49[39m] (step=0003606) Train Loss mse: 0.0000, Train Loss ce: 0.2302, Train Steps/Sec: 0.61,
+base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step4000
+Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
+[eval debug] first 3 batch fingerprints:
+  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+ce_avg: 0.5391465425491333, mse_avg: 0.0
+base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step4500
+Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
+[eval debug] first 3 batch fingerprints:
+  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+ce_avg: 0.5354103446006775, mse_avg: 0.0
 [[34m2026-01-27 07:17:51[39m] (step=0003607) Train Loss mse: 0.0000, Train Loss ce: 0.2441, Train Steps/Sec: 0.90,
 [[34m2026-01-27 07:17:52[39m] (step=0003608) Train Loss mse: 0.0000, Train Loss ce: 0.2167, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:17:53[39m] (step=0003609) Train Loss mse: 0.0000, Train Loss ce: 0.2331, Train Steps/Sec: 0.65,
 [[34m2026-01-27 07:20:23[39m] (step=0003718) Train Loss mse: 0.0000, Train Loss ce: 0.2512, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:20:24[39m] (step=0003719) Train Loss mse: 0.0000, Train Loss ce: 0.2666, Train Steps/Sec: 0.66,
 [[34m2026-01-27 07:20:26[39m] (step=0003720) Train Loss mse: 0.0000, Train Loss ce: 0.2608, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:20:27[39m] (step=0003721) Train Loss mse: 0.0000, Train Loss ce: 0.2387, Train Steps/Sec: 0.62,
 [[34m2026-01-27 07:20:28[39m] (step=0003722) Train Loss mse: 0.0000, Train Loss ce: 0.2330, Train Steps/Sec: 0.77,
 [[34m2026-01-27 07:20:30[39m] (step=0003723) Train Loss mse: 0.0000, Train Loss ce: 0.1913, Train Steps/Sec: 0.80,
 [[34m2026-01-27 07:50:40[39m] Saving checkpoint to /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/0005000.
 /opt/conda/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
   warnings.warn(
+[[34m2026-01-27 07:53:17[39m] Done!
+base_dir is /dev/shm/models/checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins/eval_used_rows, step_tag is checkpoints_vlm_gym_mental_rotation_2d_one_image_lr2e_5_ce_no_mse_ins_step5000
+Preparing Dataset vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce/vlm_gym_mental_rotation_2d_val
+[eval debug] first 3 batch fingerprints:
+  fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+  fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_mental_rotation_2d_celoss_no_mse_evalonce'}]
+ce_avg: 0.5331812500953674, mse_avg: 0.0
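
The eval blocks in this log land at different positions between the two versions of the file, so comparing them by eye is error-prone. The following is a minimal sketch, not part of the training code, of how the per-step train loss and the per-checkpoint `ce_avg` values might be pulled out of `output.log` regardless of where the eval lines fall. The function name `parse_log` and the regular expressions are hypothetical. The sketch assumes that each `ce_avg:` line belongs to the most recent `step_tag is ..._stepN` line before it, which matches the order seen above but is not confirmed by the source.

```python
import re

# Colour codes show up in the log as "[34m" / "[39m"; the ESC byte may or may not be present.
ANSI_RESIDUE = re.compile(r"\x1b?\[\d+m")
# Matches e.g. "(step=0001052) Train Loss mse: 0.0000, Train Loss ce: 0.2349,"
TRAIN = re.compile(r"\(step=(\d+)\) Train Loss mse: ([\d.]+), Train Loss ce: ([\d.]+)")
# Matches e.g. "step_tag is <run_name>_step1500"
STEP_TAG = re.compile(r"step_tag is \S*_step(\d+)")
# Matches e.g. "ce_avg: 0.482469767332077, mse_avg: 0.0"
EVAL = re.compile(r"ce_avg: ([\d.eE+-]+), mse_avg: ([\d.eE+-]+)")


def parse_log(lines):
    """Return (train_ce, eval_ce): dicts mapping step -> cross-entropy loss.

    Assumption: a ce_avg line reports the checkpoint named by the
    closest preceding step_tag line.
    """
    train_ce, eval_ce = {}, {}
    pending_step = None  # checkpoint step waiting for its ce_avg line
    for raw in lines:
        line = ANSI_RESIDUE.sub("", raw)
        m = TRAIN.search(line)
        if m:
            train_ce[int(m.group(1))] = float(m.group(3))
            continue
        m = STEP_TAG.search(line)
        if m:
            pending_step = int(m.group(1))
            continue
        m = EVAL.search(line)
        if m and pending_step is not None:
            eval_ce[pending_step] = float(m.group(1))
            pending_step = None
    return train_ce, eval_ce


if __name__ == "__main__":
    # The path is a placeholder; point it at a local copy of the log.
    with open("output.log", encoding="utf-8", errors="replace") as fh:
        train_ce, eval_ce = parse_log(fh)
    for step in sorted(eval_ce):
        print(f"step {step}: eval ce_avg = {eval_ce[step]:.4f}")
```

Because the eval lines are keyed by the step in `step_tag` rather than by their position in the file, both versions of the log in this diff should yield the same `eval_ce` mapping under the stated assumption.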