Upload checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce
Changed files:
- checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20251229_044705-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce-run0/files/output.log (+175 -175)
- checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20251230_023052-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993-run0/files/output.log (+94 -94)
- checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20251230_024051-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log (+18 -18)
- checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20260104_093254-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log (+181 -181)
- checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20260105_090343-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log (+21 -21)
- checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20260107_093905-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log (+200 -200)
- checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20260107_184933-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log (+107 -107)
checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20251229_044705-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce-run0/files/output.log
CHANGED

@@ -1,3 +1,178 @@
+FullyShardedDataParallel(
+  (_fsdp_wrapped_module): Bagel(
+    (language_model): Qwen2ForCausalLM(
+      (model): Qwen2Model(
+        (embed_tokens): Embedding(152064, 3584)
+        (layers): ModuleList(
+          (0-27): 28 x FullyShardedDataParallel(
+            (_fsdp_wrapped_module): CheckpointWrapper(
+              (_checkpoint_wrapped_module): Qwen2MoTDecoderLayer(
+                (self_attn): PackedAttentionMoT(
+                  (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
+                  (k_proj): Linear(in_features=3584, out_features=512, bias=True)
+                  (v_proj): Linear(in_features=3584, out_features=512, bias=True)
+                  (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
+                  (q_norm): Qwen2RMSNorm((128,), eps=1e-06)
+                  (k_norm): Qwen2RMSNorm((128,), eps=1e-06)
+                  (q_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
+                  (k_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
+                  (q_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=True)
+                  (k_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
+                  (v_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
+                  (o_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=False)
+                )
+                (mlp): Qwen2MLP(
+                  (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
+                  (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
+                  (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
+                  (act_fn): SiLU()
+                )
+                (mlp_moe_gen): Qwen2MLP(
+                  (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
+                  (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
+                  (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
+                  (act_fn): SiLU()
+                )
+                (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
+                (input_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+                (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
+                (post_attention_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+              )
+            )
+          )
+        )
+        (norm): Qwen2RMSNorm((3584,), eps=1e-06)
+        (norm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+        (rotary_emb): Qwen2RotaryEmbedding()
+      )
+      (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
+    )
+    (time_embedder): FullyShardedDataParallel(
+      (_fsdp_wrapped_module): TimestepEmbedder(
+        (mlp): Sequential(
+          (0): Linear(in_features=256, out_features=3584, bias=True)
+          (1): SiLU()
+          (2): Linear(in_features=3584, out_features=3584, bias=True)
+        )
+      )
+    )
+    (vae2llm): Linear(in_features=64, out_features=3584, bias=True)
+    (llm2vae): Linear(in_features=3584, out_features=64, bias=True)
+    (latent_pos_embed): FullyShardedDataParallel(
+      (_fsdp_wrapped_module): PositionEmbedding()
+    )
+    (vit_model): SiglipVisionModel(
+      (vision_model): FullyShardedDataParallel(
+        (_fsdp_wrapped_module): SiglipVisionTransformer(
+          (embeddings): SiglipVisionEmbeddings(
+            (position_embedding): Embedding(4900, 1152)
+            (patch_embedding): Linear(in_features=588, out_features=1152, bias=True)
+          )
+          (encoder): SiglipEncoder(
+            (layers): ModuleList(
+              (0-25): 26 x FullyShardedDataParallel(
+                (_fsdp_wrapped_module): CheckpointWrapper(
+                  (_checkpoint_wrapped_module): SiglipEncoderLayer(
+                    (self_attn): SiglipFlashAttention2(
+                      (k_proj): Linear(in_features=1152, out_features=1152, bias=True)
+                      (v_proj): Linear(in_features=1152, out_features=1152, bias=True)
+                      (q_proj): Linear(in_features=1152, out_features=1152, bias=True)
+                      (out_proj): Linear(in_features=1152, out_features=1152, bias=True)
+                    )
+                    (layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
+                    (mlp): SiglipMLP(
+                      (activation_fn): PytorchGELUTanh()
+                      (fc1): Linear(in_features=1152, out_features=4304, bias=True)
+                      (fc2): Linear(in_features=4304, out_features=1152, bias=True)
+                    )
+                    (layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
+                  )
+                )
+              )
+            )
+          )
+          (post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
+        )
+      )
+    )
+    (connector): FullyShardedDataParallel(
+      (_fsdp_wrapped_module): CheckpointWrapper(
+        (_checkpoint_wrapped_module): MLPconnector(
+          (activation_fn): PytorchGELUTanh()
+          (fc1): Linear(in_features=1152, out_features=3584, bias=True)
+          (fc2): Linear(in_features=3584, out_features=3584, bias=True)
+        )
+      )
+    )
+    (vit_pos_embed): FullyShardedDataParallel(
+      (_fsdp_wrapped_module): PositionEmbedding()
+    )
+  )
+)
+_flat_param True
+language_model.model.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.26._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.27._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+time_embedder._fsdp_wrapped_module._flat_param True
+latent_pos_embed._fsdp_wrapped_module._flat_param False
+vit_model.vision_model._fsdp_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+connector._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_pos_embed._fsdp_wrapped_module._flat_param False
+Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_train
+Warning: failed loading hashes from /home/jiaxin/bagel_train/hashes_test_set_v10.json: [Errno 2] No such file or directory: '/home/jiaxin/bagel_train/hashes_test_set_v10.json'
+Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_val
+Warning: failed loading hashes from /home/jiaxin/bagel_train/hashes_test_set_v10.json: [Errno 2] No such file or directory: '/home/jiaxin/bagel_train/hashes_test_set_v10.json'
 wandb: Detected [huggingface_hub.inference] in use.
 wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
 wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/

@@ -936,181 +1111,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
 [2025-12-29 08:26:29] (step=0000925) Train Loss mse: 0.0482, Train Loss ce: 0.0633, Train Steps/Sec: 0.06,
 [2025-12-29 08:26:42] (step=0000926) Train Loss mse: 0.0486, Train Loss ce: 0.0633, Train Steps/Sec: 0.08,
 [2025-12-29 08:26:54] (step=0000927) Train Loss mse: 0.0519, Train Loss ce: 0.0643, Train Steps/Sec: 0.08,
-FullyShardedDataParallel(
-  (_fsdp_wrapped_module): Bagel(
-    (language_model): Qwen2ForCausalLM(
-      (model): Qwen2Model(
-        (embed_tokens): Embedding(152064, 3584)
-        (layers): ModuleList(
-          (0-27): 28 x FullyShardedDataParallel(
-            (_fsdp_wrapped_module): CheckpointWrapper(
-              (_checkpoint_wrapped_module): Qwen2MoTDecoderLayer(
-                (self_attn): PackedAttentionMoT(
-                  (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
-                  (k_proj): Linear(in_features=3584, out_features=512, bias=True)
-                  (v_proj): Linear(in_features=3584, out_features=512, bias=True)
-                  (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
-                  (q_norm): Qwen2RMSNorm((128,), eps=1e-06)
-                  (k_norm): Qwen2RMSNorm((128,), eps=1e-06)
-                  (q_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
-                  (k_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
-                  (q_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=True)
-                  (k_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
-                  (v_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
-                  (o_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=False)
-                )
-                (mlp): Qwen2MLP(
-                  (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                  (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                  (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
-                  (act_fn): SiLU()
-                )
-                (mlp_moe_gen): Qwen2MLP(
-                  (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                  (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                  (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
-                  (act_fn): SiLU()
-                )
-                (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
-                (input_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-                (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
-                (post_attention_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-              )
-            )
-          )
-        )
-        (norm): Qwen2RMSNorm((3584,), eps=1e-06)
-        (norm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-        (rotary_emb): Qwen2RotaryEmbedding()
-      )
-      (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
-    )
-    (time_embedder): FullyShardedDataParallel(
-      (_fsdp_wrapped_module): TimestepEmbedder(
-        (mlp): Sequential(
-          (0): Linear(in_features=256, out_features=3584, bias=True)
-          (1): SiLU()
-          (2): Linear(in_features=3584, out_features=3584, bias=True)
-        )
-      )
-    )
-    (vae2llm): Linear(in_features=64, out_features=3584, bias=True)
-    (llm2vae): Linear(in_features=3584, out_features=64, bias=True)
-    (latent_pos_embed): FullyShardedDataParallel(
-      (_fsdp_wrapped_module): PositionEmbedding()
-    )
-    (vit_model): SiglipVisionModel(
-      (vision_model): FullyShardedDataParallel(
-        (_fsdp_wrapped_module): SiglipVisionTransformer(
-          (embeddings): SiglipVisionEmbeddings(
-            (position_embedding): Embedding(4900, 1152)
-            (patch_embedding): Linear(in_features=588, out_features=1152, bias=True)
-          )
-          (encoder): SiglipEncoder(
-            (layers): ModuleList(
-              (0-25): 26 x FullyShardedDataParallel(
-                (_fsdp_wrapped_module): CheckpointWrapper(
-                  (_checkpoint_wrapped_module): SiglipEncoderLayer(
-                    (self_attn): SiglipFlashAttention2(
-                      (k_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                      (v_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                      (q_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                      (out_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                    )
-                    (layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-                    (mlp): SiglipMLP(
-                      (activation_fn): PytorchGELUTanh()
-                      (fc1): Linear(in_features=1152, out_features=4304, bias=True)
-                      (fc2): Linear(in_features=4304, out_features=1152, bias=True)
-                    )
-                    (layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-                  )
-                )
-              )
-            )
-          )
-          (post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-        )
-      )
-    )
-    (connector): FullyShardedDataParallel(
-      (_fsdp_wrapped_module): CheckpointWrapper(
-        (_checkpoint_wrapped_module): MLPconnector(
-          (activation_fn): PytorchGELUTanh()
-          (fc1): Linear(in_features=1152, out_features=3584, bias=True)
-          (fc2): Linear(in_features=3584, out_features=3584, bias=True)
-        )
-      )
-    )
-    (vit_pos_embed): FullyShardedDataParallel(
-      (_fsdp_wrapped_module): PositionEmbedding()
-    )
-  )
-)
-_flat_param True
-language_model.model.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.26._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.27._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-time_embedder._fsdp_wrapped_module._flat_param True
-latent_pos_embed._fsdp_wrapped_module._flat_param False
-vit_model.vision_model._fsdp_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-connector._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_pos_embed._fsdp_wrapped_module._flat_param False
-Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_train
-Warning: failed loading hashes from /home/jiaxin/bagel_train/hashes_test_set_v10.json: [Errno 2] No such file or directory: '/home/jiaxin/bagel_train/hashes_test_set_v10.json'
-Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_val
-Warning: failed loading hashes from /home/jiaxin/bagel_train/hashes_test_set_v10.json: [Errno 2] No such file or directory: '/home/jiaxin/bagel_train/hashes_test_set_v10.json'
 [2025-12-29 08:27:07] (step=0000928) Train Loss mse: 0.0324, Train Loss ce: 0.0643, Train Steps/Sec: 0.08,
 [2025-12-29 08:27:20] (step=0000929) Train Loss mse: 0.0449, Train Loss ce: 0.0605, Train Steps/Sec: 0.07,
 [2025-12-29 08:27:34] (step=0000930) Train Loss mse: 0.0395, Train Loss ce: 0.0646, Train Steps/Sec: 0.07,
checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20251230_023052-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993-run0/files/output.log
CHANGED
|
@@ -799,6 +799,100 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
|
|
| 799 |
[[34m2025-12-30 05:38:29[39m] (step=0000788) Train Loss mse: 0.0482, Train Loss ce: 0.0750, Train Steps/Sec: 0.09,
|
| 800 |
[[34m2025-12-30 05:38:45[39m] (step=0000789) Train Loss mse: 0.0491, Train Loss ce: 0.0644, Train Steps/Sec: 0.06,
|
| 801 |
[[34m2025-12-30 05:38:58[39m] (step=0000790) Train Loss mse: 0.0478, Train Loss ce: 0.0628, Train Steps/Sec: 0.08,
|
|
| 802 |
FullyShardedDataParallel(
|
| 803 |
(_fsdp_wrapped_module): Bagel(
|
| 804 |
(language_model): Qwen2ForCausalLM(
|
|
@@ -974,100 +1068,6 @@ Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_train
|
|
| 974 |
Warning: failed loading hashes from /home/jiaxin/bagel_train/hashes_test_set_v10.json: [Errno 2] No such file or directory: '/home/jiaxin/bagel_train/hashes_test_set_v10.json'
|
| 975 |
Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_val
|
| 976 |
Warning: failed loading hashes from /home/jiaxin/bagel_train/hashes_test_set_v10.json: [Errno 2] No such file or directory: '/home/jiaxin/bagel_train/hashes_test_set_v10.json'
|
| 977 |
-
[[34m2025-12-30 05:39:11[39m] (step=0000791) Train Loss mse: 0.0397, Train Loss ce: 0.0636, Train Steps/Sec: 0.07,
|
| 978 |
-
[[34m2025-12-30 05:39:23[39m] (step=0000792) Train Loss mse: 0.0506, Train Loss ce: 0.0581, Train Steps/Sec: 0.09,
|
| 979 |
-
[[34m2025-12-30 05:39:35[39m] (step=0000793) Train Loss mse: 0.0476, Train Loss ce: 0.0624, Train Steps/Sec: 0.08,
|
| 980 |
-
[[34m2025-12-30 05:39:48[39m] (step=0000794) Train Loss mse: 0.0308, Train Loss ce: 0.0605, Train Steps/Sec: 0.07,
|
| 981 |
-
[[34m2025-12-30 05:40:04[39m] (step=0000795) Train Loss mse: 0.0383, Train Loss ce: 0.0697, Train Steps/Sec: 0.06,
|
| 982 |
-
[[34m2025-12-30 05:40:19[39m] (step=0000796) Train Loss mse: 0.0339, Train Loss ce: 0.0695, Train Steps/Sec: 0.07,
|
| 983 |
-
[[34m2025-12-30 05:40:30[39m] (step=0000797) Train Loss mse: 0.0377, Train Loss ce: 0.0688, Train Steps/Sec: 0.09,
|
| 984 |
-
[[34m2025-12-30 05:40:46[39m] (step=0000798) Train Loss mse: 0.0481, Train Loss ce: 0.0675, Train Steps/Sec: 0.06,
|
| 985 |
-
[[34m2025-12-30 05:41:03[39m] (step=0000799) Train Loss mse: 0.0411, Train Loss ce: 0.0655, Train Steps/Sec: 0.06,
|
| 986 |
-
[[34m2025-12-30 05:41:16[39m] (step=0000800) Train Loss mse: 0.0402, Train Loss ce: 0.0631, Train Steps/Sec: 0.07,
|
| 987 |
-
[[34m2025-12-30 05:41:30[39m] (step=0000801) Train Loss mse: 0.0409, Train Loss ce: 0.0616, Train Steps/Sec: 0.07,
|
| 988 |
-
[[34m2025-12-30 05:41:42[39m] (step=0000802) Train Loss mse: 0.0484, Train Loss ce: 0.0610, Train Steps/Sec: 0.08,
|
| 989 |
-
[[34m2025-12-30 05:41:55[39m] (step=0000803) Train Loss mse: 0.0327, Train Loss ce: 0.0618, Train Steps/Sec: 0.08,
|
| 990 |
-
[[34m2025-12-30 05:42:06[39m] (step=0000804) Train Loss mse: 0.0724, Train Loss ce: 0.0654, Train Steps/Sec: 0.10,
|
| 991 |
-
[[34m2025-12-30 05:42:19[39m] (step=0000805) Train Loss mse: 0.0422, Train Loss ce: 0.0623, Train Steps/Sec: 0.07,
|
| 992 |
-
[[34m2025-12-30 05:42:31[39m] (step=0000806) Train Loss mse: 0.0480, Train Loss ce: 0.0572, Train Steps/Sec: 0.08,
|
| 993 |
-
[[34m2025-12-30 05:42:43[39m] (step=0000807) Train Loss mse: 0.0494, Train Loss ce: 0.0594, Train Steps/Sec: 0.08,
|
| 994 |
-
[[34m2025-12-30 05:42:54[39m] (step=0000808) Train Loss mse: 0.0453, Train Loss ce: 0.0614, Train Steps/Sec: 0.09,
|
| 995 |
-
[[34m2025-12-30 05:43:11[39m] (step=0000809) Train Loss mse: 0.0326, Train Loss ce: 0.0636, Train Steps/Sec: 0.06,
|
| 996 |
-
[[34m2025-12-30 05:43:23[39m] (step=0000810) Train Loss mse: 0.0434, Train Loss ce: 0.0585, Train Steps/Sec: 0.08,
|
| 997 |
-
[[34m2025-12-30 05:43:35[39m] (step=0000811) Train Loss mse: 0.0605, Train Loss ce: 0.0614, Train Steps/Sec: 0.08,
|
| 998 |
-
[[34m2025-12-30 05:43:47[39m] (step=0000812) Train Loss mse: 0.0396, Train Loss ce: 0.0593, Train Steps/Sec: 0.09,
|
| 999 |
-
[[34m2025-12-30 05:44:00[39m] (step=0000813) Train Loss mse: 0.0491, Train Loss ce: 0.0637, Train Steps/Sec: 0.07,
|
| 1000 |
-
[[34m2025-12-30 05:44:13[39m] (step=0000814) Train Loss mse: 0.0546, Train Loss ce: 0.0675, Train Steps/Sec: 0.08,
|
| 1001 |
-
[[34m2025-12-30 05:44:29[39m] (step=0000815) Train Loss mse: 0.0476, Train Loss ce: 0.0708, Train Steps/Sec: 0.06,
|
| 1002 |
-
[[34m2025-12-30 05:44:45[39m] (step=0000816) Train Loss mse: 0.0340, Train Loss ce: 0.0607, Train Steps/Sec: 0.06,
|
| 1003 |
-
[[34m2025-12-30 05:45:01[39m] (step=0000817) Train Loss mse: 0.0456, Train Loss ce: 0.0649, Train Steps/Sec: 0.06,
|
| 1004 |
-
[[34m2025-12-30 05:45:15[39m] (step=0000818) Train Loss mse: 0.0350, Train Loss ce: 0.0628, Train Steps/Sec: 0.07,
|
| 1005 |
-
[[34m2025-12-30 05:45:26[39m] (step=0000819) Train Loss mse: 0.0484, Train Loss ce: 0.0635, Train Steps/Sec: 0.09,
|
| 1006 |
-
[[34m2025-12-30 05:45:37[39m] (step=0000820) Train Loss mse: 0.0427, Train Loss ce: 0.0604, Train Steps/Sec: 0.09,
|
| 1007 |
-
[[34m2025-12-30 05:45:50[39m] (step=0000821) Train Loss mse: 0.0280, Train Loss ce: 0.0599, Train Steps/Sec: 0.08,
|
| 1008 |
-
[[34m2025-12-30 05:46:01[39m] (step=0000822) Train Loss mse: 0.0395, Train Loss ce: 0.0650, Train Steps/Sec: 0.09,
|
| 1009 |
-
[[34m2025-12-30 05:46:13[39m] (step=0000823) Train Loss mse: 0.0509, Train Loss ce: 0.0648, Train Steps/Sec: 0.08,
|
| 1010 |
-
[[34m2025-12-30 05:46:29[39m] (step=0000824) Train Loss mse: 0.0542, Train Loss ce: 0.0660, Train Steps/Sec: 0.06,
|
| 1011 |
-
[[34m2025-12-30 05:46:43[39m] (step=0000825) Train Loss mse: 0.0471, Train Loss ce: 0.0679, Train Steps/Sec: 0.07,
|
| 1012 |
-
[[34m2025-12-30 05:46:55[39m] (step=0000826) Train Loss mse: 0.0600, Train Loss ce: 0.0622, Train Steps/Sec: 0.08,
|
| 1013 |
-
[[34m2025-12-30 05:47:05[39m] (step=0000827) Train Loss mse: 0.0479, Train Loss ce: 0.0622, Train Steps/Sec: 0.10,
|
| 1014 |
-
[[34m2025-12-30 05:47:19[39m] (step=0000828) Train Loss mse: 0.0395, Train Loss ce: 0.0603, Train Steps/Sec: 0.07,
|
| 1015 |
-
[[34m2025-12-30 05:47:35[39m] (step=0000829) Train Loss mse: 0.0340, Train Loss ce: 0.0620, Train Steps/Sec: 0.06,
|
| 1016 |
-
[[34m2025-12-30 05:47:46[39m] (step=0000830) Train Loss mse: 0.0480, Train Loss ce: 0.0592, Train Steps/Sec: 0.09,
|
| 1017 |
-
[[34m2025-12-30 05:47:58[39m] (step=0000831) Train Loss mse: 0.0487, Train Loss ce: 0.0607, Train Steps/Sec: 0.08,
|
| 1018 |
-
[[34m2025-12-30 05:48:10[39m] (step=0000832) Train Loss mse: 0.0518, Train Loss ce: 0.0637, Train Steps/Sec: 0.08,
|
| 1019 |
-
[[34m2025-12-30 05:48:26[39m] (step=0000833) Train Loss mse: 0.0428, Train Loss ce: 0.0640, Train Steps/Sec: 0.06,
|
| 1020 |
-
[[34m2025-12-30 05:48:38[39m] (step=0000834) Train Loss mse: 0.0478, Train Loss ce: 0.0733, Train Steps/Sec: 0.09,
|
| 1021 |
-
[[34m2025-12-30 05:48:54[39m] (step=0000835) Train Loss mse: 0.0390, Train Loss ce: 0.0679, Train Steps/Sec: 0.06,
|
| 1022 |
-
[[34m2025-12-30 05:49:07[39m] (step=0000836) Train Loss mse: 0.0372, Train Loss ce: 0.0580, Train Steps/Sec: 0.07,
|
| 1023 |
-
[[34m2025-12-30 05:49:20[39m] (step=0000837) Train Loss mse: 0.0565, Train Loss ce: 0.0645, Train Steps/Sec: 0.08,
|
| 1024 |
-
[[34m2025-12-30 05:49:35[39m] (step=0000838) Train Loss mse: 0.0368, Train Loss ce: 0.0601, Train Steps/Sec: 0.06,
|
| 1025 |
-
[[34m2025-12-30 05:49:49[39m] (step=0000839) Train Loss mse: 0.0374, Train Loss ce: 0.0657, Train Steps/Sec: 0.08,
|
| 1026 |
-
[[34m2025-12-30 05:50:04[39m] (step=0000840) Train Loss mse: 0.0467, Train Loss ce: 0.0665, Train Steps/Sec: 0.06,
|
| 1027 |
-
[[34m2025-12-30 05:50:18[39m] (step=0000841) Train Loss mse: 0.0450, Train Loss ce: 0.0677, Train Steps/Sec: 0.08,
|
| 1028 |
-
[[34m2025-12-30 05:50:30[39m] (step=0000842) Train Loss mse: 0.0410, Train Loss ce: 0.0632, Train Steps/Sec: 0.08,
|
| 1029 |
-
[[34m2025-12-30 05:50:46[39m] (step=0000843) Train Loss mse: 0.0386, Train Loss ce: 0.0618, Train Steps/Sec: 0.06,
|
| 1030 |
-
[[34m2025-12-30 05:51:00[39m] (step=0000844) Train Loss mse: 0.0496, Train Loss ce: 0.0641, Train Steps/Sec: 0.08,
|
| 1031 |
-
[[34m2025-12-30 05:51:14[39m] (step=0000845) Train Loss mse: 0.0314, Train Loss ce: 0.0647, Train Steps/Sec: 0.07,
|
| 1032 |
-
[[34m2025-12-30 05:51:26[39m] (step=0000846) Train Loss mse: 0.0343, Train Loss ce: 0.0591, Train Steps/Sec: 0.08,
|
| 1033 |
-
[[34m2025-12-30 05:51:42[39m] (step=0000847) Train Loss mse: 0.0252, Train Loss ce: 0.0611, Train Steps/Sec: 0.06,
|
| 1034 |
-
[[34m2025-12-30 05:51:55[39m] (step=0000848) Train Loss mse: 0.0322, Train Loss ce: 0.0662, Train Steps/Sec: 0.08,
|
| 1035 |
-
[[34m2025-12-30 05:52:11[39m] (step=0000849) Train Loss mse: 0.0505, Train Loss ce: 0.0663, Train Steps/Sec: 0.06,
|
| 1036 |
-
[[34m2025-12-30 05:52:23[39m] (step=0000850) Train Loss mse: 0.0455, Train Loss ce: 0.0656, Train Steps/Sec: 0.08,
|
| 1037 |
-
[[34m2025-12-30 05:52:36[39m] (step=0000851) Train Loss mse: 0.0462, Train Loss ce: 0.0642, Train Steps/Sec: 0.08,
|
| 1038 |
-
[[34m2025-12-30 05:52:49[39m] (step=0000852) Train Loss mse: 0.0527, Train Loss ce: 0.0714, Train Steps/Sec: 0.08,
|
| 1039 |
-
[[34m2025-12-30 05:53:01[39m] (step=0000853) Train Loss mse: 0.0497, Train Loss ce: 0.0633, Train Steps/Sec: 0.08,
|
| 1040 |
-
[[34m2025-12-30 05:53:17[39m] (step=0000854) Train Loss mse: 0.0413, Train Loss ce: 0.0632, Train Steps/Sec: 0.06,
|
| 1041 |
-
[[34m2025-12-30 05:53:30[39m] (step=0000855) Train Loss mse: 0.0484, Train Loss ce: 0.0659, Train Steps/Sec: 0.07,
|
| 1042 |
-
[[34m2025-12-30 05:53:43[39m] (step=0000856) Train Loss mse: 0.0571, Train Loss ce: 0.0692, Train Steps/Sec: 0.07,
|
| 1043 |
-
[[34m2025-12-30 05:53:57[39m] (step=0000857) Train Loss mse: 0.0312, Train Loss ce: 0.0582, Train Steps/Sec: 0.07,
|
| 1044 |
-
[[34m2025-12-30 05:54:09[39m] (step=0000858) Train Loss mse: 0.0596, Train Loss ce: 0.0589, Train Steps/Sec: 0.08,
|
| 1045 |
-
[[34m2025-12-30 05:54:22[39m] (step=0000859) Train Loss mse: 0.0381, Train Loss ce: 0.0585, Train Steps/Sec: 0.08,
|
| 1046 |
-
[[34m2025-12-30 05:54:36[39m] (step=0000860) Train Loss mse: 0.0362, Train Loss ce: 0.0659, Train Steps/Sec: 0.08,
|
| 1047 |
-
[[34m2025-12-30 05:54:52[39m] (step=0000861) Train Loss mse: 0.0378, Train Loss ce: 0.0605, Train Steps/Sec: 0.06,
|
| 1048 |
-
[[34m2025-12-30 05:55:05[39m] (step=0000862) Train Loss mse: 0.0405, Train Loss ce: 0.0605, Train Steps/Sec: 0.07,
|
| 1049 |
-
[[34m2025-12-30 05:55:18[39m] (step=0000863) Train Loss mse: 0.0501, Train Loss ce: 0.0731, Train Steps/Sec: 0.08,
|
| 1050 |
-
[[34m2025-12-30 05:55:32[39m] (step=0000864) Train Loss mse: 0.0554, Train Loss ce: 0.0656, Train Steps/Sec: 0.07,
|
| 1051 |
-
[[34m2025-12-30 05:55:45[39m] (step=0000865) Train Loss mse: 0.0447, Train Loss ce: 0.0668, Train Steps/Sec: 0.08,
|
| 1052 |
-
[[34m2025-12-30 05:55:58[39m] (step=0000866) Train Loss mse: 0.0281, Train Loss ce: 0.0655, Train Steps/Sec: 0.08,
|
| 1053 |
-
[[34m2025-12-30 05:56:14[39m] (step=0000867) Train Loss mse: 0.0498, Train Loss ce: 0.0637, Train Steps/Sec: 0.06,
|
| 1054 |
-
[[34m2025-12-30 05:56:28[39m] (step=0000868) Train Loss mse: 0.0501, Train Loss ce: 0.0628, Train Steps/Sec: 0.07,
|
| 1055 |
-
[[34m2025-12-30 05:56:41[39m] (step=0000869) Train Loss mse: 0.0460, Train Loss ce: 0.0721, Train Steps/Sec: 0.08,
|
| 1056 |
-
[[34m2025-12-30 05:56:54[39m] (step=0000870) Train Loss mse: 0.0598, Train Loss ce: 0.0687, Train Steps/Sec: 0.08,
|
| 1057 |
-
[[34m2025-12-30 05:57:09[39m] (step=0000871) Train Loss mse: 0.0381, Train Loss ce: 0.0636, Train Steps/Sec: 0.06,
|
| 1058 |
-
[[34m2025-12-30 05:57:26[39m] (step=0000872) Train Loss mse: 0.0296, Train Loss ce: 0.0603, Train Steps/Sec: 0.06,
|
| 1059 |
-
[[34m2025-12-30 05:57:39[39m] (step=0000873) Train Loss mse: 0.0587, Train Loss ce: 0.0701, Train Steps/Sec: 0.07,
|
| 1060 |
-
[[34m2025-12-30 05:57:55[39m] (step=0000874) Train Loss mse: 0.0443, Train Loss ce: 0.0640, Train Steps/Sec: 0.06,
|
| 1061 |
-
[[34m2025-12-30 05:58:11[39m] (step=0000875) Train Loss mse: 0.0375, Train Loss ce: 0.0569, Train Steps/Sec: 0.06,
|
| 1062 |
-
[[34m2025-12-30 05:58:27[39m] (step=0000876) Train Loss mse: 0.0279, Train Loss ce: 0.0632, Train Steps/Sec: 0.06,
|
| 1063 |
-
[[34m2025-12-30 05:58:43[39m] (step=0000877) Train Loss mse: 0.0385, Train Loss ce: 0.0582, Train Steps/Sec: 0.06,
|
| 1064 |
-
[[34m2025-12-30 05:58:59[39m] (step=0000878) Train Loss mse: 0.0368, Train Loss ce: 0.0641, Train Steps/Sec: 0.06,
|
| 1065 |
-
[[34m2025-12-30 05:59:12[39m] (step=0000879) Train Loss mse: 0.0451, Train Loss ce: 0.0623, Train Steps/Sec: 0.07,
|
| 1066 |
-
[[34m2025-12-30 05:59:26[39m] (step=0000880) Train Loss mse: 0.0397, Train Loss ce: 0.0639, Train Steps/Sec: 0.07,
|
| 1067 |
-
[[34m2025-12-30 05:59:40[39m] (step=0000881) Train Loss mse: 0.0532, Train Loss ce: 0.0666, Train Steps/Sec: 0.07,
|
| 1068 |
-
[[34m2025-12-30 05:59:53[39m] (step=0000882) Train Loss mse: 0.0452, Train Loss ce: 0.0636, Train Steps/Sec: 0.08,
|
| 1069 |
-
[[34m2025-12-30 06:00:07[39m] (step=0000883) Train Loss mse: 0.0423, Train Loss ce: 0.0629, Train Steps/Sec: 0.08,
|
| 1070 |
-
[[34m2025-12-30 06:00:19[39m] (step=0000884) Train Loss mse: 0.0447, Train Loss ce: 0.0672, Train Steps/Sec: 0.09,
|
| 1071 |
[[34m2025-12-30 06:00:30[39m] (step=0000885) Train Loss mse: 0.0446, Train Loss ce: 0.0648, Train Steps/Sec: 0.09,
|
| 1072 |
[[34m2025-12-30 06:00:43[39m] (step=0000886) Train Loss mse: 0.0273, Train Loss ce: 0.0610, Train Steps/Sec: 0.08,
|
| 1073 |
[[34m2025-12-30 06:00:56[39m] (step=0000887) Train Loss mse: 0.0444, Train Loss ce: 0.0651, Train Steps/Sec: 0.08,
|
|
checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20251230_024051-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log
CHANGED
|
@@ -872,6 +872,24 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
|
|
| 872 |
[[34m2025-12-30 06:05:11[39m] (step=0000861) Train Loss mse: 0.0378, Train Loss ce: 0.0571, Train Steps/Sec: 0.06,
|
| 873 |
[[34m2025-12-30 06:05:25[39m] (step=0000862) Train Loss mse: 0.0404, Train Loss ce: 0.0582, Train Steps/Sec: 0.07,
|
| 874 |
[[34m2025-12-30 06:05:37[39m] (step=0000863) Train Loss mse: 0.0498, Train Loss ce: 0.0737, Train Steps/Sec: 0.08,
|
|
| 875 |
FullyShardedDataParallel(
|
| 876 |
(_fsdp_wrapped_module): Bagel(
|
| 877 |
(language_model): Qwen2ForCausalLM(
|
|
@@ -1045,24 +1063,6 @@ connector._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
|
| 1045 |
vit_pos_embed._fsdp_wrapped_module._flat_param False
|
| 1046 |
Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_train
|
| 1047 |
Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_val
|
| 1048 |
-
[[34m2025-12-30 06:05:51[39m] (step=0000864) Train Loss mse: 0.0551, Train Loss ce: 0.0653, Train Steps/Sec: 0.07,
|
| 1049 |
-
[[34m2025-12-30 06:06:04[39m] (step=0000865) Train Loss mse: 0.0448, Train Loss ce: 0.0661, Train Steps/Sec: 0.08,
|
| 1050 |
-
[[34m2025-12-30 06:06:17[39m] (step=0000866) Train Loss mse: 0.0281, Train Loss ce: 0.0686, Train Steps/Sec: 0.08,
|
| 1051 |
-
[[34m2025-12-30 06:06:34[39m] (step=0000867) Train Loss mse: 0.0497, Train Loss ce: 0.0639, Train Steps/Sec: 0.06,
|
| 1052 |
-
[[34m2025-12-30 06:06:47[39m] (step=0000868) Train Loss mse: 0.0499, Train Loss ce: 0.0637, Train Steps/Sec: 0.07,
|
| 1053 |
-
-[2025-12-30 06:07:00] (step=0000869) Train Loss mse: 0.0456, Train Loss ce: 0.0727, Train Steps/Sec: 0.08,
-[2025-12-30 06:07:13] (step=0000870) Train Loss mse: 0.0586, Train Loss ce: 0.0668, Train Steps/Sec: 0.08,
-[2025-12-30 06:07:29] (step=0000871) Train Loss mse: 0.0382, Train Loss ce: 0.0622, Train Steps/Sec: 0.06,
-[2025-12-30 06:07:45] (step=0000872) Train Loss mse: 0.0296, Train Loss ce: 0.0591, Train Steps/Sec: 0.06,
-[2025-12-30 06:07:59] (step=0000873) Train Loss mse: 0.0595, Train Loss ce: 0.0675, Train Steps/Sec: 0.07,
-[2025-12-30 06:08:15] (step=0000874) Train Loss mse: 0.0444, Train Loss ce: 0.0658, Train Steps/Sec: 0.06,
-[2025-12-30 06:08:31] (step=0000875) Train Loss mse: 0.0382, Train Loss ce: 0.0611, Train Steps/Sec: 0.06,
-[2025-12-30 06:08:47] (step=0000876) Train Loss mse: 0.0280, Train Loss ce: 0.0634, Train Steps/Sec: 0.06,
-[2025-12-30 06:09:03] (step=0000877) Train Loss mse: 0.0387, Train Loss ce: 0.0618, Train Steps/Sec: 0.06,
-[2025-12-30 06:09:19] (step=0000878) Train Loss mse: 0.0368, Train Loss ce: 0.0598, Train Steps/Sec: 0.06,
-[2025-12-30 06:09:32] (step=0000879) Train Loss mse: 0.0453, Train Loss ce: 0.0626, Train Steps/Sec: 0.08,
-[2025-12-30 06:09:45] (step=0000880) Train Loss mse: 0.0401, Train Loss ce: 0.0609, Train Steps/Sec: 0.08,
-[2025-12-30 06:10:00] (step=0000881) Train Loss mse: 0.0515, Train Loss ce: 0.0631, Train Steps/Sec: 0.07,
 [2025-12-30 06:10:13] (step=0000882) Train Loss mse: 0.0459, Train Loss ce: 0.0686, Train Steps/Sec: 0.07,
 [2025-12-30 06:10:27] (step=0000883) Train Loss mse: 0.0424, Train Loss ce: 0.0605, Train Steps/Sec: 0.08,
 [2025-12-30 06:10:38] (step=0000884) Train Loss mse: 0.0447, Train Loss ce: 0.0630, Train Steps/Sec: 0.09,

 [2025-12-30 06:05:11] (step=0000861) Train Loss mse: 0.0378, Train Loss ce: 0.0571, Train Steps/Sec: 0.06,
 [2025-12-30 06:05:25] (step=0000862) Train Loss mse: 0.0404, Train Loss ce: 0.0582, Train Steps/Sec: 0.07,
 [2025-12-30 06:05:37] (step=0000863) Train Loss mse: 0.0498, Train Loss ce: 0.0737, Train Steps/Sec: 0.08,
+[2025-12-30 06:05:51] (step=0000864) Train Loss mse: 0.0551, Train Loss ce: 0.0653, Train Steps/Sec: 0.07,
+[2025-12-30 06:06:04] (step=0000865) Train Loss mse: 0.0448, Train Loss ce: 0.0661, Train Steps/Sec: 0.08,
+[2025-12-30 06:06:17] (step=0000866) Train Loss mse: 0.0281, Train Loss ce: 0.0686, Train Steps/Sec: 0.08,
+[2025-12-30 06:06:34] (step=0000867) Train Loss mse: 0.0497, Train Loss ce: 0.0639, Train Steps/Sec: 0.06,
+[2025-12-30 06:06:47] (step=0000868) Train Loss mse: 0.0499, Train Loss ce: 0.0637, Train Steps/Sec: 0.07,
+[2025-12-30 06:07:00] (step=0000869) Train Loss mse: 0.0456, Train Loss ce: 0.0727, Train Steps/Sec: 0.08,
+[2025-12-30 06:07:13] (step=0000870) Train Loss mse: 0.0586, Train Loss ce: 0.0668, Train Steps/Sec: 0.08,
+[2025-12-30 06:07:29] (step=0000871) Train Loss mse: 0.0382, Train Loss ce: 0.0622, Train Steps/Sec: 0.06,
+[2025-12-30 06:07:45] (step=0000872) Train Loss mse: 0.0296, Train Loss ce: 0.0591, Train Steps/Sec: 0.06,
+[2025-12-30 06:07:59] (step=0000873) Train Loss mse: 0.0595, Train Loss ce: 0.0675, Train Steps/Sec: 0.07,
+[2025-12-30 06:08:15] (step=0000874) Train Loss mse: 0.0444, Train Loss ce: 0.0658, Train Steps/Sec: 0.06,
+[2025-12-30 06:08:31] (step=0000875) Train Loss mse: 0.0382, Train Loss ce: 0.0611, Train Steps/Sec: 0.06,
+[2025-12-30 06:08:47] (step=0000876) Train Loss mse: 0.0280, Train Loss ce: 0.0634, Train Steps/Sec: 0.06,
+[2025-12-30 06:09:03] (step=0000877) Train Loss mse: 0.0387, Train Loss ce: 0.0618, Train Steps/Sec: 0.06,
+[2025-12-30 06:09:19] (step=0000878) Train Loss mse: 0.0368, Train Loss ce: 0.0598, Train Steps/Sec: 0.06,
+[2025-12-30 06:09:32] (step=0000879) Train Loss mse: 0.0453, Train Loss ce: 0.0626, Train Steps/Sec: 0.08,
+[2025-12-30 06:09:45] (step=0000880) Train Loss mse: 0.0401, Train Loss ce: 0.0609, Train Steps/Sec: 0.08,
+[2025-12-30 06:10:00] (step=0000881) Train Loss mse: 0.0515, Train Loss ce: 0.0631, Train Steps/Sec: 0.07,
 FullyShardedDataParallel(
   (_fsdp_wrapped_module): Bagel(
     (language_model): Qwen2ForCausalLM(
...
 vit_pos_embed._fsdp_wrapped_module._flat_param False
 Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_train
 Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_val
 [2025-12-30 06:10:13] (step=0000882) Train Loss mse: 0.0459, Train Loss ce: 0.0686, Train Steps/Sec: 0.07,
 [2025-12-30 06:10:27] (step=0000883) Train Loss mse: 0.0424, Train Loss ce: 0.0605, Train Steps/Sec: 0.08,
 [2025-12-30 06:10:38] (step=0000884) Train Loss mse: 0.0447, Train Loss ce: 0.0630, Train Steps/Sec: 0.09,
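The "Train Loss" lines above share a fixed format: a timestamp, a zero-padded step counter, and three comma-separated metrics. A small sketch of how such lines could be parsed back into records for plotting (the regex and field names here are my own, derived only from the lines shown, not from the training code):

```python
import re

# One log line in the format seen above (ANSI color codes already stripped).
LINE = ("[2025-12-30 06:07:00] (step=0000869) "
        "Train Loss mse: 0.0456, Train Loss ce: 0.0727, Train Steps/Sec: 0.08,")

# Pattern mirrors the observed layout; adjust if the real logger differs.
PATTERN = re.compile(
    r"\[(?P<ts>[\d\- :]+)\] \(step=(?P<step>\d+)\) "
    r"Train Loss mse: (?P<mse>[\d.]+), Train Loss ce: (?P<ce>[\d.]+), "
    r"Train Steps/Sec: (?P<sps>[\d.]+),"
)

m = PATTERN.match(LINE)
record = {"step": int(m["step"]), "mse": float(m["mse"]), "ce": float(m["ce"])}
print(record)  # {'step': 869, 'mse': 0.0456, 'ce': 0.0727}
```

Applied line by line over an `output.log`, this yields per-step curves for the two losses without re-running anything.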
checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20260104_093254-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log
CHANGED
@@ -1,183 +1,3 @@
-FullyShardedDataParallel(
-  (_fsdp_wrapped_module): Bagel(
-    (language_model): Qwen2ForCausalLM(
-      (model): Qwen2Model(
-        (embed_tokens): Embedding(152064, 3584)
-        (layers): ModuleList(
-          (0-27): 28 x FullyShardedDataParallel(
-            (_fsdp_wrapped_module): CheckpointWrapper(
-              (_checkpoint_wrapped_module): Qwen2MoTDecoderLayer(
-                (self_attn): PackedAttentionMoT(
-                  (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
-                  (k_proj): Linear(in_features=3584, out_features=512, bias=True)
-                  (v_proj): Linear(in_features=3584, out_features=512, bias=True)
-                  (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
-                  (q_norm): Qwen2RMSNorm((128,), eps=1e-06)
-                  (k_norm): Qwen2RMSNorm((128,), eps=1e-06)
-                  (q_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
-                  (k_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
-                  (q_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=True)
-                  (k_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
-                  (v_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
-                  (o_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=False)
-                )
-                (mlp): Qwen2MLP(
-                  (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                  (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                  (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
-                  (act_fn): SiLU()
-                )
-                (mlp_moe_gen): Qwen2MLP(
-                  (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                  (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
-                  (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
-                  (act_fn): SiLU()
-                )
-                (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
-                (input_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-                (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
-                (post_attention_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-              )
-            )
-          )
-        )
-        (norm): Qwen2RMSNorm((3584,), eps=1e-06)
-        (norm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-        (rotary_emb): Qwen2RotaryEmbedding()
-      )
-      (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
-    )
-    (time_embedder): FullyShardedDataParallel(
-      (_fsdp_wrapped_module): TimestepEmbedder(
-        (mlp): Sequential(
-          (0): Linear(in_features=256, out_features=3584, bias=True)
-          (1): SiLU()
-          (2): Linear(in_features=3584, out_features=3584, bias=True)
-        )
-      )
-    )
-    (vae2llm): Linear(in_features=64, out_features=3584, bias=True)
-    (llm2vae): Linear(in_features=3584, out_features=64, bias=True)
-    (latent_pos_embed): FullyShardedDataParallel(
-      (_fsdp_wrapped_module): PositionEmbedding()
-    )
-    (vit_model): SiglipVisionModel(
-      (vision_model): FullyShardedDataParallel(
-        (_fsdp_wrapped_module): SiglipVisionTransformer(
-          (embeddings): SiglipVisionEmbeddings(
-            (position_embedding): Embedding(4900, 1152)
-            (patch_embedding): Linear(in_features=588, out_features=1152, bias=True)
-          )
-          (encoder): SiglipEncoder(
-            (layers): ModuleList(
-              (0-25): 26 x FullyShardedDataParallel(
-                (_fsdp_wrapped_module): CheckpointWrapper(
-                  (_checkpoint_wrapped_module): SiglipEncoderLayer(
-                    (self_attn): SiglipFlashAttention2(
-                      (k_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                      (v_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                      (q_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                      (out_proj): Linear(in_features=1152, out_features=1152, bias=True)
-                    )
-                    (layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-                    (mlp): SiglipMLP(
-                      (activation_fn): PytorchGELUTanh()
-                      (fc1): Linear(in_features=1152, out_features=4304, bias=True)
-                      (fc2): Linear(in_features=4304, out_features=1152, bias=True)
-                    )
-                    (layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-                  )
-                )
-              )
-            )
-          )
-          (post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-        )
-      )
-    )
-    (connector): FullyShardedDataParallel(
-      (_fsdp_wrapped_module): CheckpointWrapper(
-        (_checkpoint_wrapped_module): MLPconnector(
-          (activation_fn): PytorchGELUTanh()
-          (fc1): Linear(in_features=1152, out_features=3584, bias=True)
-          (fc2): Linear(in_features=3584, out_features=3584, bias=True)
-        )
-      )
-    )
-    (vit_pos_embed): FullyShardedDataParallel(
-      (_fsdp_wrapped_module): PositionEmbedding()
-    )
-  )
-)
-_flat_param True
-language_model.model.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.26._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.27._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-time_embedder._fsdp_wrapped_module._flat_param True
-latent_pos_embed._fsdp_wrapped_module._flat_param False
-vit_model.vision_model._fsdp_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-connector._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_pos_embed._fsdp_wrapped_module._flat_param False
-Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_train
-base_dir is results/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step0
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.40467292070388794, mse_avg: 0.06533115357160568
-base_dir is results/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step500
 wandb: Detected [huggingface_hub.inference] in use.
 wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
 wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
@@ -719,4 +539,184 @@ AssertionError
 [rank0]:   File "/home/clouduser/Code/Github/unified_world_model/data/dataset_base.py", line 115, in build_datasets
 [rank0]:     assert 'dataset_names' in dataset_args.keys()
 [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-[rank0]: AssertionError
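The traceback above shows `build_datasets` asserting that the dataset config carries a `dataset_names` key before anything else is built. A minimal sketch of that failure mode, assuming only what the log shows (the function body and config fields below are hypothetical, not the repo's actual code):

```python
# Hypothetical reconstruction of the guard seen in the traceback above.
# Only the assert itself comes from the log; the rest is illustrative.
def build_datasets(dataset_args: dict):
    # The config must list which sub-datasets to build.
    assert 'dataset_names' in dataset_args.keys()
    return [{'name': name, **dataset_args}
            for name in dataset_args['dataset_names']]

good = {'dataset_names': ['vlm_gym_jigsaw_train'], 'num_workers': 1}
bad = {'num_workers': 1}  # missing 'dataset_names' -> AssertionError

datasets = build_datasets(good)
try:
    build_datasets(bad)
except AssertionError:
    print("AssertionError: config is missing 'dataset_names'")
```

In other words, the crash logged here points at a config file that omits `dataset_names`, not at the data itself.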
+[rank0]: AssertionError
+FullyShardedDataParallel(
+  (_fsdp_wrapped_module): Bagel(
+    (language_model): Qwen2ForCausalLM(
+      (model): Qwen2Model(
+        (embed_tokens): Embedding(152064, 3584)
+        (layers): ModuleList(
+          (0-27): 28 x FullyShardedDataParallel(
+            (_fsdp_wrapped_module): CheckpointWrapper(
+              (_checkpoint_wrapped_module): Qwen2MoTDecoderLayer(
+                (self_attn): PackedAttentionMoT(
+                  (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
+                  (k_proj): Linear(in_features=3584, out_features=512, bias=True)
+                  (v_proj): Linear(in_features=3584, out_features=512, bias=True)
+                  (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
+                  (q_norm): Qwen2RMSNorm((128,), eps=1e-06)
+                  (k_norm): Qwen2RMSNorm((128,), eps=1e-06)
+                  (q_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
+                  (k_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
+                  (q_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=True)
+                  (k_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
+                  (v_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
+                  (o_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=False)
+                )
+                (mlp): Qwen2MLP(
+                  (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
+                  (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
+                  (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
+                  (act_fn): SiLU()
+                )
+                (mlp_moe_gen): Qwen2MLP(
+                  (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
+                  (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
+                  (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
+                  (act_fn): SiLU()
+                )
+                (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
+                (input_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+                (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
+                (post_attention_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+              )
+            )
+          )
+        )
+        (norm): Qwen2RMSNorm((3584,), eps=1e-06)
+        (norm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+        (rotary_emb): Qwen2RotaryEmbedding()
+      )
+      (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
+    )
+    (time_embedder): FullyShardedDataParallel(
+      (_fsdp_wrapped_module): TimestepEmbedder(
+        (mlp): Sequential(
+          (0): Linear(in_features=256, out_features=3584, bias=True)
+          (1): SiLU()
+          (2): Linear(in_features=3584, out_features=3584, bias=True)
+        )
+      )
+    )
+    (vae2llm): Linear(in_features=64, out_features=3584, bias=True)
+    (llm2vae): Linear(in_features=3584, out_features=64, bias=True)
+    (latent_pos_embed): FullyShardedDataParallel(
+      (_fsdp_wrapped_module): PositionEmbedding()
+    )
+    (vit_model): SiglipVisionModel(
+      (vision_model): FullyShardedDataParallel(
+        (_fsdp_wrapped_module): SiglipVisionTransformer(
+          (embeddings): SiglipVisionEmbeddings(
+            (position_embedding): Embedding(4900, 1152)
+            (patch_embedding): Linear(in_features=588, out_features=1152, bias=True)
+          )
+          (encoder): SiglipEncoder(
+            (layers): ModuleList(
+              (0-25): 26 x FullyShardedDataParallel(
+                (_fsdp_wrapped_module): CheckpointWrapper(
+                  (_checkpoint_wrapped_module): SiglipEncoderLayer(
+                    (self_attn): SiglipFlashAttention2(
+                      (k_proj): Linear(in_features=1152, out_features=1152, bias=True)
+                      (v_proj): Linear(in_features=1152, out_features=1152, bias=True)
+                      (q_proj): Linear(in_features=1152, out_features=1152, bias=True)
+                      (out_proj): Linear(in_features=1152, out_features=1152, bias=True)
+                    )
+                    (layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
+                    (mlp): SiglipMLP(
+                      (activation_fn): PytorchGELUTanh()
+                      (fc1): Linear(in_features=1152, out_features=4304, bias=True)
+                      (fc2): Linear(in_features=4304, out_features=1152, bias=True)
+                    )
+                    (layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
+                  )
+                )
+              )
+            )
+          )
+          (post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
+        )
+      )
+    )
+    (connector): FullyShardedDataParallel(
+      (_fsdp_wrapped_module): CheckpointWrapper(
+        (_checkpoint_wrapped_module): MLPconnector(
+          (activation_fn): PytorchGELUTanh()
+          (fc1): Linear(in_features=1152, out_features=3584, bias=True)
+          (fc2): Linear(in_features=3584, out_features=3584, bias=True)
+        )
+      )
+    )
+    (vit_pos_embed): FullyShardedDataParallel(
+      (_fsdp_wrapped_module): PositionEmbedding()
+    )
+  )
+)
+_flat_param True
+language_model.model.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.26._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+language_model.model.layers.27._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+time_embedder._fsdp_wrapped_module._flat_param True
+latent_pos_embed._fsdp_wrapped_module._flat_param False
+vit_model.vision_model._fsdp_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_model.vision_model._fsdp_wrapped_module.encoder.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+connector._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
+vit_pos_embed._fsdp_wrapped_module._flat_param False
+Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_train
+base_dir is results/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step0
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+ce_avg: 0.40467292070388794, mse_avg: 0.06533115357160568
+base_dir is results/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step500
checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20260105_090343-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log
CHANGED
@@ -849,6 +849,12 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
[2026-01-05 12:24:55] (step=0000838) Train Loss mse: 0.0370, Train Loss ce: 0.0676, Train Steps/Sec: 0.06,
[2026-01-05 12:25:08] (step=0000839) Train Loss mse: 0.0387, Train Loss ce: 0.0647, Train Steps/Sec: 0.07,
[2026-01-05 12:25:24] (step=0000840) Train Loss mse: 0.0463, Train Loss ce: 0.0704, Train Steps/Sec: 0.06,
+[2026-01-05 12:25:37] (step=0000841) Train Loss mse: 0.0448, Train Loss ce: 0.0663, Train Steps/Sec: 0.08,
+[2026-01-05 12:25:50] (step=0000842) Train Loss mse: 0.0398, Train Loss ce: 0.0621, Train Steps/Sec: 0.08,
+[2026-01-05 12:26:06] (step=0000843) Train Loss mse: 0.0387, Train Loss ce: 0.0604, Train Steps/Sec: 0.06,
+[2026-01-05 12:26:20] (step=0000844) Train Loss mse: 0.0499, Train Loss ce: 0.0661, Train Steps/Sec: 0.07,
+[2026-01-05 12:26:34] (step=0000845) Train Loss mse: 0.0324, Train Loss ce: 0.0699, Train Steps/Sec: 0.07,
+[2026-01-05 12:26:46] (step=0000846) Train Loss mse: 0.0344, Train Loss ce: 0.0596, Train Steps/Sec: 0.08,
FullyShardedDataParallel(
(_fsdp_wrapped_module): Bagel(
(language_model): Qwen2ForCausalLM(
@@ -1035,12 +1041,20 @@ Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
ce_avg: 0.07113238424062729, mse_avg: 0.043496306985616684
-
-
-[
-[
-[
-[
+base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step1000
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+ce_avg: 0.09860303997993469, mse_avg: 0.04136444255709648
+base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step1500
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+ce_avg: 0.13110212981700897, mse_avg: 0.041821885854005814
[2026-01-05 12:27:02] (step=0000847) Train Loss mse: 0.0251, Train Loss ce: 0.0612, Train Steps/Sec: 0.06,
[2026-01-05 12:27:15] (step=0000848) Train Loss mse: 0.0327, Train Loss ce: 0.0664, Train Steps/Sec: 0.08,
[2026-01-05 12:27:31] (step=0000849) Train Loss mse: 0.0508, Train Loss ce: 0.0639, Train Steps/Sec: 0.06,
@@ -2179,18 +2193,4 @@ ce_avg: 0.07113238424062729, mse_avg: 0.043496306985616684
[2026-01-05 16:48:49] (step=0001982) Train Loss mse: 0.0501, Train Loss ce: 0.0636, Train Steps/Sec: 0.09,
[2026-01-05 16:49:02] (step=0001983) Train Loss mse: 0.0559, Train Loss ce: 0.0639, Train Steps/Sec: 0.08,
[2026-01-05 16:49:14] (step=0001984) Train Loss mse: 0.0539, Train Loss ce: 0.0606, Train Steps/Sec: 0.08,
-[2026-01-05 16:49:30] (step=0001985) Train Loss mse: 0.0312, Train Loss ce: 0.0595, Train Steps/Sec: 0.06,
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step1000
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.09860303997993469, mse_avg: 0.04136444255709648
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step1500
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.13110212981700897, mse_avg: 0.041821885854005814
+[2026-01-05 16:49:30] (step=0001985) Train Loss mse: 0.0312, Train Loss ce: 0.0595, Train Steps/Sec: 0.06,
checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20260107_093905-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log
CHANGED
@@ -1,189 +1,3 @@
-FullyShardedDataParallel(
-(_fsdp_wrapped_module): Bagel(
-(language_model): Qwen2ForCausalLM(
-(model): Qwen2Model(
-(embed_tokens): Embedding(152064, 3584)
-(layers): ModuleList(
-(0-27): 28 x FullyShardedDataParallel(
-(_fsdp_wrapped_module): CheckpointWrapper(
-(_checkpoint_wrapped_module): Qwen2MoTDecoderLayer(
-(self_attn): PackedAttentionMoT(
-(q_proj): Linear(in_features=3584, out_features=3584, bias=True)
-(k_proj): Linear(in_features=3584, out_features=512, bias=True)
-(v_proj): Linear(in_features=3584, out_features=512, bias=True)
-(o_proj): Linear(in_features=3584, out_features=3584, bias=False)
-(q_norm): Qwen2RMSNorm((128,), eps=1e-06)
-(k_norm): Qwen2RMSNorm((128,), eps=1e-06)
-(q_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
-(k_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
-(q_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=True)
-(k_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
-(v_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
-(o_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=False)
-)
-(mlp): Qwen2MLP(
-(gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
-(up_proj): Linear(in_features=3584, out_features=18944, bias=False)
-(down_proj): Linear(in_features=18944, out_features=3584, bias=False)
-(act_fn): SiLU()
-)
-(mlp_moe_gen): Qwen2MLP(
-(gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
-(up_proj): Linear(in_features=3584, out_features=18944, bias=False)
-(down_proj): Linear(in_features=18944, out_features=3584, bias=False)
-(act_fn): SiLU()
-)
-(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
-(input_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
-(post_attention_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-)
-)
-)
-)
-(norm): Qwen2RMSNorm((3584,), eps=1e-06)
-(norm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
-(rotary_emb): Qwen2RotaryEmbedding()
-)
-(lm_head): Linear(in_features=3584, out_features=152064, bias=False)
-)
-(time_embedder): FullyShardedDataParallel(
-(_fsdp_wrapped_module): TimestepEmbedder(
-(mlp): Sequential(
-(0): Linear(in_features=256, out_features=3584, bias=True)
-(1): SiLU()
-(2): Linear(in_features=3584, out_features=3584, bias=True)
-)
-)
-)
-(vae2llm): Linear(in_features=64, out_features=3584, bias=True)
-(llm2vae): Linear(in_features=3584, out_features=64, bias=True)
-(latent_pos_embed): FullyShardedDataParallel(
-(_fsdp_wrapped_module): PositionEmbedding()
-)
-(vit_model): SiglipVisionModel(
-(vision_model): FullyShardedDataParallel(
-(_fsdp_wrapped_module): SiglipVisionTransformer(
-(embeddings): SiglipVisionEmbeddings(
-(position_embedding): Embedding(4900, 1152)
-(patch_embedding): Linear(in_features=588, out_features=1152, bias=True)
-)
-(encoder): SiglipEncoder(
-(layers): ModuleList(
-(0-25): 26 x FullyShardedDataParallel(
-(_fsdp_wrapped_module): CheckpointWrapper(
-(_checkpoint_wrapped_module): SiglipEncoderLayer(
-(self_attn): SiglipFlashAttention2(
-(k_proj): Linear(in_features=1152, out_features=1152, bias=True)
-(v_proj): Linear(in_features=1152, out_features=1152, bias=True)
-(q_proj): Linear(in_features=1152, out_features=1152, bias=True)
-(out_proj): Linear(in_features=1152, out_features=1152, bias=True)
-)
-(layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-(mlp): SiglipMLP(
-(activation_fn): PytorchGELUTanh()
-(fc1): Linear(in_features=1152, out_features=4304, bias=True)
-(fc2): Linear(in_features=4304, out_features=1152, bias=True)
-)
-(layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-)
-)
-)
-)
-)
-(post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
-)
-)
-)
-(connector): FullyShardedDataParallel(
-(_fsdp_wrapped_module): CheckpointWrapper(
-(_checkpoint_wrapped_module): MLPconnector(
-(activation_fn): PytorchGELUTanh()
-(fc1): Linear(in_features=1152, out_features=3584, bias=True)
-(fc2): Linear(in_features=3584, out_features=3584, bias=True)
-)
-)
-)
-(vit_pos_embed): FullyShardedDataParallel(
-(_fsdp_wrapped_module): PositionEmbedding()
-)
-)
-)
-_flat_param True
-language_model.model.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.26._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-language_model.model.layers.27._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-time_embedder._fsdp_wrapped_module._flat_param True
-latent_pos_embed._fsdp_wrapped_module._flat_param False
-vit_model.vision_model._fsdp_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_model.vision_model._fsdp_wrapped_module.encoder.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-connector._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
-vit_pos_embed._fsdp_wrapped_module._flat_param False
-Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_train
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step0
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.40467292070388794, mse_avg: 0.06533115357160568
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step500
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.07139278203248978, mse_avg: 0.04338355362415314
wandb: Detected [huggingface_hub.inference] in use.
wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
@@ -1014,6 +828,206 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
[2026-01-07 12:55:35] (step=0000817) Train Loss mse: 0.0457, Train Loss ce: 0.0721, Train Steps/Sec: 0.06,
[2026-01-07 12:55:48] (step=0000818) Train Loss mse: 0.0351, Train Loss ce: 0.0605, Train Steps/Sec: 0.08,
[2026-01-07 12:55:59] (step=0000819) Train Loss mse: 0.0485, Train Loss ce: 0.0618, Train Steps/Sec: 0.09,
[2026-01-07 12:56:10] (step=0000820) Train Loss mse: 0.0426, Train Loss ce: 0.0636, Train Steps/Sec: 0.09,
[2026-01-07 12:56:23] (step=0000821) Train Loss mse: 0.0290, Train Loss ce: 0.0615, Train Steps/Sec: 0.07,
[2026-01-07 12:56:34] (step=0000822) Train Loss mse: 0.0403, Train Loss ce: 0.0646, Train Steps/Sec: 0.09,
@@ -1057,20 +1071,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
[2026-01-07 13:05:13] (step=0000860) Train Loss mse: 0.0359, Train Loss ce: 0.0613, Train Steps/Sec: 0.08,
[2026-01-07 13:05:29] (step=0000861) Train Loss mse: 0.0381, Train Loss ce: 0.0595, Train Steps/Sec: 0.06,
[2026-01-07 13:05:42] (step=0000862) Train Loss mse: 0.0404, Train Loss ce: 0.0619, Train Steps/Sec: 0.07,
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step1000
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.09475355595350266, mse_avg: 0.04140673205256462
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step1500
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.1198282465338707, mse_avg: 0.04170006886124611
[2026-01-07 13:05:55] (step=0000863) Train Loss mse: 0.0506, Train Loss ce: 0.0734, Train Steps/Sec: 0.08,
[2026-01-07 13:06:08] (step=0000864) Train Loss mse: 0.0548, Train Loss ce: 0.0637, Train Steps/Sec: 0.07,
[2026-01-07 13:06:22] (step=0000865) Train Loss mse: 0.0454, Train Loss ce: 0.0644, Train Steps/Sec: 0.07,
wandb: Detected [huggingface_hub.inference] in use.
wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
[2026-01-07 12:55:35] (step=0000817) Train Loss mse: 0.0457, Train Loss ce: 0.0721, Train Steps/Sec: 0.06,
[2026-01-07 12:55:48] (step=0000818) Train Loss mse: 0.0351, Train Loss ce: 0.0605, Train Steps/Sec: 0.08,
[2026-01-07 12:55:59] (step=0000819) Train Loss mse: 0.0485, Train Loss ce: 0.0618, Train Steps/Sec: 0.09,
+FullyShardedDataParallel(
+(_fsdp_wrapped_module): Bagel(
+(language_model): Qwen2ForCausalLM(
+(model): Qwen2Model(
+(embed_tokens): Embedding(152064, 3584)
+(layers): ModuleList(
+(0-27): 28 x FullyShardedDataParallel(
+(_fsdp_wrapped_module): CheckpointWrapper(
+(_checkpoint_wrapped_module): Qwen2MoTDecoderLayer(
+(self_attn): PackedAttentionMoT(
+(q_proj): Linear(in_features=3584, out_features=3584, bias=True)
+(k_proj): Linear(in_features=3584, out_features=512, bias=True)
+(v_proj): Linear(in_features=3584, out_features=512, bias=True)
+(o_proj): Linear(in_features=3584, out_features=3584, bias=False)
+(q_norm): Qwen2RMSNorm((128,), eps=1e-06)
+(k_norm): Qwen2RMSNorm((128,), eps=1e-06)
+(q_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
+(k_norm_moe_gen): Qwen2RMSNorm((128,), eps=1e-06)
+(q_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=True)
+(k_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
+(v_proj_moe_gen): Linear(in_features=3584, out_features=512, bias=True)
+(o_proj_moe_gen): Linear(in_features=3584, out_features=3584, bias=False)
+)
+(mlp): Qwen2MLP(
+(gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
+(up_proj): Linear(in_features=3584, out_features=18944, bias=False)
+(down_proj): Linear(in_features=18944, out_features=3584, bias=False)
+(act_fn): SiLU()
+)
+(mlp_moe_gen): Qwen2MLP(
+(gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
+(up_proj): Linear(in_features=3584, out_features=18944, bias=False)
+(down_proj): Linear(in_features=18944, out_features=3584, bias=False)
+(act_fn): SiLU()
+)
+(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
+(input_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
+(post_attention_layernorm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+)
+)
+)
+)
+(norm): Qwen2RMSNorm((3584,), eps=1e-06)
+(norm_moe_gen): Qwen2RMSNorm((3584,), eps=1e-06)
+(rotary_emb): Qwen2RotaryEmbedding()
+)
+(lm_head): Linear(in_features=3584, out_features=152064, bias=False)
+)
+(time_embedder): FullyShardedDataParallel(
+(_fsdp_wrapped_module): TimestepEmbedder(
+(mlp): Sequential(
+(0): Linear(in_features=256, out_features=3584, bias=True)
+(1): SiLU()
+(2): Linear(in_features=3584, out_features=3584, bias=True)
+)
+)
+)
+(vae2llm): Linear(in_features=64, out_features=3584, bias=True)
+(llm2vae): Linear(in_features=3584, out_features=64, bias=True)
+(latent_pos_embed): FullyShardedDataParallel(
+(_fsdp_wrapped_module): PositionEmbedding()
+)
+(vit_model): SiglipVisionModel(
+(vision_model): FullyShardedDataParallel(
+(_fsdp_wrapped_module): SiglipVisionTransformer(
+(embeddings): SiglipVisionEmbeddings(
+(position_embedding): Embedding(4900, 1152)
+(patch_embedding): Linear(in_features=588, out_features=1152, bias=True)
+)
+(encoder): SiglipEncoder(
+(layers): ModuleList(
+(0-25): 26 x FullyShardedDataParallel(
+(_fsdp_wrapped_module): CheckpointWrapper(
+(_checkpoint_wrapped_module): SiglipEncoderLayer(
+(self_attn): SiglipFlashAttention2(
+(k_proj): Linear(in_features=1152, out_features=1152, bias=True)
+(v_proj): Linear(in_features=1152, out_features=1152, bias=True)
+(q_proj): Linear(in_features=1152, out_features=1152, bias=True)
+(out_proj): Linear(in_features=1152, out_features=1152, bias=True)
+)
+(layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
| 913 |
+
(mlp): SiglipMLP(
|
| 914 |
+
(activation_fn): PytorchGELUTanh()
|
| 915 |
+
(fc1): Linear(in_features=1152, out_features=4304, bias=True)
|
| 916 |
+
(fc2): Linear(in_features=4304, out_features=1152, bias=True)
|
| 917 |
+
)
|
| 918 |
+
(layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
|
| 919 |
+
)
|
| 920 |
+
)
|
| 921 |
+
)
|
| 922 |
+
)
|
| 923 |
+
)
|
| 924 |
+
(post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
|
| 925 |
+
)
|
| 926 |
+
)
|
| 927 |
+
)
|
| 928 |
+
(connector): FullyShardedDataParallel(
|
| 929 |
+
(_fsdp_wrapped_module): CheckpointWrapper(
|
| 930 |
+
(_checkpoint_wrapped_module): MLPconnector(
|
| 931 |
+
(activation_fn): PytorchGELUTanh()
|
| 932 |
+
(fc1): Linear(in_features=1152, out_features=3584, bias=True)
|
| 933 |
+
(fc2): Linear(in_features=3584, out_features=3584, bias=True)
|
| 934 |
+
)
|
| 935 |
+
)
|
| 936 |
+
)
|
| 937 |
+
(vit_pos_embed): FullyShardedDataParallel(
|
| 938 |
+
(_fsdp_wrapped_module): PositionEmbedding()
|
| 939 |
+
)
|
| 940 |
+
)
|
| 941 |
+
)
|
| 942 |
+
_flat_param True
|
| 943 |
+
language_model.model.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 944 |
+
language_model.model.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 945 |
+
language_model.model.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 946 |
+
language_model.model.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 947 |
+
language_model.model.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 948 |
+
language_model.model.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 949 |
+
language_model.model.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 950 |
+
language_model.model.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 951 |
+
language_model.model.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 952 |
+
language_model.model.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 953 |
+
language_model.model.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 954 |
+
language_model.model.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 955 |
+
language_model.model.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 956 |
+
language_model.model.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 957 |
+
language_model.model.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 958 |
+
language_model.model.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 959 |
+
language_model.model.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 960 |
+
language_model.model.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 961 |
+
language_model.model.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 962 |
+
language_model.model.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 963 |
+
language_model.model.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 964 |
+
language_model.model.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 965 |
+
language_model.model.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 966 |
+
language_model.model.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 967 |
+
language_model.model.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 968 |
+
language_model.model.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 969 |
+
language_model.model.layers.26._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 970 |
+
language_model.model.layers.27._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 971 |
+
time_embedder._fsdp_wrapped_module._flat_param True
|
| 972 |
+
latent_pos_embed._fsdp_wrapped_module._flat_param False
|
| 973 |
+
vit_model.vision_model._fsdp_wrapped_module._flat_param True
|
| 974 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.0._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 975 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.1._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 976 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.2._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 977 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.3._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 978 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.4._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 979 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.5._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 980 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.6._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 981 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.7._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 982 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.8._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 983 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.9._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 984 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.10._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 985 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.11._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 986 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.12._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 987 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.13._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 988 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.14._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 989 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.15._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 990 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.16._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 991 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.17._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 992 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.18._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 993 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.19._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 994 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.20._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 995 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.21._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 996 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.22._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 997 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.23._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 998 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.24._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 999 |
+
vit_model.vision_model._fsdp_wrapped_module.encoder.layers.25._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 1000 |
+
connector._fsdp_wrapped_module._checkpoint_wrapped_module._flat_param True
|
| 1001 |
+
vit_pos_embed._fsdp_wrapped_module._flat_param False
|
| 1002 |
+
Preparing Dataset vlm_gym_jigsaw_celoss/vlm_gym_jigsaw_train
|
| 1003 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step0
|
| 1004 |
+
Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
|
| 1005 |
+
[eval debug] first 3 batch fingerprints:
|
| 1006 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1007 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1008 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1009 |
+
ce_avg: 0.40467292070388794, mse_avg: 0.06533115357160568
|
| 1010 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step500
|
| 1011 |
+
Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
|
| 1012 |
+
[eval debug] first 3 batch fingerprints:
|
| 1013 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1014 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1015 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1016 |
+
ce_avg: 0.07139278203248978, mse_avg: 0.04338355362415314
|
| 1017 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step1000
|
| 1018 |
+
Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
|
| 1019 |
+
[eval debug] first 3 batch fingerprints:
|
| 1020 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1021 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1022 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1023 |
+
ce_avg: 0.09475355595350266, mse_avg: 0.04140673205256462
|
| 1024 |
+
base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step1500
|
| 1025 |
+
Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
|
| 1026 |
+
[eval debug] first 3 batch fingerprints:
|
| 1027 |
+
fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1028 |
+
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1029 |
+
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
|
| 1030 |
+
ce_avg: 0.1198282465338707, mse_avg: 0.04170006886124611
|
[2026-01-07 12:56:10] (step=0000820) Train Loss mse: 0.0426, Train Loss ce: 0.0636, Train Steps/Sec: 0.09,
[2026-01-07 12:56:23] (step=0000821) Train Loss mse: 0.0290, Train Loss ce: 0.0615, Train Steps/Sec: 0.07,
[2026-01-07 12:56:34] (step=0000822) Train Loss mse: 0.0403, Train Loss ce: 0.0646, Train Steps/Sec: 0.09,
[2026-01-07 13:05:13] (step=0000860) Train Loss mse: 0.0359, Train Loss ce: 0.0613, Train Steps/Sec: 0.08,
[2026-01-07 13:05:29] (step=0000861) Train Loss mse: 0.0381, Train Loss ce: 0.0595, Train Steps/Sec: 0.06,
[2026-01-07 13:05:42] (step=0000862) Train Loss mse: 0.0404, Train Loss ce: 0.0619, Train Steps/Sec: 0.07,
[2026-01-07 13:05:55] (step=0000863) Train Loss mse: 0.0506, Train Loss ce: 0.0734, Train Steps/Sec: 0.08,
[2026-01-07 13:06:08] (step=0000864) Train Loss mse: 0.0548, Train Loss ce: 0.0637, Train Steps/Sec: 0.07,
[2026-01-07 13:06:22] (step=0000865) Train Loss mse: 0.0454, Train Loss ce: 0.0644, Train Steps/Sec: 0.07,
checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/wandb/offline-run-20260107_184933-checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed-run0/files/output.log
CHANGED
@@ -817,63 +817,6 @@ wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
[2026-01-07 22:03:25] (step=0000806) Train Loss mse: 0.0474, Train Loss ce: 0.0616, Train Steps/Sec: 0.08,
[2026-01-07 22:03:37] (step=0000807) Train Loss mse: 0.0525, Train Loss ce: 0.0598, Train Steps/Sec: 0.08,
[2026-01-07 22:03:49] (step=0000808) Train Loss mse: 0.0454, Train Loss ce: 0.0603, Train Steps/Sec: 0.09,
-[2026-01-07 22:04:05] (step=0000809) Train Loss mse: 0.0326, Train Loss ce: 0.0647, Train Steps/Sec: 0.06,
-[2026-01-07 22:04:17] (step=0000810) Train Loss mse: 0.0437, Train Loss ce: 0.0584, Train Steps/Sec: 0.08,
-[2026-01-07 22:04:29] (step=0000811) Train Loss mse: 0.0597, Train Loss ce: 0.0633, Train Steps/Sec: 0.08,
-[2026-01-07 22:04:41] (step=0000812) Train Loss mse: 0.0387, Train Loss ce: 0.0599, Train Steps/Sec: 0.09,
-[2026-01-07 22:04:55] (step=0000813) Train Loss mse: 0.0491, Train Loss ce: 0.0613, Train Steps/Sec: 0.07,
-[2026-01-07 22:05:08] (step=0000814) Train Loss mse: 0.0546, Train Loss ce: 0.0671, Train Steps/Sec: 0.08,
-[2026-01-07 22:05:24] (step=0000815) Train Loss mse: 0.0472, Train Loss ce: 0.0726, Train Steps/Sec: 0.06,
-[2026-01-07 22:05:41] (step=0000816) Train Loss mse: 0.0341, Train Loss ce: 0.0607, Train Steps/Sec: 0.06,
-[2026-01-07 22:05:56] (step=0000817) Train Loss mse: 0.0452, Train Loss ce: 0.0611, Train Steps/Sec: 0.06,
-[2026-01-07 22:06:10] (step=0000818) Train Loss mse: 0.0349, Train Loss ce: 0.0640, Train Steps/Sec: 0.08,
-[2026-01-07 22:06:21] (step=0000819) Train Loss mse: 0.0482, Train Loss ce: 0.0636, Train Steps/Sec: 0.09,
-[2026-01-07 22:06:32] (step=0000820) Train Loss mse: 0.0424, Train Loss ce: 0.0601, Train Steps/Sec: 0.09,
-[2026-01-07 22:06:45] (step=0000821) Train Loss mse: 0.0306, Train Loss ce: 0.0608, Train Steps/Sec: 0.08,
-[2026-01-07 22:06:56] (step=0000822) Train Loss mse: 0.0402, Train Loss ce: 0.0640, Train Steps/Sec: 0.09,
-[2026-01-07 22:07:10] (step=0000823) Train Loss mse: 0.0510, Train Loss ce: 0.0636, Train Steps/Sec: 0.08,
-[2026-01-07 22:07:26] (step=0000824) Train Loss mse: 0.0551, Train Loss ce: 0.0690, Train Steps/Sec: 0.06,
-[2026-01-07 22:07:39] (step=0000825) Train Loss mse: 0.0469, Train Loss ce: 0.0644, Train Steps/Sec: 0.07,
-[2026-01-07 22:07:52] (step=0000826) Train Loss mse: 0.0598, Train Loss ce: 0.0676, Train Steps/Sec: 0.08,
-[2026-01-07 22:08:02] (step=0000827) Train Loss mse: 0.0474, Train Loss ce: 0.0626, Train Steps/Sec: 0.09,
-[2026-01-07 22:08:16] (step=0000828) Train Loss mse: 0.0396, Train Loss ce: 0.0633, Train Steps/Sec: 0.07,
-[2026-01-07 22:08:32] (step=0000829) Train Loss mse: 0.0341, Train Loss ce: 0.0630, Train Steps/Sec: 0.06,
-[2026-01-07 22:08:44] (step=0000830) Train Loss mse: 0.0481, Train Loss ce: 0.0613, Train Steps/Sec: 0.08,
-[2026-01-07 22:08:56] (step=0000831) Train Loss mse: 0.0491, Train Loss ce: 0.0609, Train Steps/Sec: 0.08,
-[2026-01-07 22:09:08] (step=0000832) Train Loss mse: 0.0519, Train Loss ce: 0.0643, Train Steps/Sec: 0.08,
-[2026-01-07 22:09:24] (step=0000833) Train Loss mse: 0.0422, Train Loss ce: 0.0661, Train Steps/Sec: 0.06,
-[2026-01-07 22:09:36] (step=0000834) Train Loss mse: 0.0502, Train Loss ce: 0.0766, Train Steps/Sec: 0.09,
-[2026-01-07 22:09:51] (step=0000835) Train Loss mse: 0.0386, Train Loss ce: 0.0603, Train Steps/Sec: 0.06,
-[2026-01-07 22:10:05] (step=0000836) Train Loss mse: 0.0366, Train Loss ce: 0.0586, Train Steps/Sec: 0.07,
-[2026-01-07 22:10:17] (step=0000837) Train Loss mse: 0.0561, Train Loss ce: 0.0694, Train Steps/Sec: 0.08,
-[2026-01-07 22:10:33] (step=0000838) Train Loss mse: 0.0368, Train Loss ce: 0.0644, Train Steps/Sec: 0.06,
-[2026-01-07 22:10:46] (step=0000839) Train Loss mse: 0.0375, Train Loss ce: 0.0635, Train Steps/Sec: 0.07,
-[2026-01-07 22:11:02] (step=0000840) Train Loss mse: 0.0462, Train Loss ce: 0.0678, Train Steps/Sec: 0.06,
-[2026-01-07 22:11:15] (step=0000841) Train Loss mse: 0.0450, Train Loss ce: 0.0676, Train Steps/Sec: 0.08,
-[2026-01-07 22:11:28] (step=0000842) Train Loss mse: 0.0397, Train Loss ce: 0.0628, Train Steps/Sec: 0.08,
-[2026-01-07 22:11:44] (step=0000843) Train Loss mse: 0.0385, Train Loss ce: 0.0599, Train Steps/Sec: 0.06,
-[2026-01-07 22:11:57] (step=0000844) Train Loss mse: 0.0495, Train Loss ce: 0.0653, Train Steps/Sec: 0.07,
-[2026-01-07 22:12:11] (step=0000845) Train Loss mse: 0.0314, Train Loss ce: 0.0680, Train Steps/Sec: 0.07,
-[2026-01-07 22:12:24] (step=0000846) Train Loss mse: 0.0336, Train Loss ce: 0.0606, Train Steps/Sec: 0.08,
-[2026-01-07 22:12:40] (step=0000847) Train Loss mse: 0.0251, Train Loss ce: 0.0687, Train Steps/Sec: 0.06,
-[2026-01-07 22:12:53] (step=0000848) Train Loss mse: 0.0325, Train Loss ce: 0.0635, Train Steps/Sec: 0.08,
-[2026-01-07 22:13:09] (step=0000849) Train Loss mse: 0.0502, Train Loss ce: 0.0647, Train Steps/Sec: 0.06,
-[2026-01-07 22:13:21] (step=0000850) Train Loss mse: 0.0455, Train Loss ce: 0.0658, Train Steps/Sec: 0.08,
-[2026-01-07 22:13:34] (step=0000851) Train Loss mse: 0.0463, Train Loss ce: 0.0628, Train Steps/Sec: 0.07,
-[2026-01-07 22:13:47] (step=0000852) Train Loss mse: 0.0526, Train Loss ce: 0.0703, Train Steps/Sec: 0.08,
-[2026-01-07 22:13:59] (step=0000853) Train Loss mse: 0.0485, Train Loss ce: 0.0628, Train Steps/Sec: 0.08,
-[2026-01-07 22:14:15] (step=0000854) Train Loss mse: 0.0407, Train Loss ce: 0.0654, Train Steps/Sec: 0.06,
-[2026-01-07 22:14:28] (step=0000855) Train Loss mse: 0.0481, Train Loss ce: 0.0616, Train Steps/Sec: 0.08,
-[2026-01-07 22:14:42] (step=0000856) Train Loss mse: 0.0525, Train Loss ce: 0.0702, Train Steps/Sec: 0.07,
-[2026-01-07 22:14:55] (step=0000857) Train Loss mse: 0.0311, Train Loss ce: 0.0600, Train Steps/Sec: 0.07,
-[2026-01-07 22:15:07] (step=0000858) Train Loss mse: 0.0595, Train Loss ce: 0.0597, Train Steps/Sec: 0.08,
-[2026-01-07 22:15:21] (step=0000859) Train Loss mse: 0.0382, Train Loss ce: 0.0590, Train Steps/Sec: 0.07,
-[2026-01-07 22:15:34] (step=0000860) Train Loss mse: 0.0360, Train Loss ce: 0.0655, Train Steps/Sec: 0.08,
-[2026-01-07 22:15:50] (step=0000861) Train Loss mse: 0.0379, Train Loss ce: 0.0579, Train Steps/Sec: 0.06,
-[2026-01-07 22:16:04] (step=0000862) Train Loss mse: 0.0402, Train Loss ce: 0.0612, Train Steps/Sec: 0.07,
-[2026-01-07 22:16:16] (step=0000863) Train Loss mse: 0.0490, Train Loss ce: 0.0752, Train Steps/Sec: 0.08,
-[2026-01-07 22:16:30] (step=0000864) Train Loss mse: 0.0547, Train Loss ce: 0.0663, Train Steps/Sec: 0.07,
-[2026-01-07 22:16:43] (step=0000865) Train Loss mse: 0.0452, Train Loss ce: 0.0632, Train Steps/Sec: 0.07,
FullyShardedDataParallel(
(_fsdp_wrapped_module): Bagel(
(language_model): Qwen2ForCausalLM(
@@ -1074,13 +1017,63 @@ Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
ce_avg: 0.1378343403339386, mse_avg: 0.041579172015190125
-
-
-[
-
-
-
-
[2026-01-07 22:16:57] (step=0000866) Train Loss mse: 0.0280, Train Loss ce: 0.0637, Train Steps/Sec: 0.08,
[2026-01-07 22:17:13] (step=0000867) Train Loss mse: 0.0495, Train Loss ce: 0.0617, Train Steps/Sec: 0.06,
[2026-01-07 22:17:27] (step=0000868) Train Loss mse: 0.0502, Train Loss ce: 0.0616, Train Steps/Sec: 0.07,
@@ -2221,6 +2214,20 @@ ce_avg: 0.14916321635246277, mse_avg: 0.03999507427215576
[2026-01-08 02:40:09] (step=0002003) Train Loss mse: 0.0433, Train Loss ce: 0.0551, Train Steps/Sec: 0.08,
[2026-01-08 02:40:20] (step=0002004) Train Loss mse: 0.0496, Train Loss ce: 0.0568, Train Steps/Sec: 0.09,
[2026-01-08 02:40:34] (step=0002005) Train Loss mse: 0.0419, Train Loss ce: 0.0568, Train Steps/Sec: 0.07,
[2026-01-08 02:40:46] (step=0002006) Train Loss mse: 0.0387, Train Loss ce: 0.0622, Train Steps/Sec: 0.08,
[2026-01-08 02:40:57] (step=0002007) Train Loss mse: 0.0473, Train Loss ce: 0.0671, Train Steps/Sec: 0.09,
[2026-01-08 02:41:11] (step=0002008) Train Loss mse: 0.0478, Train Loss ce: 0.0585, Train Steps/Sec: 0.07,
@@ -2257,20 +2264,6 @@ ce_avg: 0.14916321635246277, mse_avg: 0.03999507427215576
[2026-01-08 02:48:21] (step=0002039) Train Loss mse: 0.0500, Train Loss ce: 0.0585, Train Steps/Sec: 0.08,
[2026-01-08 02:48:34] (step=0002040) Train Loss mse: 0.0508, Train Loss ce: 0.0584, Train Steps/Sec: 0.07,
[2026-01-08 02:48:47] (step=0002041) Train Loss mse: 0.0540, Train Loss ce: 0.0626, Train Steps/Sec: 0.08,
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step2500
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.15878577530384064, mse_avg: 0.04078574851155281
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step3000
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.05950051546096802, mse_avg: 0.04290447011590004
[2026-01-08 02:49:01] (step=0002042) Train Loss mse: 0.0372, Train Loss ce: 0.0575, Train Steps/Sec: 0.08,
[2026-01-08 02:49:13] (step=0002043) Train Loss mse: 0.0501, Train Loss ce: 0.0586, Train Steps/Sec: 0.08,
[2026-01-08 02:49:24] (step=0002044) Train Loss mse: 0.0542, Train Loss ce: 0.0584, Train Steps/Sec: 0.09,
@@ -3129,6 +3122,20 @@ ce_avg: 0.05950051546096802, mse_avg: 0.04290447011590004
[2026-01-08 06:07:35] (step=0002894) Train Loss mse: 0.0419, Train Loss ce: 0.0592, Train Steps/Sec: 0.08,
[2026-01-08 06:07:49] (step=0002895) Train Loss mse: 0.0422, Train Loss ce: 0.0577, Train Steps/Sec: 0.07,
[2026-01-08 06:08:02] (step=0002896) Train Loss mse: 0.0327, Train Loss ce: 0.0590, Train Steps/Sec: 0.08,
[2026-01-08 06:08:15] (step=0002897) Train Loss mse: 0.0434, Train Loss ce: 0.0559, Train Steps/Sec: 0.07,
[2026-01-08 06:08:29] (step=0002898) Train Loss mse: 0.0478, Train Loss ce: 0.0610, Train Steps/Sec: 0.07,
[2026-01-08 06:08:45] (step=0002899) Train Loss mse: 0.0369, Train Loss ce: 0.0575, Train Steps/Sec: 0.06,
@@ -3219,20 +3226,6 @@ ce_avg: 0.05950051546096802, mse_avg: 0.04290447011590004
[2026-01-08 06:28:08] (step=0002984) Train Loss mse: 0.0387, Train Loss ce: 0.0602, Train Steps/Sec: 0.06,
[2026-01-08 06:28:21] (step=0002985) Train Loss mse: 0.0372, Train Loss ce: 0.0530, Train Steps/Sec: 0.08,
[2026-01-08 06:28:33] (step=0002986) Train Loss mse: 0.0326, Train Loss ce: 0.0561, Train Steps/Sec: 0.08,
-[2026-01-08 06:28:49
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step3500
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.05882769450545311, mse_avg: 0.039856743067502975
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step4000
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
[2026-01-08 06:28:49] (step=0002987) Train Loss mse: 0.0406, Train Loss ce: 0.0596, Train Steps/Sec: 0.06,
[2026-01-08 06:29:01] (step=0002988) Train Loss mse: 0.0462, Train Loss ce: 0.0583, Train Steps/Sec: 0.09,
[2026-01-08 06:29:14] (step=0002989) Train Loss mse: 0.0422, Train Loss ce: 0.0606, Train Steps/Sec: 0.07,
@@ -4378,6 +4371,20 @@ Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
[2026-01-08 10:53:01] (step=0004129) Train Loss mse: 0.0533, Train Loss ce: 0.0561, Train Steps/Sec: 0.09,
[2026-01-08 10:53:13] (step=0004130) Train Loss mse: 0.0327, Train Loss ce: 0.0578, Train Steps/Sec: 0.09,
[2026-01-08 10:53:29] (step=0004131) Train Loss mse: 0.0476, Train Loss ce: 0.0549, Train Steps/Sec: 0.06,
[2026-01-08 10:53:41] (step=0004132) Train Loss mse: 0.0427, Train Loss ce: 0.0588, Train Steps/Sec: 0.08,
[2026-01-08 10:53:57] (step=0004133) Train Loss mse: 0.0461, Train Loss ce: 0.0568, Train Steps/Sec: 0.06,
[2026-01-08 10:54:10] (step=0004134) Train Loss mse: 0.0357, Train Loss ce: 0.0562, Train Steps/Sec: 0.08,
@@ -4530,20 +4537,6 @@ Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
[2026-01-08 11:27:44] (step=0004281) Train Loss mse: 0.0305, Train Loss ce: 0.0599, Train Steps/Sec: 0.08,
[2026-01-08 11:27:57] (step=0004282) Train Loss mse: 0.0300, Train Loss ce: 0.0572, Train Steps/Sec: 0.07,
[2026-01-08 11:28:12] (step=0004283) Train Loss mse: 0.0388, Train Loss ce: 0.0600, Train Steps/Sec: 0.07,
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step4500
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.05841600522398949, mse_avg: 0.040690697729587555
-base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step5000
-Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
-[eval debug] first 3 batch fingerprints:
-fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
-ce_avg: 0.058565959334373474, mse_avg: 0.04069836065173149
| 4547 |
[[34m2026-01-08 11:28:26[39m] (step=0004284) Train Loss mse: 0.0509, Train Loss ce: 0.0572, Train Steps/Sec: 0.07,
|
| 4548 |
[[34m2026-01-08 11:28:39[39m] (step=0004285) Train Loss mse: 0.0355, Train Loss ce: 0.0584, Train Steps/Sec: 0.07,
|
| 4549 |
[[34m2026-01-08 11:28:52[39m] (step=0004286) Train Loss mse: 0.0272, Train Loss ce: 0.0561, Train Steps/Sec: 0.08,
|
|
@@ -5262,4 +5255,11 @@ ce_avg: 0.058565959334373474, mse_avg: 0.04069836065173149
[2026-01-08 14:13:25] (step=0004999) Train Loss mse: 0.0486, Train Loss ce: 0.0565, Train Steps/Sec: 0.08,
[2026-01-08 14:14:43] (step=0005000) Train Loss mse: 0.0408, Train Loss ce: 0.0569, Train Steps/Sec: 0.01,
[2026-01-08 14:14:43] Saving checkpoint to /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/0005000.
-[2026-01-08 14:17:28] Done!
[2026-01-07 22:03:25] (step=0000806) Train Loss mse: 0.0474, Train Loss ce: 0.0616, Train Steps/Sec: 0.08,
[2026-01-07 22:03:37] (step=0000807) Train Loss mse: 0.0525, Train Loss ce: 0.0598, Train Steps/Sec: 0.08,
[2026-01-07 22:03:49] (step=0000808) Train Loss mse: 0.0454, Train Loss ce: 0.0603, Train Steps/Sec: 0.09,
FullyShardedDataParallel(
  (_fsdp_wrapped_module): Bagel(
    (language_model): Qwen2ForCausalLM(
fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
ce_avg: 0.1378343403339386, mse_avg: 0.041579172015190125
+[2026-01-07 22:04:05] (step=0000809) Train Loss mse: 0.0326, Train Loss ce: 0.0647, Train Steps/Sec: 0.06,
+[2026-01-07 22:04:17] (step=0000810) Train Loss mse: 0.0437, Train Loss ce: 0.0584, Train Steps/Sec: 0.08,
+[2026-01-07 22:04:29] (step=0000811) Train Loss mse: 0.0597, Train Loss ce: 0.0633, Train Steps/Sec: 0.08,
+[2026-01-07 22:04:41] (step=0000812) Train Loss mse: 0.0387, Train Loss ce: 0.0599, Train Steps/Sec: 0.09,
+[2026-01-07 22:04:55] (step=0000813) Train Loss mse: 0.0491, Train Loss ce: 0.0613, Train Steps/Sec: 0.07,
+[2026-01-07 22:05:08] (step=0000814) Train Loss mse: 0.0546, Train Loss ce: 0.0671, Train Steps/Sec: 0.08,
+[2026-01-07 22:05:24] (step=0000815) Train Loss mse: 0.0472, Train Loss ce: 0.0726, Train Steps/Sec: 0.06,
+[2026-01-07 22:05:41] (step=0000816) Train Loss mse: 0.0341, Train Loss ce: 0.0607, Train Steps/Sec: 0.06,
+[2026-01-07 22:05:56] (step=0000817) Train Loss mse: 0.0452, Train Loss ce: 0.0611, Train Steps/Sec: 0.06,
+[2026-01-07 22:06:10] (step=0000818) Train Loss mse: 0.0349, Train Loss ce: 0.0640, Train Steps/Sec: 0.08,
+[2026-01-07 22:06:21] (step=0000819) Train Loss mse: 0.0482, Train Loss ce: 0.0636, Train Steps/Sec: 0.09,
+[2026-01-07 22:06:32] (step=0000820) Train Loss mse: 0.0424, Train Loss ce: 0.0601, Train Steps/Sec: 0.09,
+[2026-01-07 22:06:45] (step=0000821) Train Loss mse: 0.0306, Train Loss ce: 0.0608, Train Steps/Sec: 0.08,
+[2026-01-07 22:06:56] (step=0000822) Train Loss mse: 0.0402, Train Loss ce: 0.0640, Train Steps/Sec: 0.09,
+[2026-01-07 22:07:10] (step=0000823) Train Loss mse: 0.0510, Train Loss ce: 0.0636, Train Steps/Sec: 0.08,
+[2026-01-07 22:07:26] (step=0000824) Train Loss mse: 0.0551, Train Loss ce: 0.0690, Train Steps/Sec: 0.06,
+[2026-01-07 22:07:39] (step=0000825) Train Loss mse: 0.0469, Train Loss ce: 0.0644, Train Steps/Sec: 0.07,
+[2026-01-07 22:07:52] (step=0000826) Train Loss mse: 0.0598, Train Loss ce: 0.0676, Train Steps/Sec: 0.08,
+[2026-01-07 22:08:02] (step=0000827) Train Loss mse: 0.0474, Train Loss ce: 0.0626, Train Steps/Sec: 0.09,
+[2026-01-07 22:08:16] (step=0000828) Train Loss mse: 0.0396, Train Loss ce: 0.0633, Train Steps/Sec: 0.07,
+[2026-01-07 22:08:32] (step=0000829) Train Loss mse: 0.0341, Train Loss ce: 0.0630, Train Steps/Sec: 0.06,
+[2026-01-07 22:08:44] (step=0000830) Train Loss mse: 0.0481, Train Loss ce: 0.0613, Train Steps/Sec: 0.08,
+[2026-01-07 22:08:56] (step=0000831) Train Loss mse: 0.0491, Train Loss ce: 0.0609, Train Steps/Sec: 0.08,
+[2026-01-07 22:09:08] (step=0000832) Train Loss mse: 0.0519, Train Loss ce: 0.0643, Train Steps/Sec: 0.08,
+[2026-01-07 22:09:24] (step=0000833) Train Loss mse: 0.0422, Train Loss ce: 0.0661, Train Steps/Sec: 0.06,
+[2026-01-07 22:09:36] (step=0000834) Train Loss mse: 0.0502, Train Loss ce: 0.0766, Train Steps/Sec: 0.09,
+[2026-01-07 22:09:51] (step=0000835) Train Loss mse: 0.0386, Train Loss ce: 0.0603, Train Steps/Sec: 0.06,
+[2026-01-07 22:10:05] (step=0000836) Train Loss mse: 0.0366, Train Loss ce: 0.0586, Train Steps/Sec: 0.07,
+[2026-01-07 22:10:17] (step=0000837) Train Loss mse: 0.0561, Train Loss ce: 0.0694, Train Steps/Sec: 0.08,
+[2026-01-07 22:10:33] (step=0000838) Train Loss mse: 0.0368, Train Loss ce: 0.0644, Train Steps/Sec: 0.06,
+[2026-01-07 22:10:46] (step=0000839) Train Loss mse: 0.0375, Train Loss ce: 0.0635, Train Steps/Sec: 0.07,
+[2026-01-07 22:11:02] (step=0000840) Train Loss mse: 0.0462, Train Loss ce: 0.0678, Train Steps/Sec: 0.06,
+[2026-01-07 22:11:15] (step=0000841) Train Loss mse: 0.0450, Train Loss ce: 0.0676, Train Steps/Sec: 0.08,
+[2026-01-07 22:11:28] (step=0000842) Train Loss mse: 0.0397, Train Loss ce: 0.0628, Train Steps/Sec: 0.08,
+[2026-01-07 22:11:44] (step=0000843) Train Loss mse: 0.0385, Train Loss ce: 0.0599, Train Steps/Sec: 0.06,
+[2026-01-07 22:11:57] (step=0000844) Train Loss mse: 0.0495, Train Loss ce: 0.0653, Train Steps/Sec: 0.07,
+[2026-01-07 22:12:11] (step=0000845) Train Loss mse: 0.0314, Train Loss ce: 0.0680, Train Steps/Sec: 0.07,
+[2026-01-07 22:12:24] (step=0000846) Train Loss mse: 0.0336, Train Loss ce: 0.0606, Train Steps/Sec: 0.08,
+[2026-01-07 22:12:40] (step=0000847) Train Loss mse: 0.0251, Train Loss ce: 0.0687, Train Steps/Sec: 0.06,
+[2026-01-07 22:12:53] (step=0000848) Train Loss mse: 0.0325, Train Loss ce: 0.0635, Train Steps/Sec: 0.08,
+[2026-01-07 22:13:09] (step=0000849) Train Loss mse: 0.0502, Train Loss ce: 0.0647, Train Steps/Sec: 0.06,
+[2026-01-07 22:13:21] (step=0000850) Train Loss mse: 0.0455, Train Loss ce: 0.0658, Train Steps/Sec: 0.08,
+[2026-01-07 22:13:34] (step=0000851) Train Loss mse: 0.0463, Train Loss ce: 0.0628, Train Steps/Sec: 0.07,
+[2026-01-07 22:13:47] (step=0000852) Train Loss mse: 0.0526, Train Loss ce: 0.0703, Train Steps/Sec: 0.08,
+[2026-01-07 22:13:59] (step=0000853) Train Loss mse: 0.0485, Train Loss ce: 0.0628, Train Steps/Sec: 0.08,
+[2026-01-07 22:14:15] (step=0000854) Train Loss mse: 0.0407, Train Loss ce: 0.0654, Train Steps/Sec: 0.06,
+[2026-01-07 22:14:28] (step=0000855) Train Loss mse: 0.0481, Train Loss ce: 0.0616, Train Steps/Sec: 0.08,
+[2026-01-07 22:14:42] (step=0000856) Train Loss mse: 0.0525, Train Loss ce: 0.0702, Train Steps/Sec: 0.07,
+[2026-01-07 22:14:55] (step=0000857) Train Loss mse: 0.0311, Train Loss ce: 0.0600, Train Steps/Sec: 0.07,
+[2026-01-07 22:15:07] (step=0000858) Train Loss mse: 0.0595, Train Loss ce: 0.0597, Train Steps/Sec: 0.08,
+[2026-01-07 22:15:21] (step=0000859) Train Loss mse: 0.0382, Train Loss ce: 0.0590, Train Steps/Sec: 0.07,
+[2026-01-07 22:15:34] (step=0000860) Train Loss mse: 0.0360, Train Loss ce: 0.0655, Train Steps/Sec: 0.08,
+[2026-01-07 22:15:50] (step=0000861) Train Loss mse: 0.0379, Train Loss ce: 0.0579, Train Steps/Sec: 0.06,
+[2026-01-07 22:16:04] (step=0000862) Train Loss mse: 0.0402, Train Loss ce: 0.0612, Train Steps/Sec: 0.07,
+[2026-01-07 22:16:16] (step=0000863) Train Loss mse: 0.0490, Train Loss ce: 0.0752, Train Steps/Sec: 0.08,
+[2026-01-07 22:16:30] (step=0000864) Train Loss mse: 0.0547, Train Loss ce: 0.0663, Train Steps/Sec: 0.07,
+[2026-01-07 22:16:43] (step=0000865) Train Loss mse: 0.0452, Train Loss ce: 0.0632, Train Steps/Sec: 0.07,
[2026-01-07 22:16:57] (step=0000866) Train Loss mse: 0.0280, Train Loss ce: 0.0637, Train Steps/Sec: 0.08,
[2026-01-07 22:17:13] (step=0000867) Train Loss mse: 0.0495, Train Loss ce: 0.0617, Train Steps/Sec: 0.06,
[2026-01-07 22:17:27] (step=0000868) Train Loss mse: 0.0502, Train Loss ce: 0.0616, Train Steps/Sec: 0.07,
[2026-01-08 02:40:09] (step=0002003) Train Loss mse: 0.0433, Train Loss ce: 0.0551, Train Steps/Sec: 0.08,
[2026-01-08 02:40:20] (step=0002004) Train Loss mse: 0.0496, Train Loss ce: 0.0568, Train Steps/Sec: 0.09,
[2026-01-08 02:40:34] (step=0002005) Train Loss mse: 0.0419, Train Loss ce: 0.0568, Train Steps/Sec: 0.07,
+base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step2000
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+ce_avg: 0.14916321635246277, mse_avg: 0.03999507427215576
+base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step2500
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+ce_avg: 0.15878577530384064, mse_avg: 0.04078574851155281
[2026-01-08 02:40:46] (step=0002006) Train Loss mse: 0.0387, Train Loss ce: 0.0622, Train Steps/Sec: 0.08,
[2026-01-08 02:40:57] (step=0002007) Train Loss mse: 0.0473, Train Loss ce: 0.0671, Train Steps/Sec: 0.09,
[2026-01-08 02:41:11] (step=0002008) Train Loss mse: 0.0478, Train Loss ce: 0.0585, Train Steps/Sec: 0.07,
[2026-01-08 02:48:21] (step=0002039) Train Loss mse: 0.0500, Train Loss ce: 0.0585, Train Steps/Sec: 0.08,
[2026-01-08 02:48:34] (step=0002040) Train Loss mse: 0.0508, Train Loss ce: 0.0584, Train Steps/Sec: 0.07,
[2026-01-08 02:48:47] (step=0002041) Train Loss mse: 0.0540, Train Loss ce: 0.0626, Train Steps/Sec: 0.08,
[2026-01-08 02:49:01] (step=0002042) Train Loss mse: 0.0372, Train Loss ce: 0.0575, Train Steps/Sec: 0.08,
[2026-01-08 02:49:13] (step=0002043) Train Loss mse: 0.0501, Train Loss ce: 0.0586, Train Steps/Sec: 0.08,
[2026-01-08 02:49:24] (step=0002044) Train Loss mse: 0.0542, Train Loss ce: 0.0584, Train Steps/Sec: 0.09,
[2026-01-08 06:07:35] (step=0002894) Train Loss mse: 0.0419, Train Loss ce: 0.0592, Train Steps/Sec: 0.08,
[2026-01-08 06:07:49] (step=0002895) Train Loss mse: 0.0422, Train Loss ce: 0.0577, Train Steps/Sec: 0.07,
[2026-01-08 06:08:02] (step=0002896) Train Loss mse: 0.0327, Train Loss ce: 0.0590, Train Steps/Sec: 0.08,
+[2026-01-08 06:08:15
+base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step3000
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+ce_avg: 0.05950051546096802, mse_avg: 0.04290447011590004
+base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step3500
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
[2026-01-08 06:08:15] (step=0002897) Train Loss mse: 0.0434, Train Loss ce: 0.0559, Train Steps/Sec: 0.07,
[2026-01-08 06:08:29] (step=0002898) Train Loss mse: 0.0478, Train Loss ce: 0.0610, Train Steps/Sec: 0.07,
[2026-01-08 06:08:45] (step=0002899) Train Loss mse: 0.0369, Train Loss ce: 0.0575, Train Steps/Sec: 0.06,
[2026-01-08 06:28:08] (step=0002984) Train Loss mse: 0.0387, Train Loss ce: 0.0602, Train Steps/Sec: 0.06,
[2026-01-08 06:28:21] (step=0002985) Train Loss mse: 0.0372, Train Loss ce: 0.0530, Train Steps/Sec: 0.08,
[2026-01-08 06:28:33] (step=0002986) Train Loss mse: 0.0326, Train Loss ce: 0.0561, Train Steps/Sec: 0.08,
[2026-01-08 06:28:49] (step=0002987) Train Loss mse: 0.0406, Train Loss ce: 0.0596, Train Steps/Sec: 0.06,
[2026-01-08 06:29:01] (step=0002988) Train Loss mse: 0.0462, Train Loss ce: 0.0583, Train Steps/Sec: 0.09,
[2026-01-08 06:29:14] (step=0002989) Train Loss mse: 0.0422, Train Loss ce: 0.0606, Train Steps/Sec: 0.07,
[2026-01-08 10:53:01] (step=0004129) Train Loss mse: 0.0533, Train Loss ce: 0.0561, Train Steps/Sec: 0.09,
[2026-01-08 10:53:13] (step=0004130) Train Loss mse: 0.0327, Train Loss ce: 0.0578, Train Steps/Sec: 0.09,
[2026-01-08 10:53:29] (step=0004131) Train Loss mse: 0.0476, Train Loss ce: 0.0549, Train Steps/Sec: 0.06,
+base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step4000
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+ce_avg: 0.05835578218102455, mse_avg: 0.04091694951057434
+base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step4500
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+ce_avg: 0.05841600522398949, mse_avg: 0.040690697729587555
[2026-01-08 10:53:41] (step=0004132) Train Loss mse: 0.0427, Train Loss ce: 0.0588, Train Steps/Sec: 0.08,
[2026-01-08 10:53:57] (step=0004133) Train Loss mse: 0.0461, Train Loss ce: 0.0568, Train Steps/Sec: 0.06,
[2026-01-08 10:54:10] (step=0004134) Train Loss mse: 0.0357, Train Loss ce: 0.0562, Train Steps/Sec: 0.08,
[2026-01-08 11:27:44] (step=0004281) Train Loss mse: 0.0305, Train Loss ce: 0.0599, Train Steps/Sec: 0.08,
[2026-01-08 11:27:57] (step=0004282) Train Loss mse: 0.0300, Train Loss ce: 0.0572, Train Steps/Sec: 0.07,
[2026-01-08 11:28:12] (step=0004283) Train Loss mse: 0.0388, Train Loss ce: 0.0600, Train Steps/Sec: 0.07,
[2026-01-08 11:28:26] (step=0004284) Train Loss mse: 0.0509, Train Loss ce: 0.0572, Train Steps/Sec: 0.07,
[2026-01-08 11:28:39] (step=0004285) Train Loss mse: 0.0355, Train Loss ce: 0.0584, Train Steps/Sec: 0.07,
[2026-01-08 11:28:52] (step=0004286) Train Loss mse: 0.0272, Train Loss ce: 0.0561, Train Steps/Sec: 0.08,
[2026-01-08 14:13:25] (step=0004999) Train Loss mse: 0.0486, Train Loss ce: 0.0565, Train Steps/Sec: 0.08,
[2026-01-08 14:14:43] (step=0005000) Train Loss mse: 0.0408, Train Loss ce: 0.0569, Train Steps/Sec: 0.01,
[2026-01-08 14:14:43] Saving checkpoint to /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/0005000.
+[2026-01-08 14:17:28] Done!
+base_dir is /dev/shm/models/checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce/eval_used_rows, step_tag is checkpoints_vlm_gym_jigsaw_one_image_lr2e_5_ce_ema993_hashed_step5000
+Preparing Dataset vlm_gym_jigsaw_celoss_evalonce/vlm_gym_jigsaw_val
+[eval debug] first 3 batch fingerprints:
+fp[0]: [{'data_indexes': [0], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[1]: [{'data_indexes': [8], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+fp[2]: [{'data_indexes': [16], 'worker_id': 0, 'dataset_name': 'vlm_gym_jigsaw_celoss_evalonce'}]
+ce_avg: 0.058565959334373474, mse_avg: 0.04069836065173149