diff --git "a/training_artifacts/logs/pipeline_cleaned.txt" "b/training_artifacts/logs/pipeline_cleaned.txt" --- "a/training_artifacts/logs/pipeline_cleaned.txt" +++ "b/training_artifacts/logs/pipeline_cleaned.txt" @@ -9497,6 +9497,10 @@ Training config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__ Starting distributed training with torch.distributed.run... +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** + ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** @@ -9508,19 +9512,19 @@ Setting OMP_NUM_THREADS environment variable for each process to be 1 in default import pkg_resources /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. import pkg_resources -[INFO|2025-10-22 20:35:15] llamafactory.hparams.parser:423 >> Process rank: 1, world size: 4, device: cuda:1, distributed training: True, compute dtype: torch.float16 -[INFO|2025-10-22 20:35:15] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled. -[INFO|2025-10-22 20:35:15] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 4, device: cuda:0, distributed training: True, compute dtype: torch.float16 -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,237 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,237 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,237 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,237 >> loading file added_tokens.json from cache at None -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,238 >> loading file special_tokens_map.json from cache at None -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,238 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,238 >> loading file chat_template.jinja from cache at None -[INFO|tokenization_utils_base.py:2364] 2025-10-22 20:35:15,408 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
-[INFO|configuration_utils.py:765] 2025-10-22 20:35:15,689 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json -[INFO|configuration_utils.py:839] 2025-10-22 20:35:15,691 >> Model config Qwen2Config { +[INFO|2025-10-22 20:35:16] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled. +[INFO|2025-10-22 20:35:16] llamafactory.hparams.parser:423 >> Process rank: 3, world size: 4, device: cuda:1, distributed training: True, compute dtype: torch.float16 +[INFO|2025-10-22 20:35:16] llamafactory.hparams.parser:423 >> Process rank: 2, world size: 4, device: cuda:0, distributed training: True, compute dtype: torch.float16 +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:16,767 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:16,767 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:16,767 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:16,767 >> loading file added_tokens.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:16,767 >> loading file special_tokens_map.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:16,767 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:16,767 >> loading file chat_template.jinja from cache at None +[INFO|tokenization_utils_base.py:2364] 2025-10-22 20:35:16,937 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
+[INFO|configuration_utils.py:765] 2025-10-22 20:35:17,315 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json +[INFO|configuration_utils.py:839] 2025-10-22 20:35:17,317 >> Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], @@ -9576,90 +9580,87 @@ Setting OMP_NUM_THREADS environment variable for each process to be 1 in default "vocab_size": 151936 } -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,755 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,755 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,755 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,755 >> loading file added_tokens.json from cache at None -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,755 >> loading file special_tokens_map.json from cache at None -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,755 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json -[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:15,755 >> loading file chat_template.jinja from cache at None -[INFO|tokenization_utils_base.py:2364] 2025-10-22 20:35:15,922 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. -[INFO|2025-10-22 20:35:15] llamafactory.data.loader:143 >> Loading dataset TAUR-dev/D-SFT_C-BASELINE_r1_distillation-sft-data... -Converting format of dataset (num_proc=16): 0%| | 0/3998 [00:00 -gl064:2626306:2626306 [0] NCCL INFO cudaDriverVersion 13000 -gl064:2626306:2626306 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 -gl064:2626306:2626306 [0] NCCL INFO Comm config Blocking set to 1 -gl064:2626307:2626307 [1] NCCL INFO cudaDriverVersion 13000 -gl064:2626307:2626307 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs -gl064:2626307:2626307 [1] NCCL INFO Bootstrap: Using ibs3:10.0.5.0<0> -gl064:2626307:2626307 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 -gl064:2626307:2626307 [1] NCCL INFO Comm config Blocking set to 1 -gl064:2626306:2626406 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. -gl064:2626306:2626406 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. -gl064:2626306:2626406 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs -gl064:2626306:2626406 [0] NCCL INFO NCCL_IB_HCA set to mlx5 -gl064:2626307:2626407 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. -gl064:2626307:2626407 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 
-gl064:2626307:2626407 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs -gl064:2626307:2626407 [1] NCCL INFO NCCL_IB_HCA set to mlx5 -gl064:2626306:2626406 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0> -gl064:2626306:2626406 [0] NCCL INFO Initialized NET plugin IB -gl064:2626306:2626406 [0] NCCL INFO Assigned NET plugin IB to comm -gl064:2626306:2626406 [0] NCCL INFO Using network IB -gl064:2626307:2626407 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0> -gl064:2626307:2626407 [1] NCCL INFO Initialized NET plugin IB -gl064:2626306:2626406 [0] NCCL INFO ncclCommInitRankConfig comm 0x13d4e850 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0x5fc912795e7fee89 - Init START -gl064:2626307:2626407 [1] NCCL INFO Assigned NET plugin IB to comm -gl064:2626307:2626407 [1] NCCL INFO Using network IB -gl064:2626307:2626407 [1] NCCL INFO ncclCommInitRankConfig comm 0x15ba7840 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0x5fc912795e7fee89 - Init START -gl064:2626306:2626406 [0] NCCL INFO RAS client listening socket at ::1<28028> -gl064:2626307:2626407 [1] NCCL INFO RAS client listening socket at ::1<28028> -gl064:2626306:2626406 [0] NCCL INFO Bootstrap timings total 1.008501 (create 0.000023, send 0.000230, recv 0.000728, ring 0.438069, delay 0.000000) -gl064:2626307:2626407 [1] NCCL INFO Bootstrap timings total 1.007716 (create 0.000021, send 0.000076, recv 1.006053, ring 0.000830, delay 0.000000) -gl064:2626306:2626406 [0] NCCL INFO Setting affinity for GPU 0 to 0-15 -gl064:2626307:2626407 [1] NCCL INFO Setting affinity for GPU 1 to 0-15 -gl064:2626307:2626407 [1] NCCL INFO comm 0x15ba7840 rank 1 nRanks 4 nNodes 2 localRanks 2 localRank 1 MNNVL 0 -gl064:2626306:2626406 [0] NCCL INFO comm 0x13d4e850 rank 0 nRanks 4 nNodes 2 localRanks 2 localRank 0 MNNVL 0 -gl064:2626306:2626406 [0] NCCL INFO Channel 00/02 : 0 1 2 3 -gl064:2626307:2626407 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 -gl064:2626306:2626406 [0] NCCL INFO Channel 01/02 : 0 1 2 3 -gl064:2626307:2626407 [1] NCCL INFO P2P Chunksize set to 131072 -gl064:2626306:2626406 [0] NCCL INFO Trees [0] 1/2/-1->0->-1 [1] 1/-1/-1->0->2 -gl064:2626306:2626406 [0] NCCL INFO P2P Chunksize set to 131072 -gl064:2626306:2626406 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. -gl064:2626307:2626407 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. -gl064:2626306:2626406 [0] NCCL INFO Check P2P Type isAllDirectP2p 0 directMode 0 -gl064:2626306:2626412 [0] NCCL INFO [Proxy Service] Device 0 CPU core 6 -gl064:2626307:2626413 [1] NCCL INFO [Proxy Service] Device 1 CPU core 2 -gl064:2626307:2626414 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 10 -gl064:2626306:2626415 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 10 -gl064:2626307:2626407 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512 -gl064:2626307:2626407 [1] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer -gl064:2626306:2626406 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512 -gl064:2626306:2626406 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer -gl064:2626306:2626406 [0] NCCL INFO CC Off, workFifoBytes 1048576 -gl064:2626307:2626407 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. 
-gl064:2626307:2626407 [1] NCCL INFO ncclCommInitRankConfig comm 0x15ba7840 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0x5fc912795e7fee89 - Init COMPLETE
-gl064:2626307:2626407 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 4 total 1.12 (kernels 0.08, alloc 0.01, bootstrap 1.01, allgathers 0.00, topo 0.01, graphs 0.00, connections 0.00, rest 0.00)
-gl064:2626306:2626406 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
-gl064:2626306:2626406 [0] NCCL INFO ncclCommInitRankConfig comm 0x13d4e850 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0x5fc912795e7fee89 - Init COMPLETE
-gl064:2626306:2626406 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 4 total 1.12 (kernels 0.08, alloc 0.01, bootstrap 1.01, allgathers 0.00, topo 0.01, graphs 0.00, connections 0.00, rest 0.00)
-gl064:2626306:2626417 [0] NCCL INFO Channel 00/0 : 3[1] -> 0[0] [receive] via NET/IB/0
-gl064:2626306:2626417 [0] NCCL INFO Channel 01/0 : 3[1] -> 0[0] [receive] via NET/IB/0
-gl064:2626306:2626418 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 14
-gl064:2626306:2626417 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct
-gl064:2626306:2626417 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct
-gl064:2626307:2626419 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 3
-gl064:2626307:2626416 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[0] [send] via NET/IB/0
-gl064:2626307:2626416 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[0] [send] via NET/IB/0
-gl064:2626307:2626416 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
-gl064:2626306:2626417 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
-Running tokenizer on dataset (num_proc=16): 0%| | 0/3998 [00:00<?, ?it/s]
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:17,384 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:17,384 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:17,384 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:17,384 >> loading file added_tokens.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:17,384 >> loading file special_tokens_map.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:17,384 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:35:17,384 >> loading file chat_template.jinja from cache at None
+gl065:3838856:3838856 [1] NCCL INFO cudaDriverVersion 13000
+gl065:3838856:3838856 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
+gl065:3838856:3838856 [1] NCCL INFO Bootstrap: Using ibs3:10.0.5.1<0>
+gl065:3838856:3838856 [1] NCCL INFO NCCL version 2.27.5+cuda12.9
+gl065:3838856:3838856 [1] NCCL INFO Comm config Blocking set to 1
+gl065:3838856:3838939 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so.
+gl065:3838856:3838939 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
+gl065:3838856:3838939 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs +gl065:3838856:3838939 [1] NCCL INFO NCCL_IB_HCA set to mlx5 +gl065:3838856:3838939 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.1<0> +gl065:3838856:3838939 [1] NCCL INFO Initialized NET plugin IB +gl065:3838856:3838939 [1] NCCL INFO Assigned NET plugin IB to comm +gl065:3838856:3838939 [1] NCCL INFO Using network IB +gl065:3838856:3838939 [1] NCCL INFO ncclCommInitRankConfig comm 0x15605790 rank 3 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0x5fc912795e7fee89 - Init START +[INFO|tokenization_utils_base.py:2364] 2025-10-22 20:35:17,551 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +[rank2]:[W1022 20:35:17.488976424 ProcessGroupNCCL.cpp:5068] Guessing device ID based on global rank. This can cause a hang if rank to GPU mapping is heterogeneous. You can specify device_id in init_process_group() +gl065:3838855:3838855 [0] NCCL INFO cudaDriverVersion 13000 +gl065:3838855:3838855 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs +gl065:3838855:3838855 [0] NCCL INFO Bootstrap: Using ibs3:10.0.5.1<0> +gl065:3838855:3838855 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 +gl065:3838855:3838855 [0] NCCL INFO Comm config Blocking set to 1 +gl065:3838855:3838944 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. +gl065:3838855:3838944 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. +gl065:3838855:3838944 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs +gl065:3838855:3838944 [0] NCCL INFO NCCL_IB_HCA set to mlx5 +gl065:3838855:3838944 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.1<0> +gl065:3838855:3838944 [0] NCCL INFO Initialized NET plugin IB +gl065:3838855:3838944 [0] NCCL INFO Assigned NET plugin IB to comm +gl065:3838855:3838944 [0] NCCL INFO Using network IB +gl065:3838855:3838944 [0] NCCL INFO ncclCommInitRankConfig comm 0x14c25ae0 rank 2 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0x5fc912795e7fee89 - Init START +gl065:3838855:3838944 [0] NCCL INFO RAS client listening socket at ::1<28028> +gl065:3838856:3838939 [1] NCCL INFO RAS client listening socket at ::1<28028> +gl065:3838855:3838944 [0] NCCL INFO Bootstrap timings total 0.002882 (create 0.000028, send 0.000421, recv 0.000996, ring 0.000836, delay 0.000000) +gl065:3838856:3838939 [1] NCCL INFO Bootstrap timings total 0.440912 (create 0.000028, send 0.000570, recv 0.000877, ring 0.000798, delay 0.000000) +gl065:3838855:3838944 [0] NCCL INFO Setting affinity for GPU 0 to 0-15 +gl065:3838856:3838939 [1] NCCL INFO Setting affinity for GPU 1 to 0-15 +gl065:3838855:3838944 [0] NCCL INFO comm 0x14c25ae0 rank 2 nRanks 4 nNodes 2 localRanks 2 localRank 0 MNNVL 0 +gl065:3838856:3838939 [1] NCCL INFO comm 0x15605790 rank 3 nRanks 4 nNodes 2 localRanks 2 localRank 1 MNNVL 0 +gl065:3838855:3838944 [0] NCCL INFO Trees [0] 3/-1/-1->2->0 [1] 3/0/-1->2->-1 +gl065:3838856:3838939 [1] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2 +gl065:3838855:3838944 [0] NCCL INFO P2P Chunksize set to 131072 +gl065:3838856:3838939 [1] NCCL INFO P2P Chunksize set to 131072 +gl065:3838855:3838944 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. +gl065:3838856:3838939 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. 
+gl065:3838855:3838948 [0] NCCL INFO [Proxy Service] Device 0 CPU core 10
+gl065:3838856:3838949 [1] NCCL INFO [Proxy Service] Device 1 CPU core 5
+gl065:3838855:3838950 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 11
+gl065:3838856:3838951 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 12
+gl065:3838855:3838944 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
+gl065:3838855:3838944 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
+gl065:3838856:3838939 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
+gl065:3838856:3838939 [1] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
+gl065:3838855:3838944 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
+gl065:3838855:3838944 [0] NCCL INFO ncclCommInitRankConfig comm 0x14c25ae0 rank 2 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0x5fc912795e7fee89 - Init COMPLETE
+gl065:3838855:3838944 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 2 nranks 4 total 0.11 (kernels 0.08, alloc 0.01, bootstrap 0.00, allgathers 0.01, topo 0.01, graphs 0.00, connections 0.00, rest 0.00)
+gl065:3838856:3838939 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
+gl065:3838856:3838939 [1] NCCL INFO ncclCommInitRankConfig comm 0x15605790 rank 3 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0x5fc912795e7fee89 - Init COMPLETE
+gl065:3838856:3838939 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 3 nranks 4 total 0.55 (kernels 0.08, alloc 0.01, bootstrap 0.44, allgathers 0.00, topo 0.01, graphs 0.00, connections 0.00, rest 0.00)
+gl065:3838855:3838953 [0] NCCL INFO Channel 00/0 : 1[1] -> 2[0] [receive] via NET/IB/0
+gl065:3838855:3838954 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 15
+gl065:3838855:3838953 [0] NCCL INFO Channel 01/0 : 1[1] -> 2[0] [receive] via NET/IB/0
+gl065:3838855:3838953 [0] NCCL INFO Channel 00 : 2[0] -> 3[1] via SHM/direct/direct
+gl065:3838855:3838953 [0] NCCL INFO Channel 01 : 2[0] -> 3[1] via SHM/direct/direct
+gl065:3838856:3838952 [1] NCCL INFO Channel 00/0 : 3[1] -> 0[0] [send] via NET/IB/0
+gl065:3838856:3838952 [1] NCCL INFO Channel 01/0 : 3[1] -> 0[0] [send] via NET/IB/0
+gl065:3838856:3838955 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 6
+gl065:3838856:3838952 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
+gl065:3838855:3838953 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
+[INFO|2025-10-22 20:35:17] llamafactory.data.loader:143 >> Loading dataset TAUR-dev/D-SFT_C-BASELINE_r1_distillation-sft-data...
+Converting format of dataset (num_proc=16): 100%|| 3998/3998 [00:00<|endoftext|>
-Saving the dataset (0/1 shards): 0%| | 0/3598 [00:00<?, ?it/s]
-[INFO|2025-10-22 20:35:22] llamafactory.data.loader:143 >> Tokenized dataset is saved at /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/tokenized/my_custom_sft12.
-[INFO|2025-10-22 20:35:22] llamafactory.data.loader:143 >> Please launch the training with `tokenized_path: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/tokenized/my_custom_sft12`.
-[INFO|configuration_utils.py:765] 2025-10-22 20:35:22,705 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json -[INFO|configuration_utils.py:839] 2025-10-22 20:35:22,705 >> Model config Qwen2Config { +[INFO|configuration_utils.py:765] 2025-10-22 20:35:25,602 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json +[INFO|configuration_utils.py:839] 2025-10-22 20:35:25,603 >> Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], @@ -10324,50 +10321,43 @@ Saving the dataset (0/1 shards): 0%| | 0/400 [00:00> KV cache is disabled during training. -[WARNING|logging.py:328] 2025-10-22 20:35:23,064 >> `torch_dtype` is deprecated! Use `dtype` instead! -[INFO|modeling_utils.py:1172] 2025-10-22 20:35:23,065 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors -[INFO|modeling_utils.py:2341] 2025-10-22 20:35:23,066 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16. -[INFO|configuration_utils.py:986] 2025-10-22 20:35:23,067 >> Generate config GenerationConfig { +[INFO|2025-10-22 20:35:25] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training. +[WARNING|logging.py:328] 2025-10-22 20:35:26,177 >> `torch_dtype` is deprecated! Use `dtype` instead! +[INFO|modeling_utils.py:1172] 2025-10-22 20:35:26,178 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors +[INFO|modeling_utils.py:2341] 2025-10-22 20:35:26,179 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16. +[INFO|configuration_utils.py:986] 2025-10-22 20:35:26,180 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151643, "use_cache": false } -`torch_dtype` is deprecated! Use `dtype` instead! -[INFO|configuration_utils.py:941] 2025-10-22 20:35:23,333 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json -[INFO|configuration_utils.py:986] 2025-10-22 20:35:23,333 >> Generate config GenerationConfig { +[INFO|configuration_utils.py:941] 2025-10-22 20:35:26,428 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json +[INFO|configuration_utils.py:986] 2025-10-22 20:35:26,428 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151643, "max_new_tokens": 2048 } -[INFO|dynamic_module_utils.py:423] 2025-10-22 20:35:23,367 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B. -[INFO|2025-10-22 20:35:23] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled. -[INFO|2025-10-22 20:35:23] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference. -[INFO|2025-10-22 20:35:23] llamafactory.model.adapter:143 >> Upcasting trainable params to float32. 
-[INFO|2025-10-22 20:35:23] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA -[INFO|2025-10-22 20:35:23] llamafactory.model.model_utils.misc:143 >> Found linear modules: gate_proj,up_proj,v_proj,q_proj,o_proj,down_proj,k_proj -[INFO|2025-10-22 20:35:23] llamafactory.model.loader:143 >> trainable params: 4,399,104 || all params: 498,431,872 || trainable%: 0.8826 -[WARNING|trainer.py:906] 2025-10-22 20:35:23,620 >> The model is already on multiple devices. Skipping the move to device specified in `args`. -[INFO|trainer.py:699] 2025-10-22 20:35:23,623 >> max_steps is given, it will override any value given in num_train_epochs -[INFO|trainer.py:749] 2025-10-22 20:35:23,623 >> Using auto half precision backend -[WARNING|2025-10-22 20:35:23] llamafactory.train.callbacks:154 >> Previous trainer log in this folder will be deleted. -[WARNING|trainer.py:982] 2025-10-22 20:35:23,627 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}. -The model is already on multiple devices. Skipping the move to device specified in `args`. -The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}. -[INFO|trainer.py:2519] 2025-10-22 20:35:26,876 >> ***** Running training ***** -[INFO|trainer.py:2520] 2025-10-22 20:35:26,876 >> Num examples = 3,598 -[INFO|trainer.py:2521] 2025-10-22 20:35:26,876 >> Num Epochs = 1 -[INFO|trainer.py:2522] 2025-10-22 20:35:26,876 >> Instantaneous batch size per device = 1 -[INFO|trainer.py:2525] 2025-10-22 20:35:26,876 >> Total train batch size (w. parallel, distributed & accumulation) = 4 -[INFO|trainer.py:2526] 2025-10-22 20:35:26,876 >> Gradient Accumulation steps = 1 -[INFO|trainer.py:2527] 2025-10-22 20:35:26,876 >> Total optimization steps = 100 -[INFO|trainer.py:2528] 2025-10-22 20:35:26,878 >> Number of trainable parameters = 4,399,104 -[INFO|integration_utils.py:867] 2025-10-22 20:35:26,899 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" -wandb: Currently logged in as: zsprague (ut_nlp_deduce) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin -wandb: Tracking run with wandb version 0.22.2 -wandb: Run data is saved locally in /scratch/zrs2020/LlamaFactoryHelper/wandb/run-20251022_203527-54101z6o +[INFO|dynamic_module_utils.py:423] 2025-10-22 20:35:26,453 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B. +[INFO|2025-10-22 20:35:26] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled. +[INFO|2025-10-22 20:35:26] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference. +[INFO|2025-10-22 20:35:26] llamafactory.model.adapter:143 >> Upcasting trainable params to float32. 
+[INFO|2025-10-22 20:35:26] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA
+[INFO|2025-10-22 20:35:26] llamafactory.model.model_utils.misc:143 >> Found linear modules: q_proj,k_proj,v_proj,gate_proj,down_proj,o_proj,up_proj
+[INFO|2025-10-22 20:35:26] llamafactory.model.loader:143 >> trainable params: 4,399,104 || all params: 498,431,872 || trainable%: 0.8826
+[WARNING|trainer.py:906] 2025-10-22 20:35:26,716 >> The model is already on multiple devices. Skipping the move to device specified in `args`.
+[INFO|trainer.py:699] 2025-10-22 20:35:26,718 >> max_steps is given, it will override any value given in num_train_epochs
+[INFO|trainer.py:749] 2025-10-22 20:35:26,718 >> Using auto half precision backend
+[WARNING|trainer.py:982] 2025-10-22 20:35:26,719 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
+[INFO|trainer.py:2519] 2025-10-22 20:35:26,880 >> ***** Running training *****
+[INFO|trainer.py:2520] 2025-10-22 20:35:26,880 >> Num examples = 3,598
+[INFO|trainer.py:2521] 2025-10-22 20:35:26,880 >> Num Epochs = 1
+[INFO|trainer.py:2522] 2025-10-22 20:35:26,880 >> Instantaneous batch size per device = 1
+[INFO|trainer.py:2525] 2025-10-22 20:35:26,880 >> Total train batch size (w. parallel, distributed & accumulation) = 4
+[INFO|trainer.py:2526] 2025-10-22 20:35:26,880 >> Gradient Accumulation steps = 1
+[INFO|trainer.py:2527] 2025-10-22 20:35:26,880 >> Total optimization steps = 100
+[INFO|trainer.py:2528] 2025-10-22 20:35:26,881 >> Number of trainable parameters = 4,399,104
+wandb: Run data is saved locally in /scratch/zrs2020/LlamaFactoryHelper/wandb/run-20251022_203527-54101z6o
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run interactive_test wandb: View project at https://wandb.ai/ut_nlp_deduce/llamafactory @@ -10442,8 +10432,14 @@ wandb: View run at https://wandb.ai/ut_nlp_deduce/llamafactory/runs/54101z6o 60%| | 60/100 [00:21<00:13, 3.01it/s] 61%| | 61/100 [00:22<00:12, 3.06it/s] 62%| | 62/100 [00:22<00:12, 3.14it/s] 63%| | 63/100 [00:22<00:11, 3.29it/s] 64%| | 64/100 [00:22<00:10, 3.45it/s] 65%| | 65/100 [00:23<00:10, 3.34it/s] 66%| | 66/100 [00:23<00:10, 3.14it/s] 67%| | 67/100 [00:23<00:09, 3.33it/s] 68%| | 68/100 [00:24<00:09, 3.37it/s] 69%| | 69/100 [00:24<00:09, 3.22it/s] 70%| | 70/100 [00:24<00:09, 3.06it/s] {'loss': 0.9991, 'grad_norm': 0.4352080523967743, 'learning_rate': 1.55e-05, 'epoch': 0.08} 70%| | 70/100 [00:24<00:09, 3.06it/s] 71%| | 71/100 [00:25<00:09, 3.20it/s] 72%| | 72/100 [00:25<00:08, 3.46it/s] 73%| | 73/100 [00:25<00:07, 3.72it/s] 74%| | 74/100 [00:26<00:09, 2.77it/s] 75%| | 75/100 [00:26<00:08, 3.00it/s] 76%| | 76/100 [00:26<00:07, 3.32it/s] 77%| | 77/100 [00:26<00:07, 3.20it/s] 78%| | 78/100 [00:27<00:06, 3.53it/s] 79%| | 79/100 [00:27<00:05, 3.54it/s] 80%| | 80/100 [00:27<00:05, 3.72it/s] {'loss': 0.9537, 'grad_norm': 0.4677208364009857, 'learning_rate': 1.05e-05, 'epoch': 0.09} 80%| | 80/100 [00:27<00:05, 3.72it/s] 81%| | 81/100 [00:27<00:05, 3.74it/s] 82%| | 82/100 [00:28<00:04, 3.71it/s] 83%| | 83/100 [00:28<00:05, 3.27it/s] 84%| | 84/100 [00:28<00:04, 3.29it/s] 85%| | 85/100 [00:29<00:04, 3.35it/s] 86%| | 86/100 [00:29<00:04, 3.41it/s] 87%| | 87/100 [00:29<00:03, 3.43it/s] 88%| | 88/100 [00:30<00:03, 3.54it/s] 89%| | 89/100 [00:30<00:03, 3.63it/s] 90%| | 90/100 [00:30<00:02, 3.47it/s] {'loss': 0.9677, 'grad_norm': 0.46978959441185, 'learning_rate': 5.500000000000001e-06, 'epoch': 0.1} - 90%| | 90/100 [00:30<00:02, 3.47it/s] 91%| | 91/100 [00:30<00:02, 3.36it/s] 92%|| 92/100 [00:31<00:03, 2.64it/s] 93%|| 93/100 [00:31<00:02, 2.71it/s] 94%|| 94/100 [00:32<00:02, 2.74it/s] 95%|| 95/100 [00:32<00:01, 2.93it/s] 96%|| 96/100 [00:32<00:01, 3.09it/s] 97%|| 97/100 [00:33<00:00, 3.24it/s] 98%|| 98/100 [00:33<00:00, 3.13it/s] 99%|| 99/100 [00:33<00:00, 3.00it/s]100%|| 100/100 [00:34<00:00, 2.92it/s] {'loss': 0.9472, 'grad_norm': 0.4593953490257263, 'learning_rate': 5.000000000000001e-07, 'epoch': 0.11} -100%|| 100/100 [00:34<00:00, 2.92it/s][INFO|trainer.py:4309] 2025-10-22 20:36:02,040 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100 + 90%| | 90/100 [00:30<00:02, 3.47it/s] 91%| | 91/100 [00:30<00:02, 3.36it/s] 92%|| 92/100 [00:31<00:03, 2.64it/s] 93%|| 93/100 [00:31<00:02, 2.71it/s] 94%|| 94/100 [00:32<00:02, 2.74it/s] 95%|| 95/100 [00:32<00:01, 2.93it/s] 96%|| 96/100 [00:32<00:01, 3.09it/s] 97%|| 97/100 [00:33<00:00, 3.24it/s] 98%|| 98/100 [00:33<00:00, 3.13it/s][INFO|trainer.py:2810] 2025-10-22 20:36:02,031 >> + +Training completed. 
Do not forget to share your model on huggingface.co/models =)
+
+
+gl065:3838856:3838856 [1] NCCL INFO comm 0x15605790 rank 3 nranks 4 cudaDev 1 busId 59000 - Destroy COMPLETE
+gl065:3838855:3838855 [0] NCCL INFO comm 0x14c25ae0 rank 2 nranks 4 cudaDev 0 busId 47000 - Destroy COMPLETE
+100%|| 100/100 [00:34<00:00, 2.92it/s][INFO|trainer.py:4309] 2025-10-22 20:36:02,040 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
[INFO|configuration_utils.py:765] 2025-10-22 20:36:02,193 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
[INFO|configuration_utils.py:839] 2025-10-22 20:36:02,194 >> Model config Qwen2Config {
"architectures": [
@@ -10795,3 +10791,984 @@ Preparing Training Artifacts
========================================
Copying configuration files...
Copying and cleaning training logs...
+Training artifacts prepared in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/training_artifacts
+Contents:
+Log files:
+
+========================================
+STAGE 3: Uploading to HuggingFace Hub
+Repository: TAUR-dev/testing_llamafactory_helper_quick_test__interactive
+Start Time: Wed Oct 22 08:36:20 PM EDT 2025
+========================================
+Uploading contents of: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
+Directory structure:
+
+Executing: huggingface-cli upload TAUR-dev/testing_llamafactory_helper_quick_test__interactive /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged .
+Start hashing 17 files.
+Finished hashing 17 files.
+Warning: 'huggingface-cli upload' is deprecated.
Use 'hf upload' instead.
+Processing Files (2 / 2) : 100%|| 1.00GB / 1.00GB, 171MB/s
+New Data Upload : 100%|| 716MB / 716MB, 138MB/s
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB
+ .../merged/model.safetensors: 100%|| 988MB / 988MB
+Removing 11 file(s) from commit that have not changed.
+https://huggingface.co/TAUR-dev/testing_llamafactory_helper_quick_test__interactive/tree/main/.
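[Editor's note] The deprecation warning above names the replacement CLI ('hf upload'). For reference, a minimal sketch of the same push through the huggingface_hub Python API, with the repo id and folder path taken from this log; it assumes a valid HF token is already configured in the environment and is not part of the original pipeline:

    # Hedged sketch: mirrors the logged
    # `huggingface-cli upload <repo> <folder> .` invocation.
    from huggingface_hub import HfApi

    HfApi().upload_folder(
        repo_id="TAUR-dev/testing_llamafactory_helper_quick_test__interactive",
        repo_type="model",
        folder_path="/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged",
    )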
+ +======================================== +Upload completed successfully +Model and training artifacts uploaded to: TAUR-dev/testing_llamafactory_helper_quick_test__interactive +End Time: Wed Oct 22 08:36:30 PM EDT 2025 +======================================== + +======================================== +STAGE 4: Cleanup +======================================== +Keeping checkpoints in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints +Keeping merged model in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged + +======================================== +PIPELINE COMPLETED SUCCESSFULLY +End Time: Wed Oct 22 08:36:30 PM EDT 2025 +======================================== + +======================================== +Cleaning up LlamaFactory processes +======================================== +Cleaned up processes on gl064.hpc.nyu.edu +Cleaning up processes on worker node: gl065 +Process cleanup complete +======================================== +Job Name: lf_torch_test__interactive +Hostname: gl064.hpc.nyu.edu +Number of nodes: 2 +GPUs per node: 2 +Start Time: Wed Oct 22 08:36:52 PM EDT 2025 +Log file: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/logs/pipeline.log +======================================== +Sourcing secrets from: /scratch/zrs2020/LlamaFactoryHelper/secrets.env + +======================================== +Configuration Paths +======================================== +Train Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml +Merge Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/merge_config.yaml +Dataset Info: +Output Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints +Export Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged +HF Repo ID: TAUR-dev/testing_llamafactory_helper_quick_test__interactive + + +======================================== +Multi-Node Coordination +======================================== +This is the master node - coordinating worker nodes... +Master node: gl064 +Master port: 29500 +World size: 2 + +Launching on worker node 1: gl065 +All worker nodes launched successfully +Master node (this node) will now join training as rank 0 + + +Found pre-tokenized dataset at: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/tokenized/my_custom_sft12 +Training will load from cached tokenized data (fast startup) + +======================================== +STAGE 1: Training Model +Start Time: Wed Oct 22 08:36:55 PM EDT 2025 +======================================== +Multi-node training detected +Nodes: 2, GPUs per node: 2 +Master address: gl064 +Master port: 29500 +Node rank: 0 +World size: 2 +CUDA_VISIBLE_DEVICES: 0,1 +LLaMA-Factory path: /scratch/zrs2020/LlamaFactoryHelper/LLaMA-Factory +Training config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml + +Starting distributed training with torch.distributed.run... + +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
+*****************************************
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
+ warnings.warn(
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
+ warnings.warn(
+========================================
+STAGE 1: Training Model
+Start Time: Wed Oct 22 08:37:00 PM EDT 2025
+========================================
+Multi-node training detected
+Nodes: 2, GPUs per node: 2
+Master address: gl064
+Master port: 29500
+Node rank: 1
+World size: 2
+CUDA_VISIBLE_DEVICES: 0,1
+LLaMA-Factory path: /scratch/zrs2020/LlamaFactoryHelper/LLaMA-Factory
+Training config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml
+
+Starting distributed training with torch.distributed.run...
+
+*****************************************
+Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+*****************************************
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
+ warnings.warn(
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
+ warnings.warn(
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
+ import pkg_resources
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
+ import pkg_resources
+[INFO|2025-10-22 20:37:10] llamafactory.hparams.parser:423 >> Process rank: 1, world size: 4, device: cuda:1, distributed training: True, compute dtype: torch.float16
+[INFO|2025-10-22 20:37:10] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled.
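[Editor's note] On the parser line above: with LoRA, every trainable parameter receives a gradient on each step, so DDP's per-step search for unused parameters can be switched off. A minimal, hypothetical raw-PyTorch equivalent (launch under torchrun so RANK/WORLD_SIZE/LOCAL_RANK are set; the Linear layer stands in for the LoRA-wrapped model):

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")  # rank/world size come from torchrun env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(8, 8).cuda()  # stand-in for the LoRA model
    # Safe here because all trainable (LoRA) parameters get gradients every step:
    ddp_model = DDP(model, device_ids=[local_rank], find_unused_parameters=False)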
+[INFO|2025-10-22 20:37:10] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 4, device: cuda:0, distributed training: True, compute dtype: torch.float16 +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:10,905 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:10,905 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:10,905 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:10,905 >> loading file added_tokens.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:10,905 >> loading file special_tokens_map.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:10,905 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:10,905 >> loading file chat_template.jinja from cache at None +[INFO|tokenization_utils_base.py:2364] 2025-10-22 20:37:11,084 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +[INFO|configuration_utils.py:765] 2025-10-22 20:37:11,261 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json +[INFO|configuration_utils.py:839] 2025-10-22 20:37:11,263 >> Model config Qwen2Config { + "architectures": [ + "Qwen2ForCausalLM" + ], + "attention_dropout": 0.0, + "bos_token_id": 151643, + "dtype": "bfloat16", + "eos_token_id": 151643, + "hidden_act": "silu", + "hidden_size": 896, + "initializer_range": 0.02, + "intermediate_size": 4864, + "layer_types": [ + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention" + ], + "max_position_embeddings": 32768, + "max_window_layers": 24, + "model_type": "qwen2", + "num_attention_heads": 14, + "num_hidden_layers": 24, + "num_key_value_heads": 2, + "rms_norm_eps": 1e-06, + "rope_scaling": null, + "rope_theta": 1000000.0, + "sliding_window": null, + "tie_word_embeddings": true, + "transformers_version": "4.57.1", + "use_cache": true, + "use_mrope": false, + "use_sliding_window": false, + "vocab_size": 151936 +} + +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:11,322 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:11,322 >> loading file merges.txt from 
cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:11,322 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:11,322 >> loading file added_tokens.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:11,322 >> loading file special_tokens_map.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:11,322 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:11,322 >> loading file chat_template.jinja from cache at None +[INFO|tokenization_utils_base.py:2364] 2025-10-22 20:37:11,497 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +[WARNING|2025-10-22 20:37:11] llamafactory.data.loader:148 >> Loading dataset from disk will ignore other data arguments. +[INFO|2025-10-22 20:37:11] llamafactory.data.loader:143 >> Loaded tokenized dataset from /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/tokenized/my_custom_sft12. +[INFO|configuration_utils.py:765] 2025-10-22 20:37:11,598 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json +[INFO|configuration_utils.py:839] 2025-10-22 20:37:11,599 >> Model config Qwen2Config { + "architectures": [ + "Qwen2ForCausalLM" + ], + "attention_dropout": 0.0, + "bos_token_id": 151643, + "dtype": "bfloat16", + "eos_token_id": 151643, + "hidden_act": "silu", + "hidden_size": 896, + "initializer_range": 0.02, + "intermediate_size": 4864, + "layer_types": [ + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention" + ], + "max_position_embeddings": 32768, + "max_window_layers": 24, + "model_type": "qwen2", + "num_attention_heads": 14, + "num_hidden_layers": 24, + "num_key_value_heads": 2, + "rms_norm_eps": 1e-06, + "rope_scaling": null, + "rope_theta": 1000000.0, + "sliding_window": null, + "tie_word_embeddings": true, + "transformers_version": "4.57.1", + "use_cache": true, + "use_mrope": false, + "use_sliding_window": false, + "vocab_size": 151936 +} + +[INFO|2025-10-22 20:37:11] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training. +`torch_dtype` is deprecated! Use `dtype` instead! +[WARNING|logging.py:328] 2025-10-22 20:37:12,027 >> `torch_dtype` is deprecated! Use `dtype` instead! 
+[INFO|modeling_utils.py:1172] 2025-10-22 20:37:12,028 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors +[INFO|modeling_utils.py:2341] 2025-10-22 20:37:12,029 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16. +[INFO|configuration_utils.py:986] 2025-10-22 20:37:12,030 >> Generate config GenerationConfig { + "bos_token_id": 151643, + "eos_token_id": 151643, + "use_cache": false +} + +[INFO|configuration_utils.py:941] 2025-10-22 20:37:12,334 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json +[INFO|configuration_utils.py:986] 2025-10-22 20:37:12,335 >> Generate config GenerationConfig { + "bos_token_id": 151643, + "eos_token_id": 151643, + "max_new_tokens": 2048 +} + +[INFO|dynamic_module_utils.py:423] 2025-10-22 20:37:12,366 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B. +[INFO|2025-10-22 20:37:12] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled. +[INFO|2025-10-22 20:37:12] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference. +[INFO|2025-10-22 20:37:12] llamafactory.model.adapter:143 >> Upcasting trainable params to float32. +[INFO|2025-10-22 20:37:12] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA +[INFO|2025-10-22 20:37:12] llamafactory.model.model_utils.misc:143 >> Found linear modules: gate_proj,q_proj,down_proj,o_proj,v_proj,k_proj,up_proj +The model is already on multiple devices. Skipping the move to device specified in `args`. +The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}. +[INFO|2025-10-22 20:37:12] llamafactory.model.loader:143 >> trainable params: 4,399,104 || all params: 498,431,872 || trainable%: 0.8826 +[WARNING|trainer.py:906] 2025-10-22 20:37:12,644 >> The model is already on multiple devices. Skipping the move to device specified in `args`. +[INFO|trainer.py:699] 2025-10-22 20:37:12,647 >> max_steps is given, it will override any value given in num_train_epochs +[INFO|trainer.py:749] 2025-10-22 20:37:12,647 >> Using auto half precision backend +[WARNING|2025-10-22 20:37:12] llamafactory.train.callbacks:154 >> Previous trainer log in this folder will be deleted. +[WARNING|trainer.py:982] 2025-10-22 20:37:12,650 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}. 
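[Editor's note] The trainer.py:982 warning above describes an automatic alignment of special tokens. A rough sketch of the equivalent manual fix via the public transformers API (token values copied from the logged update; the Trainer's internals differ):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

    # Mirror the logged update: {'bos_token_id': None, 'pad_token_id': 151643}
    model.config.bos_token_id = tok.bos_token_id        # None for this tokenizer
    model.config.pad_token_id = tok.pad_token_id        # 151643 (<|endoftext|>)
    model.generation_config.bos_token_id = tok.bos_token_id
    model.generation_config.pad_token_id = tok.pad_token_id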
+gl064:2627272:2627272 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
+gl064:2627272:2627272 [0] NCCL INFO Bootstrap: Using ibs3:10.0.5.0<0>
+gl064:2627272:2627272 [0] NCCL INFO cudaDriverVersion 13000
+gl064:2627272:2627272 [0] NCCL INFO NCCL version 2.27.5+cuda12.9
+gl064:2627272:2627272 [0] NCCL INFO Comm config Blocking set to 1
+gl064:2627273:2627273 [1] NCCL INFO cudaDriverVersion 13000
+gl064:2627273:2627273 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
+gl064:2627273:2627273 [1] NCCL INFO Bootstrap: Using ibs3:10.0.5.0<0>
+gl064:2627273:2627273 [1] NCCL INFO NCCL version 2.27.5+cuda12.9
+gl064:2627273:2627273 [1] NCCL INFO Comm config Blocking set to 1
+gl064:2627273:2627302 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so.
+gl064:2627272:2627301 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so.
+gl064:2627273:2627302 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
+gl064:2627272:2627301 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
+gl064:2627273:2627302 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
+gl064:2627272:2627301 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
+gl064:2627273:2627302 [1] NCCL INFO NCCL_IB_HCA set to mlx5
+gl064:2627272:2627301 [0] NCCL INFO NCCL_IB_HCA set to mlx5
+gl064:2627273:2627302 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0>
+gl064:2627272:2627301 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0>
+gl064:2627273:2627302 [1] NCCL INFO Initialized NET plugin IB
+gl064:2627272:2627301 [0] NCCL INFO Initialized NET plugin IB
+gl064:2627273:2627302 [1] NCCL INFO Assigned NET plugin IB to comm
+gl064:2627272:2627301 [0] NCCL INFO Assigned NET plugin IB to comm
+gl064:2627273:2627302 [1] NCCL INFO Using network IB
+gl064:2627272:2627301 [0] NCCL INFO Using network IB
+gl064:2627272:2627301 [0] NCCL INFO ncclCommInitRankConfig comm 0x15210000 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xb71ac44899f1b45 - Init START
+gl064:2627273:2627302 [1] NCCL INFO ncclCommInitRankConfig comm 0x138c8d70 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xb71ac44899f1b45 - Init START
+gl064:2627273:2627302 [1] NCCL INFO RAS client listening socket at ::1<28028>
+gl064:2627272:2627301 [0] NCCL INFO RAS client listening socket at ::1<28028>
+gl064:2627272:2627301 [0] NCCL INFO Bootstrap timings total 4.455940 (create 0.000026, send 0.000090, recv 0.000397, ring 0.000373, delay 0.000000)
+gl064:2627273:2627302 [1] NCCL INFO Bootstrap timings total 4.456660 (create 0.000023, send 0.000087, recv 4.452725, ring 0.003056, delay 0.000000)
+gl064:2627272:2627301 [0] NCCL INFO Setting affinity for GPU 0 to 0-15
+gl064:2627273:2627302 [1] NCCL INFO Setting affinity for GPU 1 to 0-15
+gl064:2627272:2627301 [0] NCCL INFO comm 0x15210000 rank 0 nRanks 4 nNodes 2 localRanks 2 localRank 0 MNNVL 0
+gl064:2627273:2627302 [1] NCCL INFO comm 0x138c8d70 rank 1 nRanks 4 nNodes 2 localRanks 2 localRank 1 MNNVL 0
+gl064:2627272:2627301 [0] NCCL INFO Channel 00/02 : 0 1 2 3
+gl064:2627272:2627301 [0] NCCL INFO Channel 01/02 : 0 1 2 3
+gl064:2627273:2627302 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
+gl064:2627272:2627301 [0] NCCL INFO Trees [0] 1/2/-1->0->-1 [1] 1/-1/-1->0->2
+gl064:2627273:2627302 [1] NCCL INFO P2P Chunksize set to 131072
+gl064:2627272:2627301 [0] NCCL INFO P2P Chunksize set to 131072
+gl064:2627273:2627302 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
+gl064:2627273:2627308 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 8
+gl064:2627273:2627307 [1] NCCL INFO [Proxy Service] Device 1 CPU core 5
+gl064:2627272:2627301 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
+gl064:2627272:2627301 [0] NCCL INFO Check P2P Type isAllDirectP2p 0 directMode 0
+gl064:2627273:2627302 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
+gl064:2627273:2627302 [1] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
+gl064:2627272:2627310 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 15
+gl064:2627272:2627309 [0] NCCL INFO [Proxy Service] Device 0 CPU core 14
+gl064:2627272:2627301 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
+gl064:2627272:2627301 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
+gl064:2627272:2627301 [0] NCCL INFO CC Off, workFifoBytes 1048576
+gl064:2627273:2627302 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
+gl064:2627273:2627302 [1] NCCL INFO ncclCommInitRankConfig comm 0x138c8d70 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xb71ac44899f1b45 - Init COMPLETE
+gl064:2627272:2627301 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
+gl064:2627273:2627302 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 4 total 4.59 (kernels 0.08, alloc 0.02, bootstrap 4.46, allgathers 0.01, topo 0.02, graphs 0.00, connections 0.00, rest 0.00)
+gl064:2627272:2627301 [0] NCCL INFO ncclCommInitRankConfig comm 0x15210000 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xb71ac44899f1b45 - Init COMPLETE
+gl064:2627272:2627301 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 4 total 4.60 (kernels 0.09, alloc 0.02, bootstrap 4.46, allgathers 0.01, topo 0.02, graphs 0.00, connections 0.00, rest 0.00)
+gl064:2627272:2627312 [0] NCCL INFO Channel 00/0 : 3[1] -> 0[0] [receive] via NET/IB/0
+gl064:2627272:2627312 [0] NCCL INFO Channel 01/0 : 3[1] -> 0[0] [receive] via NET/IB/0
+gl064:2627272:2627313 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 11
+gl064:2627273:2627311 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[0] [send] via NET/IB/0
+gl064:2627273:2627311 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[0] [send] via NET/IB/0
+gl064:2627273:2627314 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 6
+gl064:2627272:2627312 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct
+gl064:2627272:2627312 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct
+gl064:2627272:2627312 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
+gl064:2627273:2627311 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
+[INFO|trainer.py:2519] 2025-10-22 20:37:17,497 >> ***** Running training *****
+[INFO|trainer.py:2520] 2025-10-22 20:37:17,497 >> Num examples = 3,598
+[INFO|trainer.py:2521] 2025-10-22 20:37:17,497 >> Num Epochs = 1
+[INFO|trainer.py:2522] 2025-10-22 20:37:17,498 >> Instantaneous batch size per device = 1
+[INFO|trainer.py:2525] 2025-10-22 20:37:17,498 >> Total train batch size (w. parallel, distributed & accumulation) = 4
+[INFO|trainer.py:2526] 2025-10-22 20:37:17,498 >> Gradient Accumulation steps = 1
+[INFO|trainer.py:2527] 2025-10-22 20:37:17,498 >> Total optimization steps = 100
+[INFO|trainer.py:2528] 2025-10-22 20:37:17,499 >> Number of trainable parameters = 4,399,104
+[INFO|integration_utils.py:867] 2025-10-22 20:37:17,501 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
+wandb: Currently logged in as: zsprague (ut_nlp_deduce) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
+wandb: Tracking run with wandb version 0.22.2
+wandb: Run data is saved locally in /scratch/zrs2020/LlamaFactoryHelper/wandb/run-20251022_203717-18s2z8v7
+wandb: Run `wandb offline` to turn off syncing.
+wandb: Syncing run interactive_test
+wandb: View project at https://wandb.ai/ut_nlp_deduce/llamafactory
+wandb: View run at https://wandb.ai/ut_nlp_deduce/llamafactory/runs/18s2z8v7
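The integration_utils line above already spells out the W&B opt-out. A minimal sketch of both routes (the TrainingArguments route assumes you build the trainer yourself rather than going through llamafactory-cli):

    import os

    # route 1: environment flag, must be set before the Trainer is constructed
    os.environ["WANDB_DISABLED"] = "true"

    # route 2: tell the Trainer explicitly which reporters to use
    from transformers import TrainingArguments
    args = TrainingArguments(output_dir="out", report_to="none")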
+  0%| | 0/100 [00:00<?, ?it/s]
+>> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50
+[INFO|configuration_utils.py:765] 2025-10-22 20:37:35,643 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 20:37:35,644 >> Model config Qwen2Config { ... identical to the dump above; repeated block elided ... }
+[INFO|tokenization_utils_base.py:2421] 2025-10-22 20:37:35,783 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/chat_template.jinja
+[INFO|tokenization_utils_base.py:2590] 2025-10-22 20:37:35,789 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2599] 2025-10-22 20:37:35,809 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/special_tokens_map.json
+ 60%| | 60/100 [00:21<00:13, 3.04it/s] {'loss': 0.9981, 'grad_norm': 0.4507186710834503, 'learning_rate': 2.05e-05, 'epoch': 0.07}
+ 70%| | 70/100 [00:24<00:09, 3.20it/s] {'loss': 0.9991, 'grad_norm': 0.4351355731487274, 'learning_rate': 1.55e-05, 'epoch': 0.08}
+ 80%| | 80/100 [00:26<00:05, 3.77it/s] {'loss': 0.9537, 'grad_norm': 0.4680567979812622, 'learning_rate': 1.05e-05, 'epoch': 0.09}
+ 90%| | 90/100 [00:29<00:02, 3.47it/s] {'loss': 0.9677, 'grad_norm': 0.46988463401794434, 'learning_rate': 5.500000000000001e-06, 'epoch': 0.1}
+100%|| 100/100 [00:33<00:00, 2.92it/s] {'loss': 0.9472, 'grad_norm': 0.45911866426467896, 'learning_rate': 5.000000000000001e-07, 'epoch': 0.11}
+[INFO|trainer.py:4309] 2025-10-22 20:37:51,912 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
+[INFO|configuration_utils.py:765] 2025-10-22 20:37:52,016 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 20:37:52,017 >> Model config Qwen2Config { ... identical to the dump above; repeated block elided ... }
+[INFO|tokenization_utils_base.py:2421] 2025-10-22 20:37:52,204 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/chat_template.jinja
+[INFO|tokenization_utils_base.py:2590] 2025-10-22 20:37:52,209 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2599] 2025-10-22 20:37:52,213 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/special_tokens_map.json
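Each checkpoint directory saved above also carries a trainer_state.json whose log_history holds the same dicts the progress bar printed, so the loss curve survives even when the tqdm output is garbled. A sketch for recovering it (path taken from the save lines above):

    import json

    ckpt = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100"
    with open(f"{ckpt}/trainer_state.json") as f:
        history = json.load(f)["log_history"]

    # entries look like {'loss': 0.9981, 'learning_rate': 2.05e-05, 'epoch': 0.07, 'step': 60, ...}
    for entry in history:
        if "loss" in entry:
            print(entry["step"], entry["loss"])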
"full_attention", + "full_attention", + "full_attention" + ], + "max_position_embeddings": 32768, + "max_window_layers": 24, + "model_type": "qwen2", + "num_attention_heads": 14, + "num_hidden_layers": 24, + "num_key_value_heads": 2, + "rms_norm_eps": 1e-06, + "rope_scaling": null, + "rope_theta": 1000000.0, + "sliding_window": null, + "tie_word_embeddings": true, + "transformers_version": "4.57.1", + "use_cache": true, + "use_mrope": false, + "use_sliding_window": false, + "vocab_size": 151936 +} + +[INFO|tokenization_utils_base.py:2421] 2025-10-22 20:37:52,204 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/chat_template.jinja +[INFO|tokenization_utils_base.py:2590] 2025-10-22 20:37:52,209 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/tokenizer_config.json +[INFO|tokenization_utils_base.py:2599] 2025-10-22 20:37:52,213 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/special_tokens_map.json +[INFO|trainer.py:2810] 2025-10-22 20:37:52,728 >> + +Training completed. Do not forget to share your model on huggingface.co/models =) + + + {'train_runtime': 35.2292, 'train_samples_per_second': 11.354, 'train_steps_per_second': 2.839, 'train_loss': 1.056056594848633, 'epoch': 0.11} +100%|| 100/100 [00:34<00:00, 2.92it/s]100%|| 100/100 [00:34<00:00, 2.93it/s] +[INFO|trainer.py:4309] 2025-10-22 20:37:52,737 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints +[INFO|configuration_utils.py:765] 2025-10-22 20:37:52,830 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json +[INFO|configuration_utils.py:839] 2025-10-22 20:37:52,831 >> Model config Qwen2Config { + "architectures": [ + "Qwen2ForCausalLM" + ], + "attention_dropout": 0.0, + "bos_token_id": 151643, + "dtype": "bfloat16", + "eos_token_id": 151643, + "hidden_act": "silu", + "hidden_size": 896, + "initializer_range": 0.02, + "intermediate_size": 4864, + "layer_types": [ + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention" + ], + "max_position_embeddings": 32768, + "max_window_layers": 24, + "model_type": "qwen2", + "num_attention_heads": 14, + "num_hidden_layers": 24, + "num_key_value_heads": 2, + "rms_norm_eps": 1e-06, + "rope_scaling": null, + "rope_theta": 1000000.0, + "sliding_window": null, + "tie_word_embeddings": true, + "transformers_version": "4.57.1", + "use_cache": true, + "use_mrope": false, + "use_sliding_window": false, + "vocab_size": 151936 +} + +[INFO|tokenization_utils_base.py:2421] 2025-10-22 20:37:52,945 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/chat_template.jinja +[INFO|tokenization_utils_base.py:2590] 2025-10-22 20:37:52,950 >> tokenizer config file saved in 
+***** train metrics *****
+  epoch                    =     0.1111
+  total_flos               =  2407106GF
+  train_loss               =     1.0561
+  train_runtime            = 0:00:35.22
+  train_samples_per_second =     11.354
+  train_steps_per_second   =      2.839
+[INFO|modelcard.py:456] 2025-10-22 20:37:53,125 >> Dropping the following result as it does not have all the necessary fields:
+{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
+gl064:2627273:2627273 [1] NCCL INFO comm 0x138c8d70 rank 1 nranks 4 cudaDev 1 busId 59000 - Destroy COMPLETE
+gl064:2627272:2627272 [0] NCCL INFO comm 0x15210000 rank 0 nranks 4 cudaDev 0 busId 47000 - Destroy COMPLETE
+wandb:
+wandb: View run interactive_test at: https://wandb.ai/ut_nlp_deduce/llamafactory/runs/18s2z8v7
+wandb: Find logs at: wandb/run-20251022_203717-18s2z8v7/logs
+
+========================================
+Training completed successfully
+End Time: Wed Oct 22 08:37:55 PM EDT 2025
+========================================
+
+========================================
+STAGE 2: Merging/Exporting Model
+Start Time: Wed Oct 22 08:37:55 PM EDT 2025
+========================================
+Looking for checkpoints in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints
+Analyzing checkpoints to find the one from current training run...
+  - checkpoint-100: trainer_state.json modified at Wed Oct 22 08:37:52 PM EDT 2025
+  - checkpoint-50: trainer_state.json modified at Wed Oct 22 08:37:36 PM EDT 2025
+
+Selected checkpoint: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
+This checkpoint has the most recently updated trainer_state.json
+Checkpoint details:
+  Path: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
+  Last modified: 2025-10-22 16:54:17.414188691 -0400
+  Training step: 100
+Updating merge config to point to checkpoint...
+Successfully updated merge config
+========================================
+
+========================================
+Cleaning up LlamaFactory processes
+========================================
+Cleaned up processes on gl065.hpc.nyu.edu
+Process cleanup complete
+Updated merge config to use: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
+
+Merge config contents:
+  model_name_or_path: Qwen/Qwen2.5-0.5B
+  finetuning_type: lora
+  trust_remote_code: true
+  adapter_name_or_path: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
+  template: default
+  export_dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
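Stage 2 above selects the checkpoint whose trainer_state.json was touched most recently. A sketch of that selection rule (the pipeline presumably does this shell-side; this is just the same logic in Python):

    from pathlib import Path

    root = Path("/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints")
    candidates = [p for p in root.glob("checkpoint-*") if (p / "trainer_state.json").is_file()]
    # newest trainer_state.json wins -> checkpoint-100 for this run
    latest = max(candidates, key=lambda p: (p / "trainer_state.json").stat().st_mtime)
    print(latest)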
+Executing command: llamafactory-cli export /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/merge_config.yaml
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
+  warnings.warn(
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
+  import pkg_resources
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:04,914 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:04,914 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:04,914 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:04,914 >> loading file added_tokens.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:04,914 >> loading file special_tokens_map.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:04,914 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:04,914 >> loading file chat_template.jinja from cache at None
+[INFO|tokenization_utils_base.py:2364] 2025-10-22 20:38:05,086 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
+[INFO|configuration_utils.py:765] 2025-10-22 20:38:05,254 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 20:38:05,257 >> Model config Qwen2Config { ... identical to the dump above; repeated block elided ... }
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:05,328 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:05,328 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:05,328 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:05,328 >> loading file added_tokens.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:05,328 >> loading file special_tokens_map.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:05,328 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 20:38:05,329 >> loading file chat_template.jinja from cache at None
+[INFO|tokenization_utils_base.py:2364] 2025-10-22 20:38:05,495 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
+[INFO|configuration_utils.py:765] 2025-10-22 20:38:05,543 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 20:38:05,543 >> Model config Qwen2Config { ... identical to the dump above; repeated block elided ... }
+[WARNING|logging.py:328] 2025-10-22 20:38:05,543 >> `torch_dtype` is deprecated! Use `dtype` instead!
+[INFO|2025-10-22 20:38:05] llamafactory.model.model_utils.kv_cache:143 >> KV cache is enabled for faster generation.
+[WARNING|logging.py:328] 2025-10-22 20:38:05,883 >> `torch_dtype` is deprecated! Use `dtype` instead!
+[INFO|modeling_utils.py:1172] 2025-10-22 20:38:05,884 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors
+[INFO|modeling_utils.py:2341] 2025-10-22 20:38:05,885 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
+[INFO|configuration_utils.py:986] 2025-10-22 20:38:05,886 >> Generate config GenerationConfig {
+  "bos_token_id": 151643,
+  "eos_token_id": 151643
+}
+
+[INFO|configuration_utils.py:941] 2025-10-22 20:38:05,966 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json
+[INFO|configuration_utils.py:986] 2025-10-22 20:38:05,966 >> Generate config GenerationConfig {
+  "bos_token_id": 151643,
+  "eos_token_id": 151643,
+  "max_new_tokens": 2048
+}
+
+[INFO|dynamic_module_utils.py:423] 2025-10-22 20:38:05,997 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B.
+[INFO|2025-10-22 20:38:06] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
+[INFO|2025-10-22 20:38:06] llamafactory.model.adapter:143 >> Merged 1 adapter(s).
+[INFO|2025-10-22 20:38:06] llamafactory.model.adapter:143 >> Loaded adapter(s): /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
+[INFO|2025-10-22 20:38:06] llamafactory.model.loader:143 >> all params: 494,032,768
+[INFO|2025-10-22 20:38:06] llamafactory.train.tuner:143 >> Convert model dtype to: torch.bfloat16.
+[INFO|configuration_utils.py:491] 2025-10-22 20:38:06,773 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/config.json
+[INFO|configuration_utils.py:757] 2025-10-22 20:38:06,777 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/generation_config.json
+[INFO|modeling_utils.py:4181] 2025-10-22 20:38:08,545 >> Model weights saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/model.safetensors
+[INFO|tokenization_utils_base.py:2421] 2025-10-22 20:38:08,550 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/chat_template.jinja
+[INFO|tokenization_utils_base.py:2590] 2025-10-22 20:38:08,554 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2599] 2025-10-22 20:38:08,558 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/special_tokens_map.json
+[INFO|2025-10-22 20:38:08] llamafactory.train.tuner:143 >> Ollama modelfile saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/Modelfile
+
+========================================
+Merge/Export completed successfully
+End Time: Wed Oct 22 08:38:09 PM EDT 2025
+========================================
+
+========================================
+Preparing Training Artifacts
+========================================
+Copying configuration files...
+Copying and cleaning training logs...
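The export stage above loads the base model in bfloat16, folds the checkpoint-100 adapter into the base weights, and writes a standalone model plus tokenizer to merged/. A rough peft equivalent of that merge, for reference only (this is not the llamafactory-cli export code path, and it assumes a transformers recent enough to accept the `dtype` kwarg):

    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    adapter = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100"
    out = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged"

    model = AutoPeftModelForCausalLM.from_pretrained(adapter, dtype=torch.bfloat16)
    model = model.merge_and_unload()   # bake the LoRA deltas into the base weights
    model.save_pretrained(out)         # writes model.safetensors + config.json
    AutoTokenizer.from_pretrained(adapter).save_pretrained(out)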