diff --git "a/training_artifacts/logs/pipeline_cleaned.txt" "b/training_artifacts/logs/pipeline_cleaned.txt" --- "a/training_artifacts/logs/pipeline_cleaned.txt" +++ "b/training_artifacts/logs/pipeline_cleaned.txt" @@ -6516,7 +6516,24 @@ World size: 2 Launching on worker node 1: gl065 All worker nodes launched successfully Waiting 5 seconds for worker nodes to initialize... -ch_test__interactive/merged +======================================== +Job Name: lf_torch_test__interactive +Hostname: gl065.hpc.nyu.edu +Number of nodes: 2 +GPUs per node: 2 +Start Time: Wed Oct 22 05:19:01 PM EDT 2025 +Log file: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/logs/pipeline.log +======================================== +Sourcing secrets from: /scratch/zrs2020/LlamaFactoryHelper/secrets.env + +======================================== +Configuration Paths +======================================== +Train Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml +Merge Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/merge_config.yaml +Dataset Info: +Output Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints +Export Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged HF Repo ID: TAUR-dev/testing_llamafactory_helper_quick_test__interactive @@ -6527,7 +6544,6 @@ Start Time: Wed Oct 22 05:19:03 PM EDT 2025 Worker node - skipping dataset preprocessing Dataset memory mapping: DISABLED (worker node) Worker node waiting for master to complete preprocessing... -Master node (this node) will now join training as rank 0 Proceeding with training... @@ -6535,13 +6551,14 @@ Multi-node training detected Nodes: 2, GPUs per node: 2 Master address: gl064 Master port: 29500 -Node rank: 0 +Node rank: 1 World size: 2 CUDA_VISIBLE_DEVICES: 0,1 -Using node-specific cache: /scratch/zrs2020/.cache/hf_cache/home/datasets_node0 +Using node-specific cache: /scratch/zrs2020/.cache/hf_cache/home/datasets_node1 LLaMA-Factory path: /scratch/zrs2020/LlamaFactoryHelper/LLaMA-Factory Training config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml +Worker node: Adding extra 5 second delay for cache synchronization... Starting distributed training with torch.distributed.run... ***************************************** @@ -6551,11 +6568,7 @@ Setting OMP_NUM_THREADS environment variable for each process to be 1 in default warnings.warn( /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. warnings.warn( -/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. - import pkg_resources -/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
- import pkg_resources -[INFO|2025-10-22 17:19:31] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled. +et `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled. [INFO|2025-10-22 17:19:31] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 4, device: cuda:0, distributed training: True, compute dtype: torch.float16 [INFO|2025-10-22 17:19:31] llamafactory.hparams.parser:423 >> Process rank: 1, world size: 4, device: cuda:1, distributed training: True, compute dtype: torch.float16 [INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:31,461 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json @@ -6632,55 +6645,91 @@ Setting OMP_NUM_THREADS environment variable for each process to be 1 in default [INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:31,923 >> loading file chat_template.jinja from cache at None [INFO|tokenization_utils_base.py:2364] 2025-10-22 17:19:32,090 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|2025-10-22 17:19:32] llamafactory.data.loader:143 >> Loading dataset TAUR-dev/D-SFT_C-sft_exp_AT_pvv2__fixed-sft-data... -Generating train split: 0%| | 0/54000 [00:00 -gl064:2408238:2408238 [0] NCCL INFO cudaDriverVersion 13000 -gl064:2408238:2408238 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 -gl064:2408238:2408238 [0] NCCL INFO Comm config Blocking set to 1 -gl064:2408239:2408239 [1] NCCL INFO cudaDriverVersion 13000 -gl064:2408239:2408239 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs -gl064:2408239:2408239 [1] NCCL INFO Bootstrap: Using ibs3:10.0.5.0<0> -gl064:2408239:2408239 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 -gl064:2408239:2408239 [1] NCCL INFO Comm config Blocking set to 1 -gl064:2408238:2408286 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. -gl064:2408238:2408286 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. -gl064:2408239:2408287 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. -gl064:2408239:2408287 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 
-gl064:2408238:2408286 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs -gl064:2408238:2408286 [0] NCCL INFO NCCL_IB_HCA set to mlx5 -gl064:2408239:2408287 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs -gl064:2408239:2408287 [1] NCCL INFO NCCL_IB_HCA set to mlx5 -gl064:2408238:2408286 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0> -gl064:2408238:2408286 [0] NCCL INFO Initialized NET plugin IB -gl064:2408239:2408287 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0> -gl064:2408239:2408287 [1] NCCL INFO Initialized NET plugin IB -gl064:2408238:2408286 [0] NCCL INFO Assigned NET plugin IB to comm -gl064:2408239:2408287 [1] NCCL INFO Assigned NET plugin IB to comm -gl064:2408238:2408286 [0] NCCL INFO Using network IB -gl064:2408239:2408287 [1] NCCL INFO Using network IB -gl064:2408239:2408287 [1] NCCL INFO ncclCommInitRankConfig comm 0x15caa6f0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xb0043625b437bfc0 - Init START -gl064:2408238:2408286 [0] NCCL INFO ncclCommInitRankConfig comm 0x11760700 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xb0043625b437bfc0 - Init START -gl064:2408238:2408286 [0] NCCL INFO RAS client listening socket at ::1<28028> -gl064:2408239:2408287 [1] NCCL INFO RAS client listening socket at ::1<28028> -gl064:2408239:2408287 [1] NCCL INFO Bootstrap timings total 0.863397 (create 0.000024, send 0.000081, recv 0.860273, ring 0.002451, delay 0.000000) -gl064:2408238:2408286 [0] NCCL INFO Bootstrap timings total 0.863394 (create 0.000023, send 0.000074, recv 0.000414, ring 0.831440, delay 0.000000) -gl064:2408238:2408286 [0] NCCL INFO Setting affinity for GPU 0 to 0-31 -gl064:2408239:2408287 [1] NCCL INFO Setting affinity for GPU 1 to 0-31 -gl064:2408239:2408287 [1] NCCL INFO comm 0x15caa6f0 rank 1 nRanks 4 nNodes 2 localRanks 2 localRank 1 MNNVL 0 -gl064:2408238:2408286 [0] NCCL INFO comm 0x11760700 rank 0 nRanks 4 nNodes 2 localRanks 2 localRank 0 MNNVL 0 -gl064:2408238:2408286 [0] NCCL INFO Channel 00/02 : 0 1 2 3 -gl064:2408238:2408286 [0] NCCL INFO Channel 01/02 : 0 1 2 3 -gl064:2408239:2408287 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 -gl064:2408238:2408286 [0] NCCL INFO Trees [0] 1/2/-1->0->-1 [1] 1/-1/-1->0->2 -gl064:2408239:2408287 [1] NCCL INFO P2P Chunksize set to 131072 -gl064:2408238:2408286 [0] NCCL INFO P2P Chunksize set to 131072 -gl064:2408239:2408287 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. -gl064:2408238:2408286 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. +Generating train split: 0%| | 0/54000 [00:00> Process rank: 3, world size: 4, device: cuda:1, distributed training: True, compute dtype: torch.float16 +[INFO|2025-10-22 17:19:33] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled. 
+[INFO|2025-10-22 17:19:33] llamafactory.hparams.parser:423 >> Process rank: 2, world size: 4, device: cuda:0, distributed training: True, compute dtype: torch.float16 +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,318 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,318 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,318 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,318 >> loading file added_tokens.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,318 >> loading file special_tokens_map.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,318 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,318 >> loading file chat_template.jinja from cache at None +[INFO|tokenization_utils_base.py:2364] 2025-10-22 17:19:33,486 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +[INFO|configuration_utils.py:765] 2025-10-22 17:19:33,694 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json +[INFO|configuration_utils.py:839] 2025-10-22 17:19:33,696 >> Model config Qwen2Config { + "architectures": [ + "Qwen2ForCausalLM" + ], + "attention_dropout": 0.0, + "bos_token_id": 151643, + "dtype": "bfloat16", + "eos_token_id": 151643, + "hidden_act": "silu", + "hidden_size": 896, + "initializer_range": 0.02, + "intermediate_size": 4864, + "layer_types": [ + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention" + ], + "max_position_embeddings": 32768, + "max_window_layers": 24, + "model_type": "qwen2", + "num_attention_heads": 14, + "num_hidden_layers": 24, + "num_key_value_heads": 2, + "rms_norm_eps": 1e-06, + "rope_scaling": null, + "rope_theta": 1000000.0, + "sliding_window": null, + "tie_word_embeddings": true, + "transformers_version": "4.57.1", + "use_cache": true, + "use_mrope": false, + "use_sliding_window": false, + "vocab_size": 151936 +} + +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,764 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,764 >> loading file merges.txt from 
cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,764 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,764 >> loading file added_tokens.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,764 >> loading file special_tokens_map.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,764 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:19:33,764 >> loading file chat_template.jinja from cache at None +[INFO|tokenization_utils_base.py:2364] 2025-10-22 17:19:33,928 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +[INFO|2025-10-22 17:19:33] llamafactory.data.loader:143 >> Loading dataset TAUR-dev/D-SFT_C-sft_exp_AT_pvv2__fixed-sft-data... +Generating train split: 0%| | 0/54000 [00:00 +gl065:3809763:3809763 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 +gl065:3809763:3809763 [1] NCCL INFO-profiler.so. gl064:2408238:2408286 [0] NCCL INFO Check P2P Type isAllDirectP2p 0 directMode 0 gl064:2408238:2408299 [0] NCCL INFO [Proxy Service] Device 0 CPU core 6 gl064:2408239:2408300 [1] NCCL INFO [Proxy Service] Device 1 CPU core 14 @@ -6707,7 +6756,11 @@ gl064:2408239:2408304 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[0] [send] via NET/I gl064:2408239:2408306 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 16 gl064:2408238:2408303 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0 gl064:2408239:2408304 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0 -Running tokenizer on dataset: 0%| | 0/54000 [00:00 0[0] [send] via NET/I gl065:3809763:3809949 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 8 gl065:3809763:3809947 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0 gl065:3809762:3809946 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0 -Running tokenizer on dataset: 0%| | 0/54000 [00:00<|endoftext|> -[INFO|configuration_utils.py:765] 2025-10-22 17:22:02,188 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json -[INFO|configuration_utils.py:839] 2025-10-22 17:22:02,188 >> Model config Qwen2Config { +[INFO|configuration_utils.py:765] 2025-10-22 17:22:02,136 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json +[INFO|configuration_utils.py:839] 2025-10-22 17:22:02,137 >> Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], @@ -7035,36 +7088,35 @@ Hence, the correct answer is: [INFO|2025-10-22 17:22:02] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training. `torch_dtype` is deprecated! Use `dtype` instead! -[WARNING|logging.py:328] 2025-10-22 17:22:02,943 >> `torch_dtype` is deprecated! Use `dtype` instead! 
-[INFO|modeling_utils.py:1172] 2025-10-22 17:22:02,944 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors -[INFO|modeling_utils.py:2341] 2025-10-22 17:22:02,945 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16. -[INFO|configuration_utils.py:986] 2025-10-22 17:22:02,947 >> Generate config GenerationConfig { +[WARNING|logging.py:328] 2025-10-22 17:22:02,935 >> `torch_dtype` is deprecated! Use `dtype` instead! +[INFO|modeling_utils.py:1172] 2025-10-22 17:22:02,936 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors +[INFO|modeling_utils.py:2341] 2025-10-22 17:22:02,937 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16. +[INFO|configuration_utils.py:986] 2025-10-22 17:22:02,938 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151643, "use_cache": false } -[INFO|configuration_utils.py:941] 2025-10-22 17:22:03,449 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json -[INFO|configuration_utils.py:986] 2025-10-22 17:22:03,449 >> Generate config GenerationConfig { +[INFO|configuration_utils.py:941] 2025-10-22 17:22:03,393 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json +[INFO|configuration_utils.py:986] 2025-10-22 17:22:03,393 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151643, "max_new_tokens": 2048 } -[INFO|dynamic_module_utils.py:423] 2025-10-22 17:22:03,511 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B. +[INFO|dynamic_module_utils.py:423] 2025-10-22 17:22:03,456 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B. [INFO|2025-10-22 17:22:03] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled. [INFO|2025-10-22 17:22:03] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference. [INFO|2025-10-22 17:22:03] llamafactory.model.adapter:143 >> Upcasting trainable params to float32. [INFO|2025-10-22 17:22:03] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA -[INFO|2025-10-22 17:22:03] llamafactory.model.model_utils.misc:143 >> Found linear modules: q_proj,up_proj,down_proj,v_proj,gate_proj,k_proj,o_proj +[INFO|2025-10-22 17:22:03] llamafactory.model.model_utils.misc:143 >> Found linear modules: q_proj,k_proj,v_proj,o_proj,up_proj,gate_proj,down_proj The model is already on multiple devices. Skipping the move to device specified in `args`. The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}. [INFO|2025-10-22 17:22:03] llamafactory.model.loader:143 >> trainable params: 4,399,104 || all params: 498,431,872 || trainable%: 0.8826 -[WARNING|trainer.py:906] 2025-10-22 17:22:03,621 >> The model is already on multiple devices. 
Skipping the move to device specified in `args`. -[INFO|trainer.py:699] 2025-10-22 17:22:03,624 >> max_steps is given, it will override any value given in num_train_epochs -[INFO|trainer.py:749] 2025-10-22 17:22:03,624 >> Using auto half precision backend -[WARNING|2025-10-22 17:22:03] llamafactory.train.callbacks:154 >> Previous trainer log in this folder will be deleted. -[WARNING|trainer.py:982] 2025-10-22 17:22:03,627 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}. +[WARNING|trainer.py:906] 2025-10-22 17:22:03,562 >> The model is already on multiple devices. Skipping the move to device specified in `args`. +[INFO|trainer.py:699] 2025-10-22 17:22:03,564 >> max_steps is given, it will override any value given in num_train_epochs +[INFO|trainer.py:749] 2025-10-22 17:22:03,564 >> Using auto half precision backend +[WARNING|trainer.py:982] 2025-10-22 17:22:03,565 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}. [INFO|trainer.py:2519] 2025-10-22 17:22:04,001 >> ***** Running training ***** [INFO|trainer.py:2520] 2025-10-22 17:22:04,001 >> Num examples = 48,600 [INFO|trainer.py:2521] 2025-10-22 17:22:04,001 >> Num Epochs = 1 @@ -7072,9 +7124,15 @@ The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and g [INFO|trainer.py:2525] 2025-10-22 17:22:04,001 >> Total train batch size (w. parallel, distributed & accumulation) = 4 [INFO|trainer.py:2526] 2025-10-22 17:22:04,001 >> Gradient Accumulation steps = 1 [INFO|trainer.py:2527] 2025-10-22 17:22:04,001 >> Total optimization steps = 100 -[INFO|trainer.py:2528] 2025-10-22 17:22:04,002 >> Number of trainable parameters = 4,399,104 -[INFO|integration_utils.py:867] 2025-10-22 17:22:04,024 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" -wandb: Currently logged in as: zsprague (ut_nlp_deduce) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin +[INFO|trainer.py:2528] 2025-10-22 17:22:04,003 >> Number of trainable parameters = 4,399,104 +[INFO|trainer.py:2810] 2025-10-22 17:22:30,188 >> + +Training completed. Do not forget to share your model on huggingface.co/models =) + + +gl065:3809763:3809763 [1] NCCL INFO comm 0x126f5330 rank 3 nranks 4 cudaDev 1 busId 59000 - Destroy COMPLETE +gl065:3809762:3809762 [0] NCCL INFO comm 0x11644470 rank 2 nranks 4 cudaDev 0 busId 47000 - Destroy COMPLETE +. Use `wandb login --relogin` to force relogin wandb: Tracking run with wandb version 0.22.2 wandb: Run data is saved locally in /scratch/zrs2020/LlamaFactoryHelper/wandb/run-20251022_172204-p5x5pr79 wandb: Run `wandb offline` to turn off syncing. 
@@ -7300,7 +7358,18 @@ gl064:2408238:2408238 [0] NCCL INFO comm 0x11760700 rank 0 nranks 4 cudaDev 0 bu
Training completed successfully
End Time: Wed Oct 22 05:22:33 PM EDT 2025
========================================
-0: python() [0x56cd70]
+ils.hpp:697 (most recent call first):
+frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x7f3255fbcb80 in /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/torch/lib/libc10.so)
+frame #1: <unknown function> + 0x5ffd531 (0x7f32983c4531 in /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
+frame #2: <unknown function> + 0x5ffdacd (0x7f32983c4acd in /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
+frame #3: <unknown function> + 0x5ffe19b (0x7f32983c519b in /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
+frame #4: <unknown function> + 0x5fff0d7 (0x7f32983c60d7 in /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
+frame #5: c10d::TCPStore::compareSet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned char, std::allocator<unsigned char> > const&, std::vector<unsigned char, std::allocator<unsigned char> > const&) + 0x261 (0x7f32983c0c31 in /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
+frame #6: <unknown function> + 0xd9d16d (0x7f32a7b0b16d in /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
+frame #7: <unknown function> + 0x3c072e (0x7f32a712e72e in /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
+frame #8: python() [0x543944]
+
+frame #10: python() [0x56cd70]
frame #14: python() [0x6282e6]
frame #18: python() [0x5fd997]
frame #19: python() [0x53939e]
@@ -7523,3 +7592,1241 @@ Preparing Training Artifacts
========================================
Copying configuration files...
Copying and cleaning training logs...
+Training artifacts prepared in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/training_artifacts
+Contents:
+Log files:
+
+========================================
+STAGE 3: Uploading to HuggingFace Hub
+Repository: TAUR-dev/testing_llamafactory_helper_quick_test__interactive
+Start Time: Wed Oct 22 05:22:49 PM EDT 2025
+========================================
+Uploading contents of: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
+Directory structure:
+
+Executing: huggingface-cli upload TAUR-dev/testing_llamafactory_helper_quick_test__interactive /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged .
+Start hashing 17 files.
+Finished hashing 17 files.
+Warning: 'huggingface-cli upload' is deprecated. Use 'hf upload' instead.
+Processing Files (2 / 2) : 100%|| 1.00GB / 1.00GB, 211MB/s
+New Data Upload : 100%|| 716MB / 716MB, 170MB/s
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB
+ .../merged/model.safetensors: 100%|| 988MB / 988MB
+Removing 11 file(s) from commit that have not changed.
+https://huggingface.co/TAUR-dev/testing_llamafactory_helper_quick_test__interactive/tree/main/.
+
+========================================
+Upload completed successfully
+Model and training artifacts uploaded to: TAUR-dev/testing_llamafactory_helper_quick_test__interactive
+End Time: Wed Oct 22 05:22:57 PM EDT 2025
+========================================
+
+========================================
+STAGE 4: Cleanup
+========================================
+Keeping checkpoints in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints
+Keeping merged model in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
+
+========================================
+PIPELINE COMPLETED SUCCESSFULLY
+End Time: Wed Oct 22 05:22:57 PM EDT 2025
+========================================
+
+========================================
+Cleaning up LlamaFactory processes
+========================================
+Cleaned up processes on gl064.hpc.nyu.edu
+Cleaning up processes on worker node: gl065
+Process cleanup complete
+========================================
+Job Name: lf_torch_test__interactive
+Hostname: gl064.hpc.nyu.edu
+Number of nodes: 2
+GPUs per node: 2
+Start Time: Wed Oct 22 05:23:04 PM EDT 2025
+Log file: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/logs/pipeline.log
+========================================
+Sourcing secrets from: /scratch/zrs2020/LlamaFactoryHelper/secrets.env
+
+========================================
+Configuration Paths
+========================================
+Train Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml
+Merge Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/merge_config.yaml
+Dataset Info:
+Output Dir: 
/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints +Export Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged +HF Repo ID: TAUR-dev/testing_llamafactory_helper_quick_test__interactive + + +======================================== +STAGE 1: Training Model +Start Time: Wed Oct 22 05:23:07 PM EDT 2025 +======================================== + +Checking dataset cache status (master node only)... +Multi-node setup detected - skipping centralized preprocessing +Each node will process dataset independently with memory mapping disabled + +======================================== +Multi-Node Coordination +======================================== +This is the master node - launching worker nodes... +Master node: gl064 +Master port: 29500 +World size: 2 + +Launching on worker node 1: gl065 +All worker nodes launched successfully +Waiting 5 seconds for worker nodes to initialize... +Master node (this node) will now join training as rank 0 + +Proceeding with training... + +Multi-node training detected +Nodes: 2, GPUs per node: 2 +Master address: gl064 +Master port: 29500 +Node rank: 0 +World size: 2 +CUDA_VISIBLE_DEVICES: 0,1 +Using node-specific cache: /scratch/zrs2020/.cache/hf_cache/home/datasets_node0 +LLaMA-Factory path: /scratch/zrs2020/LlamaFactoryHelper/LLaMA-Factory +Training config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml + +Starting distributed training with torch.distributed.run... + +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. + warnings.warn( +/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. + warnings.warn( +/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. + import pkg_resources +/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. + import pkg_resources +[INFO|2025-10-22 17:23:38] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled. 
+[INFO|2025-10-22 17:23:38] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 4, device: cuda:0, distributed training: True, compute dtype: torch.float16 +[INFO|2025-10-22 17:23:38] llamafactory.hparams.parser:423 >> Process rank: 1, world size: 4, device: cuda:1, distributed training: True, compute dtype: torch.float16 +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:38,584 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:38,584 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:38,584 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:38,584 >> loading file added_tokens.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:38,584 >> loading file special_tokens_map.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:38,584 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:38,584 >> loading file chat_template.jinja from cache at None +[INFO|tokenization_utils_base.py:2364] 2025-10-22 17:23:38,756 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
+[INFO|configuration_utils.py:765] 2025-10-22 17:23:38,948 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json +[INFO|configuration_utils.py:839] 2025-10-22 17:23:38,949 >> Model config Qwen2Config { + "architectures": [ + "Qwen2ForCausalLM" + ], + "attention_dropout": 0.0, + "bos_token_id": 151643, + "dtype": "bfloat16", + "eos_token_id": 151643, + "hidden_act": "silu", + "hidden_size": 896, + "initializer_range": 0.02, + "intermediate_size": 4864, + "layer_types": [ + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention", + "full_attention" + ], + "max_position_embeddings": 32768, + "max_window_layers": 24, + "model_type": "qwen2", + "num_attention_heads": 14, + "num_hidden_layers": 24, + "num_key_value_heads": 2, + "rms_norm_eps": 1e-06, + "rope_scaling": null, + "rope_theta": 1000000.0, + "sliding_window": null, + "tie_word_embeddings": true, + "transformers_version": "4.57.1", + "use_cache": true, + "use_mrope": false, + "use_sliding_window": false, + "vocab_size": 151936 +} + +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:39,011 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:39,011 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:39,011 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:39,011 >> loading file added_tokens.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:39,011 >> loading file special_tokens_map.json from cache at None +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:39,011 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json +[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:23:39,011 >> loading file chat_template.jinja from cache at None +[INFO|tokenization_utils_base.py:2364] 2025-10-22 17:23:39,177 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +[INFO|2025-10-22 17:23:39] llamafactory.data.loader:143 >> Loading dataset TAUR-dev/D-SFT_C-sft_exp_AT_pvv2__fixed-sft-data... 
+Converting format of dataset: 100%|| 54000/54000 [00:00 +gl064:2409682:2409682 [0] NCCL INFO cudaDriverVersion 13000 +gl064:2409682:2409682 [0] NCCL INFO NCCL version 2.27.5+cuda12.9 +gl064:2409682:2409682 [0] NCCL INFO Comm config Blocking set to 1 +gl064:2409683:2409683 [1] NCCL INFO cudaDriverVersion 13000 +gl064:2409683:2409683 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs +gl064:2409683:2409683 [1] NCCL INFO Bootstrap: Using ibs3:10.0.5.0<0> +gl064:2409683:2409683 [1] NCCL INFO NCCL version 2.27.5+cuda12.9 +gl064:2409683:2409683 [1] NCCL INFO Comm config Blocking set to 1 +gl064:2409682:2409720 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. +gl064:2409683:2409721 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. +gl064:2409683:2409721 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. +gl064:2409682:2409720 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. +gl064:2409682:2409720 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs +gl064:2409683:2409721 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs +gl064:2409682:2409720 [0] NCCL INFO NCCL_IB_HCA set to mlx5 +gl064:2409683:2409721 [1] NCCL INFO NCCL_IB_HCA set to mlx5 +gl064:2409682:2409720 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0> +gl064:2409683:2409721 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0> +gl064:2409683:2409721 [1] NCCL INFO Initialized NET plugin IB +gl064:2409682:2409720 [0] NCCL INFO Initialized NET plugin IB +gl064:2409683:2409721 [1] NCCL INFO Assigned NET plugin IB to comm +gl064:2409682:2409720 [0] NCCL INFO Assigned NET plugin IB to comm +gl064:2409683:2409721 [1] NCCL INFO Using network IB +gl064:2409682:2409720 [0] NCCL INFO Using network IB +gl064:2409682:2409720 [0] NCCL INFO ncclCommInitRankConfig comm 0x13300130 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xdc96a09c11393e3 - Init START +gl064:2409683:2409721 [1] NCCL INFO ncclCommInitRankConfig comm 0x12e235c0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xdc96a09c11393e3 - Init START +gl064:2409682:2409720 [0] NCCL INFO RAS client listening socket at ::1<28028> +gl064:2409683:2409721 [1] NCCL INFO RAS client listening socket at ::1<28028> +gl064:2409683:2409721 [1] NCCL INFO Bootstrap timings total 2.873970 (create 0.000020, send 0.000193, recv 2.871749, ring 0.001218, delay 0.000000) +gl064:2409682:2409720 [0] NCCL INFO Bootstrap timings total 2.874337 (create 0.000022, send 0.000202, recv 0.000358, ring 2.751064, delay 0.000000) +gl064:2409683:2409721 [1] NCCL INFO Setting affinity for GPU 1 to 0-31 +gl064:2409682:2409720 [0] NCCL INFO Setting affinity for GPU 0 to 0-31 +gl064:2409683:2409721 [1] NCCL INFO comm 0x12e235c0 rank 1 nRanks 4 nNodes 2 localRanks 2 localRank 1 MNNVL 0 +gl064:2409682:2409720 [0] NCCL INFO comm 0x13300130 rank 0 nRanks 4 nNodes 2 localRanks 2 localRank 0 MNNVL 0 +gl064:2409683:2409721 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 +gl064:2409683:2409721 [1] NCCL INFO P2P Chunksize set to 131072 +gl064:2409682:2409720 [0] NCCL INFO Channel 00/02 : 0 1 2 3 +gl064:2409682:2409720 [0] NCCL INFO Channel 01/02 : 0 1 2 3 +gl064:2409682:2409720 [0] NCCL INFO Trees [0] 1/2/-1->0->-1 [1] 1/-1/-1->0->2 +gl064:2409682:2409720 [0] NCCL INFO P2P Chunksize set to 131072 +gl064:2409683:2409721 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. 
+gl064:2409683:2409731 [1] NCCL INFO [Proxy Service] Device 1 CPU core 6 +gl064:2409683:2409732 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 7 +gl064:2409682:2409720 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. +gl064:2409682:2409720 [0] NCCL INFO Check P2P Type isAllDirectP2p 0 directMode 0 +gl064:2409682:2409733 [0] NCCL INFO [Proxy Service] Device 0 CPU core 8 +gl064:2409683:2409721 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512 +gl064:2409683:2409721 [1] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer +gl064:2409682:2409734 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 9 +gl064:2409682:2409720 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512 +gl064:2409682:2409720 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer +gl064:2409682:2409720 [0] NCCL INFO CC Off, workFifoBytes 1048576 +gl064:2409683:2409721 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. +gl064:2409682:2409720 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. +gl064:2409683:2409721 [1] NCCL INFO ncclCommInitRankConfig comm 0x12e235c0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xdc96a09c11393e3 - Init COMPLETE +gl064:2409682:2409720 [0] NCCL INFO ncclCommInitRankConfig comm 0x13300130 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xdc96a09c11393e3 - Init COMPLETE +gl064:2409682:2409720 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 4 total 3.03 (kernels 0.09, alloc 0.02, bootstrap 2.87, allgathers 0.03, topo 0.01, graphs 0.00, connections 0.00, rest 0.00) +gl064:2409683:2409721 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 4 total 3.03 (kernels 0.09, alloc 0.02, bootstrap 2.87, allgathers 0.03, topo 0.01, graphs 0.00, connections 0.00, rest 0.00) +gl064:2409682:2409736 [0] NCCL INFO Channel 00/0 : 3[1] -> 0[0] [receive] via NET/IB/0 +gl064:2409682:2409737 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 14 +gl064:2409682:2409736 [0] NCCL INFO Channel 01/0 : 3[1] -> 0[0] [receive] via NET/IB/0 +gl064:2409682:2409736 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct +gl064:2409682:2409736 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct +gl064:2409683:2409735 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[0] [send] via NET/IB/0 +gl064:2409683:2409735 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[0] [send] via NET/IB/0 +gl064:2409683:2409738 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 15 +gl064:2409683:2409735 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0 +gl064:2409682:2409736 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0 +Running tokenizer on dataset: 100%|| 54000/54000 [00:002->0 [1] 3/0/-1->2->-1 +gl065:3813031:3813151 [0] NCCL INFO P2P Chunksize set to 131072 +gl065:3813032:3813142 [1] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2 +gl065:3813032:3813142 [1] NCCL INFO P2P Chunksize set to 131072 +gl065:3813031:3813151 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. +gl065:3813032:3813142 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. 
+gl065:3813031:3813156 [0] NCCL INFO [Proxy Service] Device 0 CPU core 21 +gl065:3813031:3813157 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 22 +gl065:3813032:3813159 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 24 +gl065:3813032:3813158 [1] NCCL INFO [Proxy Service] Device 1 CPU core 23 +gl065:3813031:3813151 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512 +gl065:3813031:3813151 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer +gl065:3813032:3813142 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512 +gl065:3813032:3813142 [1] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer +gl065:3813031:3813151 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. +gl065:3813031:3813151 [0] NCCL INFO ncclCommInitRankConfig comm 0x1376a6e0 rank 2 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xdc96a09c11393e3 - Init COMPLETE +gl065:3813031:3813151 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 2 nranks 4 total 0.14 (kernels 0.08, alloc 0.01, bootstrap 0.01, allgathers 0.00, topo 0.03, graphs 0.00, connections 0.00, rest 0.00) +gl065:3813032:3813142 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin. +gl065:3813032:3813142 [1] NCCL INFO ncclCommInitRankConfig comm 0x12d355a0 rank 3 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xdc96a09c11393e3 - Init COMPLETE +gl065:3813032:3813142 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 3 nranks 4 total 2.90 (kernels 0.08, alloc 0.01, bootstrap 2.77, allgathers 0.02, topo 0.02, graphs 0.00, connections 0.00, rest 0.00) +gl065:3813031:3813160 [0] NCCL INFO Channel 00/0 : 1[1] -> 2[0] [receive] via NET/IB/0 +gl065:3813031:3813160 [0] NCCL INFO Channel 01/0 : 1[1] -> 2[0] [receive] via NET/IB/0 +gl065:3813031:3813162 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 27 +gl065:3813031:3813160 [0] NCCL INFO Channel 00 : 2[0] -> 3[1] via SHM/direct/direct +gl065:3813031:3813160 [0] NCCL INFO Channel 01 : 2[0] -> 3[1] via SHM/direct/direct +gl065:3813032:3813161 [1] NCCL INFO Channel 00/0 : 3[1] -> 0[0] [send] via NET/IB/0 +gl065:3813032:3813161 [1] NCCL INFO Channel 01/0 : 3[1] -> 0[0] [send] via NET/IB/0 +gl065:3813032:3813163 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 28 +gl065:3813031:3813160 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0 +gl065:3813032:3813161 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0 +Running tokenizer on dataset: 100%|| 54000/54000 [00:00(your answer). + +# Problem +Using the numbers in the list [67, 71, 31], create an equation that equals 169. You can use basic arithmetic operations (+, -, *, /) and each number can only be used once. Your solution should include a series of steps "Step X:" where each step is a mathematical operation and the final step ultimately leads to the target number or it should be a single equation that results in the target. + +Give your answer in the following format: + +(your answer) + + +Where "(your answer)" is the list of steps to reach the target number or it should be a single equation that results in the target. 
+
+For example:
+If the list of numbers was [1, 2, 3] and the target was 1, you could write:
+<answer>
+Step 1: 1 + 2 = 3
+Step 2: 3 / 3 = 1
+</answer>
+
+or
+
+<answer>
+(1 + 2) / 3
+</answer>
+
+Let's think step by step.<|endoftext|>
+Assistant: 
+<think>
+First, I'll try to understand the problem better by writing out a plan and go really deep into detail about how I should solve this.
+Step 1: Identify the goal - We need to create an equation using the numbers [67, 71, 31] that equals 169.
+Step 2: Consider possible operations - Addition, subtraction, multiplication, and division.
+Step 3: Try different combinations of these operations with the given numbers.
+Step 4: Test various equations until we find one that equals 169.
+Step 5: Once we have found the correct equation, present it in the specified format.
+After going through all these steps, here's the final equation that equals 169:
+
+<answer>
+(67 * 2) + 71 - 31
+</answer>
+</think>
+<verification>
+Let us verify this answer:
+
+First, 67 + 71 = 138, then 138 - 31 = 107. This does not equal 169, so the answer is incorrect.
+
+<verdict>
+Incorrect
+</verdict>
+</verification>
+Let me verify my answer.
+<think>
+Rewritten Question:
+
+Use the numbers from the given set [67, 71, 31] to form an equation equaling 169 using basic arithmetic operations (+, -, *, /).
+
+Step 1: Multiply 67 and 31.
+Step 2: Add 71 to the result from Step 1.
+Step 3: The final result will be 169.
+
+<answer>
+(67 * 31) + 71 = 169
+</answer>
+</think>
+<verification>
+Let us verify this answer:
+
+First, 67 * 31 = 2,077, then 2,077 - 71 = 2,006. This is not equal to 169, so the answer is incorrect.
+
+<verdict>
+Incorrect
+</verdict>
+</verification>
+Time to review my response one more time.
+<think>
+First, I'll try to understand the problem better by writing out a plan and go really deep into detail about how I should solve this.
+The goal is to create an equation using the numbers 67, 71, and 31 that equals 169.
+I will start by trying different combinations of these three numbers and their possible operations (+, -, *, /). 
+Then I will evaluate each combination and check if they equal 169. If not, I will move on to the next combination until I find one that works.
+After finding the correct combination, I will present my solution in the "<answer>
+(your answer)
+</answer>" format.
+Here's my detailed plan:
+
+1. Create all possible combinations of 67, 71, and 31 using addition, subtraction, multiplication, and division.
+2. Evaluate each combination to see if it equals 169.
+3. If no combination equals 169, repeat the process with new combinations until I find one that works.
+4. Once I have found the correct combination, express it as a single equation in the "<answer>
+(your answer)
+</answer>" format.
+
+Now let me begin searching for the correct combination!
+</think>
+<verification>
+Let us verify this answer:
+
+1. Step 1: 67 * 31 = 2077
+2. Step 2: 2077 - 71 = 2006
+
+Therefore, the final result is 2006, which does not equal 169. The answer is incorrect.
+
+<verdict>
+Incorrect
+</verdict>
+</verification>
+But wait, let me think about it again.
+<think>
+Rewritten Question:
+
+What is the result when you add two numbers from the list [67, 71, 31] together?
+
+Step 1: Add 67 and 31.
+Step 2: Take the sum from Step 1 and add 71.
+
+Final Equation: (67 + 31) + 71
+
+<answer>
+(67 + 31) + 71
+</answer>
+</think>
+<verification>
+Let us verify this answer: First, 67 + 31 = 98, then 98 + 71 = 169. The answer is correct. 
+
+
+Correct
+
+
+
+
+
+Hence, the correct answer is:
+
+
+(67 + 31) + 71
+<|endoftext|>
+
+label_ids:
+[-100, -100, -100, ... (prompt positions masked with -100; full token-id dump omitted, the decoded text follows under labels) ..., 7, 21, 22, 488, 220, 18, 16, 8, 488, 220, 22, 16, 198, 522, 9217, 29, 151643, 198]
+labels:
+
+
+First, I'll try to understand the problem better by writing out a plan and go really deep into detail about how I should solve this.
+Step 1: Identify the goal - We need to create an equation using the numbers [67, 71, 31] that equals 169.
+Step 2: Consider possible operations - Addition, subtraction, multiplication, and division. +Step 3: Try different combinations of these operations with the given numbers. +Step 4: Test various equations until we find one that equals 169. +Step 5: Once we have found the correct equation, present it in the specified format. +After going through all these steps, here's the final equation that equals 169: + + +(67 * 2) + 71 - 31 + + + +Let us verify this answer: + +First, 67 + 71 = 138, then 138 - 31 = 107. This does not equal 169, so the answer is incorrect. + + +Incorrect + + +Let me verify my answer. + +Rewritten Question: + +Use the numbers from the given set [67, 71, 31] to form an equation equaling 169 using basic arithmetic operations (+, -, *, /). + +Step 1: Multiply 67 and 31. +Step 2: Add 71 to the result from Step 1. +Step 3: The final result will be 169. + + +(67 * 31) + 71 = 169 + + + +Let us verify this answer: + +First, 67 * 31 = 2,077, then 2,077 - 71 = 2,006. This is not equal to 169, so the answer is incorrect. + + +Incorrect + + +Time to review my response one more time. + +First, I'll try to understand the problem better by writing out a plan and go really deep into detail about how I should solve this. +The goal is to create an equation using the numbers 67, 71, and 31 that equals 169. +I will start by trying different combinations of these three numbers and their possible operations (+, -, *, /). +Then I will evaluate each combination and check if they equal 169. If not, I will move on to the next combination until I find one that works. +After finding the correct combination, I will present my solution in the " +(your answer) +" format. +Here's my detailed plan: + +1. Create all possible combinations of 67, 71, and 31 using addition, subtraction, multiplication, and division. +2. Evaluate each combination to see if it equals 169. +3. If no combination equals 169, repeat the process with new combinations until I find one that works. +4. Once I have found the correct combination, express it as a single equation in the " +(your answer) +" format. + +Now let me begin searching for the correct combination! + + +Let us verify this answer: + +1. Step 1: 67 * 31 = 2077 +2. Step 2: 2077 - 71 = 2006 + +Therefore, the final result is 2006, which does not equal 169. The answer is incorrect. + + +Incorrect + + +But wait, let me think about it again. + +Rewritten Question: + +What is the result when you add two numbers from the list [67, 71, 31] together? + +Step 1: Add 67 and 31. +Step 2: Take the sum from Step 1 and add 71. + +Final Equation: (67 + 31) + 71 + + +(67 + 31) + 71 + + + +Let us verify this answer: First, 67 + 31 = 98, then 98 + 71 = 169. The answer is correct. 
+
+
+Correct
+
+
+
+
+
+
+
+Hence, the correct answer is:
+
+
+(67 + 31) + 71
+<|endoftext|>
+
+[INFO|configuration_utils.py:765] 2025-10-22 17:26:09,864 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 17:26:09,865 >> Model config Qwen2Config {
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "dtype": "bfloat16",
+  "eos_token_id": 151643,
+  "hidden_act": "silu",
+  "hidden_size": 896,
+  "initializer_range": 0.02,
+  "intermediate_size": 4864,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 32768,
+  "max_window_layers": 24,
+  "model_type": "qwen2",
+  "num_attention_heads": 14,
+  "num_hidden_layers": 24,
+  "num_key_value_heads": 2,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "transformers_version": "4.57.1",
+  "use_cache": true,
+  "use_mrope": false,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
+
+[INFO|2025-10-22 17:26:09] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training.
+[WARNING|logging.py:328] 2025-10-22 17:26:10,583 >> `torch_dtype` is deprecated! Use `dtype` instead!
+[INFO|modeling_utils.py:1172] 2025-10-22 17:26:10,584 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors
+[INFO|modeling_utils.py:2341] 2025-10-22 17:26:10,585 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
+[INFO|configuration_utils.py:986] 2025-10-22 17:26:10,587 >> Generate config GenerationConfig {
+  "bos_token_id": 151643,
+  "eos_token_id": 151643,
+  "use_cache": false
+}
+
+[INFO|configuration_utils.py:941] 2025-10-22 17:26:11,032 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json
+[INFO|configuration_utils.py:986] 2025-10-22 17:26:11,033 >> Generate config GenerationConfig {
+  "bos_token_id": 151643,
+  "eos_token_id": 151643,
+  "max_new_tokens": 2048
+}
+
+[INFO|dynamic_module_utils.py:423] 2025-10-22 17:26:11,064 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B.
+[INFO|2025-10-22 17:26:11] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled.
+[INFO|2025-10-22 17:26:11] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
+[INFO|2025-10-22 17:26:11] llamafactory.model.adapter:143 >> Upcasting trainable params to float32.
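
Note on the label_ids dump earlier in this log: supervised fine-tuning masks every prompt position with -100, the ignore_index of PyTorch's cross-entropy loss, so gradients flow only through the assistant-response tokens. A minimal sketch of the idea (illustrative helper and toy ids, not LLaMA-Factory's actual code):

    import torch
    import torch.nn.functional as F

    IGNORE_INDEX = -100  # positions with this label contribute nothing to the loss

    def build_labels(prompt_ids, response_ids):
        # Supervise only the response: mask the prompt span with IGNORE_INDEX.
        return torch.tensor([IGNORE_INDEX] * len(prompt_ids) + list(response_ids))

    # Toy example: 3 prompt tokens, 2 response tokens, Qwen2.5-0.5B vocab size.
    logits = torch.randn(5, 151936)
    labels = build_labels([101, 102, 103], [13708, 766])
    loss = F.cross_entropy(logits, labels, ignore_index=IGNORE_INDEX)
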
+[INFO|2025-10-22 17:26:11] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA
+[INFO|2025-10-22 17:26:11] llamafactory.model.model_utils.misc:143 >> Found linear modules: o_proj,down_proj,gate_proj,up_proj,v_proj,k_proj,q_proj
+[INFO|2025-10-22 17:26:11] llamafactory.model.loader:143 >> trainable params: 4,399,104 || all params: 498,431,872 || trainable%: 0.8826
+[WARNING|trainer.py:906] 2025-10-22 17:26:11,180 >> The model is already on multiple devices. Skipping the move to device specified in `args`.
+[INFO|trainer.py:699] 2025-10-22 17:26:11,183 >> max_steps is given, it will override any value given in num_train_epochs
+[INFO|trainer.py:749] 2025-10-22 17:26:11,183 >> Using auto half precision backend
+[WARNING|2025-10-22 17:26:11] llamafactory.train.callbacks:154 >> Previous trainer log in this folder will be deleted.
+[WARNING|trainer.py:982] 2025-10-22 17:26:11,187 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
+[INFO|trainer.py:2519] 2025-10-22 17:26:11,475 >> ***** Running training *****
+[INFO|trainer.py:2520] 2025-10-22 17:26:11,475 >> Num examples = 48,600
+[INFO|trainer.py:2521] 2025-10-22 17:26:11,475 >> Num Epochs = 1
+[INFO|trainer.py:2522] 2025-10-22 17:26:11,475 >> Instantaneous batch size per device = 1
+[INFO|trainer.py:2525] 2025-10-22 17:26:11,475 >> Total train batch size (w. parallel, distributed & accumulation) = 4
+[INFO|trainer.py:2526] 2025-10-22 17:26:11,475 >> Gradient Accumulation steps = 1
+[INFO|trainer.py:2527] 2025-10-22 17:26:11,475 >> Total optimization steps = 100
+[INFO|trainer.py:2528] 2025-10-22 17:26:11,477 >> Number of trainable parameters = 4,399,104
+[INFO|integration_utils.py:867] 2025-10-22 17:26:11,499 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
+wandb: Currently logged in as: zsprague (ut_nlp_deduce) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
+wandb: Tracking run with wandb version 0.22.2
+wandb: Run data is saved locally in /scratch/zrs2020/LlamaFactoryHelper/wandb/run-20251022_172611-36bddmt4
+wandb: Run `wandb offline` to turn off syncing.
+wandb: Syncing run interactive_test
+wandb: View project at https://wandb.ai/ut_nlp_deduce/llamafactory
+wandb: View run at https://wandb.ai/ut_nlp_deduce/llamafactory/runs/36bddmt4
+  0%| | 0/100 [00:00<?, ?it/s]
+[INFO|trainer.py:4309] >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50
+[INFO|configuration_utils.py:765] 2025-10-22 17:26:25,617 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 17:26:25,619 >> Model config Qwen2Config {...} (identical to the config printed above; repeated dump omitted)
+[INFO|tokenization_utils_base.py:2421] 2025-10-22 17:26:25,791 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/chat_template.jinja
+[INFO|tokenization_utils_base.py:2590] 2025-10-22 17:26:25,796 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2599] 2025-10-22 17:26:25,815 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/special_tokens_map.json
+ 60%| | 60/100 [00:15<00:07, 5.35it/s] {'loss': 0.6288, 'grad_norm': 0.4912172555923462, 'learning_rate': 2.05e-05, 'epoch': 0.0}
+ 70%| | 70/100 [00:18<00:07, 4.03it/s] {'loss': 0.6135, 'grad_norm': 0.5216199159622192, 'learning_rate': 1.55e-05, 'epoch': 0.01}
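
The learning_rate values in these periodic loss logs fall by 5.0e-06 every 10 steps and reach 5.000000000000001e-07 at step 100 (below), which is consistent with a linear decay to zero over the 100 optimization steps from a peak of 5e-05. The peak and the schedule type are inferred from the logged values, not read from train_config.yaml; a minimal sketch under that assumption:

    PEAK_LR = 5e-5       # inferred from the logged values, not from the config
    TOTAL_STEPS = 100

    def linear_lr(completed_steps: int) -> float:
        # Linear decay to zero with no warmup (assumption).
        return PEAK_LR * (TOTAL_STEPS - completed_steps) / TOTAL_STEPS

    print(linear_lr(59))  # ~2.05e-05, the value logged alongside step 60
    print(linear_lr(99))  # ~5.00e-07, the value logged alongside step 100
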
+ 80%| | 80/100 [00:20<00:04, 4.18it/s] {'loss': 0.6435, 'grad_norm': 0.4013826549053192, 'learning_rate': 1.05e-05, 'epoch': 0.01}
+ 90%| | 90/100 [00:22<00:01, 5.10it/s] {'loss': 0.6313, 'grad_norm': 0.5442547798156738, 'learning_rate': 5.500000000000001e-06, 'epoch': 0.01}
+100%|| 100/100 [00:24<00:00, 4.22it/s] {'loss': 0.6241, 'grad_norm': 0.43993669748306274, 'learning_rate': 5.000000000000001e-07, 'epoch': 0.01}
+[INFO|trainer.py:4309] 2025-10-22 17:26:37,455 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
+[INFO|configuration_utils.py:765] 2025-10-22 17:26:37,625 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 17:26:37,626 >> Model config Qwen2Config {...} (identical to the config printed above; repeated dump omitted)
+[INFO|tokenization_utils_base.py:2421] 2025-10-22 17:26:37,812 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/chat_template.jinja
+[INFO|tokenization_utils_base.py:2590] 2025-10-22 17:26:37,817 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2599] 2025-10-22 17:26:37,822 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/special_tokens_map.json
+[INFO|trainer.py:2810] 2025-10-22 17:26:38,266 >>
+
+Training completed. Do not forget to share your model on huggingface.co/models =)
+
+
+{'train_runtime': 26.7899, 'train_samples_per_second': 14.931, 'train_steps_per_second': 3.733, 'train_loss': 0.6805129957199096, 'epoch': 0.01}
+100%|| 100/100 [00:25<00:00, 3.89it/s]
+[INFO|trainer.py:4309] 2025-10-22 17:26:38,277 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints
+[INFO|configuration_utils.py:765] 2025-10-22 17:26:38,395 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 17:26:38,396 >> Model config Qwen2Config {...} (identical to the config printed above; repeated dump omitted)
+[INFO|tokenization_utils_base.py:2421] 2025-10-22 17:26:38,514 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/chat_template.jinja
+[INFO|tokenization_utils_base.py:2590] 2025-10-22 17:26:38,520 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2599] 2025-10-22 17:26:38,524 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/special_tokens_map.json
+***** train metrics *****
+  epoch                    =     0.0082
+  total_flos               =  1473847GF
+  train_loss               =     0.6805
+  train_runtime            = 0:00:26.78
+  train_samples_per_second =     14.931
+  train_steps_per_second   =      3.733
+[INFO|modelcard.py:456] 2025-10-22 17:26:38,700 >> Dropping the following result as it does not have all the necessary fields:
+{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
+gl064:2409683:2409683 [1] NCCL INFO comm 0x12e235c0 rank 1 nranks 4 cudaDev 1 busId 59000 - Destroy COMPLETE
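
The train metrics above check out against the batch geometry the trainer reported at startup (per-device batch size 1, gradient accumulation 1, total train batch size 4 across the 4 ranks). A quick arithmetic check, with every input copied from the log:

    # Throughput sanity check for the reported train metrics.
    nodes, gpus_per_node = 2, 2
    per_device_batch, grad_accum = 1, 1
    steps, runtime_s, num_examples = 100, 26.7899, 48_600

    global_batch = nodes * gpus_per_node * per_device_batch * grad_accum  # 4
    print(steps / runtime_s)                    # ~3.733  train_steps_per_second
    print(steps * global_batch / runtime_s)     # ~14.93  train_samples_per_second
    print(steps * global_batch / num_examples)  # ~0.0082 epoch
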
+gl064:2409682:2409682 [0] NCCL INFO comm 0x13300130 rank 0 nranks 4 cudaDev 0 busId 47000 - Destroy COMPLETE
+wandb:
+wandb: View run interactive_test at: https://wandb.ai/ut_nlp_deduce/llamafactory/runs/36bddmt4
+wandb: Find logs at: wandb/run-20251022_172611-36bddmt4/logs
+
+========================================
+Training completed successfully
+End Time: Wed Oct 22 05:26:40 PM EDT 2025
+========================================
+(truncated libtorch stack trace omitted; it accompanies the rendezvous warning below)
+W1022 17:26:40.526000 3812984 site-packages/torch/distributed/elastic/rendezvous/dynamic_rendezvous.py:1292] The node 'gl065.hpc.nyu.edu_3812984_0' has failed to shutdown the rendezvous 'lf_torch_test__interactive' due to an error of type RendezvousConnectionError.
+
+========================================
+Training completed successfully
+End Time: Wed Oct 22 05:26:40 PM EDT 2025
+========================================
+
+========================================
+STAGE 2: Merging/Exporting Model
+Start Time: Wed Oct 22 05:26:40 PM EDT 2025
+========================================
+Skipping merge/export on worker node - rank 1
+Skipping artifact preparation and upload on worker node (rank 1)
+Skipping cleanup on worker node (rank 1)
+
+========================================
+PIPELINE COMPLETED SUCCESSFULLY
+End Time: Wed Oct 22 05:26:40 PM EDT 2025
+========================================
+
+========================================
+Cleaning up LlamaFactory processes
+========================================
+Successfully updated merge config
+Process cleanup complete
+Updated merge config to use: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
+
+Merge config contents:
+  model_name_or_path: Qwen/Qwen2.5-0.5B
+  finetuning_type: lora
+  trust_remote_code: true
+  adapter_name_or_path: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
+  template: default
+  export_dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
+
+Executing command: llamafactory-cli export /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/merge_config.yaml
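
What this export does, roughly: load the base model, attach the checkpoint-100 LoRA adapter named in the merge config above, fold the low-rank deltas into the dense weights, and write the result to export_dir. A PEFT-based sketch of the same operation (an illustration under those assumptions, not llamafactory-cli's actual implementation; paths are the ones from the merge config):

    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_id = "Qwen/Qwen2.5-0.5B"
    adapter_dir = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100"
    export_dir = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged"

    base = AutoModelForCausalLM.from_pretrained(base_id, dtype=torch.bfloat16)
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(export_dir)  # writes model.safetensors plus configs
    AutoTokenizer.from_pretrained(base_id).save_pretrained(export_dir)
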
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
+  warnings.warn(
+/scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
+  import pkg_resources
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,159 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,159 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,159 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,159 >> loading file added_tokens.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,159 >> loading file special_tokens_map.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,159 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,159 >> loading file chat_template.jinja from cache at None
+[INFO|tokenization_utils_base.py:2364] 2025-10-22 17:26:51,330 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
+[INFO|configuration_utils.py:765] 2025-10-22 17:26:51,555 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 17:26:51,557 >> Model config Qwen2Config {...} (identical to the config printed above; repeated dump omitted)
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,621 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,621 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,621 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,621 >> loading file added_tokens.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,621 >> loading file special_tokens_map.json from cache at None
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,621 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2095] 2025-10-22 17:26:51,621 >> loading file chat_template.jinja from cache at None
+[INFO|tokenization_utils_base.py:2364] 2025-10-22 17:26:51,788 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
+[INFO|configuration_utils.py:765] 2025-10-22 17:26:51,838 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
+[INFO|configuration_utils.py:839] 2025-10-22 17:26:51,839 >> Model config Qwen2Config {...} (identical to the config printed above; repeated dump omitted)
+[WARNING|logging.py:328] 2025-10-22 17:26:51,839 >> `torch_dtype` is deprecated! Use `dtype` instead!
+[INFO|2025-10-22 17:26:51] llamafactory.model.model_utils.kv_cache:143 >> KV cache is enabled for faster generation.
+[WARNING|logging.py:328] 2025-10-22 17:26:52,156 >> `torch_dtype` is deprecated! Use `dtype` instead!
+[INFO|modeling_utils.py:1172] 2025-10-22 17:26:52,157 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors
+[INFO|modeling_utils.py:2341] 2025-10-22 17:26:52,158 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
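
The repeated `torch_dtype` is deprecated! Use `dtype` instead! warnings reflect the keyword rename in this transformers release (4.57.1); some caller in the export path evidently still passes the old name. Under the new spelling, a load call looks like this (a standalone illustration, not the pipeline's code):

    import torch
    from transformers import AutoModelForCausalLM

    # Old keyword, still accepted but warns: torch_dtype=torch.bfloat16
    # Current keyword in transformers 4.57.1:
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", dtype=torch.bfloat16)
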
+[INFO|configuration_utils.py:986] 2025-10-22 17:26:52,159 >> Generate config GenerationConfig { + "bos_token_id": 151643, + "eos_token_id": 151643 +} + +[INFO|configuration_utils.py:941] 2025-10-22 17:26:52,240 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json +[INFO|configuration_utils.py:986] 2025-10-22 17:26:52,240 >> Generate config GenerationConfig { + "bos_token_id": 151643, + "eos_token_id": 151643, + "max_new_tokens": 2048 +} + +[INFO|dynamic_module_utils.py:423] 2025-10-22 17:26:52,274 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B. +[INFO|2025-10-22 17:26:52] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference. +[INFO|2025-10-22 17:26:53] llamafactory.model.adapter:143 >> Merged 1 adapter(s). +[INFO|2025-10-22 17:26:53] llamafactory.model.adapter:143 >> Loaded adapter(s): /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100 +[INFO|2025-10-22 17:26:53] llamafactory.model.loader:143 >> all params: 494,032,768 +[INFO|2025-10-22 17:26:53] llamafactory.train.tuner:143 >> Convert model dtype to: torch.bfloat16. +[INFO|configuration_utils.py:491] 2025-10-22 17:26:53,121 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/config.json +[INFO|configuration_utils.py:757] 2025-10-22 17:26:53,125 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/generation_config.json +[INFO|modeling_utils.py:4181] 2025-10-22 17:26:54,838 >> Model weights saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/model.safetensors +[INFO|tokenization_utils_base.py:2421] 2025-10-22 17:26:54,843 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/chat_template.jinja +[INFO|tokenization_utils_base.py:2590] 2025-10-22 17:26:54,849 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/tokenizer_config.json +[INFO|tokenization_utils_base.py:2599] 2025-10-22 17:26:54,854 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/special_tokens_map.json +[INFO|2025-10-22 17:26:54] llamafactory.train.tuner:143 >> Ollama modelfile saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/Modelfile + +======================================== +Merge/Export completed successfully +End Time: Wed Oct 22 05:26:55 PM EDT 2025 +======================================== + +======================================== +Preparing Training Artifacts +======================================== +Copying configuration files... +Copying and cleaning training logs...
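
A closing note on parameter accounting across the two stages: training reported trainable params: 4,399,104 || all params: 498,431,872, while the merged export reports all params: 494,032,768. The difference is exactly the LoRA adapter, which the merge folds into the dense weights and then drops as a separate module:

    # Parameter accounting, with all counts copied from the log.
    all_params_train  = 498_431_872   # base weights + LoRA adapter during training
    lora_params       = 4_399_104     # trainable adapter parameters
    all_params_merged = 494_032_768   # dense model reported after export

    assert all_params_train - lora_params == all_params_merged
    print(f"{100 * lora_params / all_params_train:.4f}")  # 0.8826, the logged trainable%
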