Base model is: /zju_0038/wyy/mergebench/models/Llama-3.2-3B
Models to be merged are:
  /zju_0038/yifyang/scripts/models/llama-instruct-3B-v2-algebra
  /zju_0038/yifyang/scripts/models/llama-instruct-3B-v2-analysis
  /zju_0038/yifyang/scripts/models/llama-instruct-3B-v2-number_theory
  /zju_0038/yifyang/scripts/models/llama-instruct-3B-v2-discrete
  /zju_0038/yifyang/scripts/models/llama-instruct-3B-v2-biology
  /zju_0038/yifyang/scripts/models/llama-3.2-Korean-Bllossom-3B
  /zju_0038/yifyang/scripts/models/llama-instruct-3B-v2-code
  /zju_0038/yifyang/scripts/models/Llama-3.2-3B-Instruct-tuned
  /zju_0038/yifyang/scripts/models/Llama3.2-3B-ShiningValiant2
  /zju_0038/yifyang/scripts/models/FineMath-Llama-3B
  /zju_0038/yifyang/scripts/models/Llama-3.2-3B_MATH_lisa
  /zju_0038/yifyang/scripts/models/saves_llama3.2_3b_origianl_MATH_training_rewrite_common_normal
  /zju_0038/yifyang/scripts/models/GSM8K-Binary_Llama-3.2-3B-ihl420el
  /zju_0038/yifyang/scripts/models/saves_llama3.2_3b_origianl_MATH_training_rewrite_common_shorter_8b
  /zju_0038/yifyang/scripts/models/Llama-3.2-3B-math-SFT
  /zju_0038/yifyang/scripts/models/Llama-3.2-3B-math-R3
Scaling coefficient is 0.1
Merging conducted on cpu
Loading base model in offline mode...
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
[WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible
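The log reports a scaling coefficient of 0.1 applied while folding sixteen fine-tuned checkpoints into one base model, which matches the shape of scaled task-arithmetic merging. Below is a minimal sketch of that technique; it is an illustration, not the actual mergebench implementation. The helper name `merge_task_arithmetic` is hypothetical, and real checkpoints hold tensors per parameter rather than the scalar toy values used here.

```python
def merge_task_arithmetic(base, finetuned_models, coeff):
    """Task-arithmetic merge (illustrative sketch, not mergebench's code):
    for each parameter, add the scaled sum of task vectors
    (finetuned weight minus base weight) back onto the base weight."""
    merged = dict(base)
    for model in finetuned_models:
        for name, base_weight in base.items():
            # Each (model[name] - base_weight) is one model's "task vector"
            # entry; coeff plays the role of the log's scaling coefficient.
            merged[name] += coeff * (model[name] - base_weight)
    return merged


# Toy example with a single scalar parameter "w" per checkpoint.
base = {"w": 1.0}
finetuned = [{"w": 2.0}, {"w": 0.5}]
merged = merge_task_arithmetic(base, finetuned, coeff=0.1)
# merged["w"] = 1.0 + 0.1*(2.0 - 1.0) + 0.1*(0.5 - 1.0) = 1.05
```

With a small coefficient such as 0.1, each fine-tuned model nudges the base weights only slightly, which is a common way to combine many specialists without drifting far from the base checkpoint.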