Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper: arXiv 2311.03099
Use default tokenizer
After each training phase, embed_tokens and lm_head are merged as well.
Phase 1
hyperparameter test: rank=16, lr=1e-5, weight_decay=0.01, dropout=0.1
dataset: richard-park/llama-recipe-pre1-fineweb-edu-1m-split-text
llama-3.1 base model, trained with llama recipe 1
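The phase-1 setup above (rank-16 LoRA with dropout 0.1, plus fully trained embed_tokens/lm_head that get merged after each phase) could be sketched with peft roughly as follows. This is an assumption-laden sketch, not the actual training script: lora_alpha, target_modules, and task_type are not stated in the notes.

```python
from peft import LoraConfig

# Sketch of the phase-1 adapter config described in the notes.
# lora_alpha and target_modules are assumptions (not in the notes).
lora_config = LoraConfig(
    r=16,                      # rank=16 from the phase-1 hyperparameters
    lora_alpha=32,             # assumption: a common choice is 2*r
    lora_dropout=0.1,          # dropout=0.1 from the phase-1 hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    # embed_tokens / lm_head are trained fully and merged back after each phase,
    # as the note above states.
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
```

The learning rate (1e-5) and weight_decay (0.01) from the notes would go into the trainer's arguments, not this adapter config.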
Phase 2
hyperparameter changes: lr: 1e-5 -> 2e-6, epochs: 3 -> 4, LoRA dropout: 0.1 -> 0.3
dataset: richard-park/llama-recipe-pre2-fineweb-edu-1m-split-text
llama-3.1 base model, trained with llama recipe 2
nohup tensorboard --logdir=sapie-fineweb-edu/outputs/pre2/llama31-base-1m-fineweb-edu_20250128-01/runs --host 0.0.0.0 --port=5406 > tensorboard.log 2>&1 & disown
Phase 3
hyperparameters: rank=16/32, lr=1e-5, weight_decay=0.1, dropout=0.3
dataset: richard-park/sapie-dataset-pre3-1m-gt50-le256-split
llama-3.1 base model, trained with llama recipe 3
nohup tensorboard --logdir=sapie-fineweb-edu/outputs/pre3/llama31-base-1m-aihub-trans_20250130-01/runs --host 0.0.0.0 --port=5406 > tensorboard.log 2>&1 & disown
This is a merge of pre-trained language models created using mergekit.
This model was merged with the DARE TIES merge method, using ../models/Llama-3.1-8B-Instruct as the base.
The following models were included in the merge:
- ../models/llama31-base-pre3-finweb-edu-1m
The following YAML configuration was used to produce this model:
models:
  - model: ../models/Llama-3.1-8B-Instruct
    parameters:
      density: [0.6, 0.8, 1]  # lower -> middle -> upper layers; preserve the upper layers
      weight:
        - filter: mlp
          value: 0.8  # larger weight on the MLP layers (contribution to the output)
        - value: 0.5  # remaining layers
  - model: ../models/llama31-base-pre3-finweb-edu-1m  # base model + Korean pretraining
    parameters:
      density: [1, 0.6, 0.4]  # give the lower layers more influence
      weight:
        - filter: attention
          value: 0.7  # larger weight on the attention layers (reinforce the Korean training)
        - value: 0.3  # remaining layers
merge_method: dare_ties
base_model: ../models/Llama-3.1-8B-Instruct
dtype: bfloat16
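The density values in the config above control DARE's drop-and-rescale step: each model's delta from the base is randomly sparsified, and the surviving entries are rescaled so the expected delta is preserved, before TIES-style sign election and merging. A toy illustration of that step on plain Python lists (the function name and values are illustrative, not mergekit's API):

```python
import random

def dare_drop_and_rescale(delta, density, seed=0):
    """Keep each delta entry with probability `density` and rescale
    the survivors by 1/density so the expected delta is unchanged."""
    rng = random.Random(seed)
    return [d / density if rng.random() < density else 0.0 for d in delta]

# delta = fine-tuned weight minus base weight (toy values)
delta = [0.20, -0.10, 0.05, 0.30]
sparse = dare_drop_and_rescale(delta, density=0.5)
# each surviving entry is scaled by 1/0.5 = 2; dropped entries become 0.0
```

With a layered density schedule like `[0.6, 0.8, 1]`, mergekit interpolates the keep probability across layers, which is how the config biases which depths of each model survive the merge.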
Evaluation results (three runs):
- {'answer_relevancy': 0.7495, 'faithfulness': 0.6831}
- {'answer_relevancy': 0.7406, 'faithfulness': 0.7337}
- {'answer_relevancy': 0.7356, 'faithfulness': 0.6814}