Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper: [arXiv:2311.03099](https://arxiv.org/abs/2311.03099)
This is a DARE-TIES merge reproduction of Llama3-8B-Instruct + NousResearch/Hermes-2-Pro-Llama-3-8B + aaditya/Llama3-OpenBioLLM-8B.
The overall merge recipe and benchmark setup follow lighteternal/Llama3-merge-biomed-8b, while the actual merge implementation is performed with MindNLP Wizard on MindSpore/Ascend.
The recommended prompt template remains the Llama 3 format: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
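As a concrete illustration of that template, here is a minimal sketch that assembles a single-turn prompt by hand (special-token layout taken from the linked Llama 3 format page; the `build_llama3_prompt` helper is illustrative and not shipped with this model — in practice, `tokenizer.apply_chat_template` handles this for you):

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 chat prompt string (illustrative helper)."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        # Generation continues from the assistant header.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_llama3_prompt(
    "You are a helpful biomedical assistant.",
    "What is the function of hemoglobin?",
))
```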
| Task | Metric | Ours (Wizard, %) | Llama3-8B-Instruct (%) | OpenBioLLM-8B (%) |
|---|---|---|---|---|
| ARC Challenge | Accuracy | 59.73 | 57.17 | 55.38 |
| ARC Challenge | Normalized Accuracy | 64.59 | 60.75 | 58.62 |
| HellaSwag | Accuracy | 62.26 | 62.59 | 61.83 |
| HellaSwag | Normalized Accuracy | 81.35 | 81.53 | 80.76 |
| Winogrande | Accuracy | 76.01 | 74.51 | 70.88 |
| GSM8K | Accuracy | 70.81 | 68.69 | 10.15 |
| MMLU-Anatomy | Accuracy | 71.11 | 72.59 | 69.62 |
| MMLU-Clinical Knowledge | Accuracy | 77.74 | 77.83 | 60.38 |
| MMLU-College Biology | Accuracy | 80.56 | 81.94 | 79.86 |
| MMLU-College Medicine | Accuracy | 68.21 | 63.58 | 70.52 |
| MMLU-Medical Genetics | Accuracy | 82.00 | 80.00 | 80.00 |
| MMLU-Professional Medicine | Accuracy | 77.57 | 71.69 | 77.94 |
This model is merged using the DARE-TIES method with meta-llama/Meta-Llama-3-8B-Instruct as the base. The following donor models are included in the merge:

- NousResearch/Hermes-2-Pro-Llama-3-8B
- aaditya/Llama3-OpenBioLLM-8B
The following YAML configuration is used:
```yaml
models:
  - model: meta-llama/Meta-Llama-3-8B-Instruct
    # Base model providing a general foundation without specific parameters
  - model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      density: 0.60
      weight: 0.5
  - model: NousResearch/Hermes-2-Pro-Llama-3-8B
    parameters:
      density: 0.55
      weight: 0.1
  - model: aaditya/Llama3-OpenBioLLM-8B
    parameters:
      density: 0.55
      weight: 0.4
merge_method: dare_ties
base_model: meta-llama/Meta-Llama-3-8B-Instruct
parameters:
  int8_mask: true
dtype: bfloat16
```
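For intuition, here is a toy sketch of what the `dare_ties` method does per parameter. Real merges operate on full checkpoint tensors; the flat-list interface and the `dare_ties_merge` name below are illustrative assumptions, not the Wizard or mergekit API:

```python
import random

def dare(delta, density, rng):
    """DARE: randomly Drop delta entries And REscale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]

def sign(x):
    return (x > 0) - (x < 0)

def dare_ties_merge(base, donors, densities, weights, seed=0):
    """Toy DARE-TIES over flat parameter lists: sparsify each donor's task
    vector with DARE, elect a per-parameter sign from the weighted sum (TIES),
    and keep only contributions that agree with the elected sign."""
    rng = random.Random(seed)
    deltas = []
    for donor, rho, w in zip(donors, densities, weights):
        task_vector = [d - b for d, b in zip(donor, base)]  # donor minus base
        deltas.append([w * x for x in dare(task_vector, rho, rng)])
    merged = []
    for i, b in enumerate(base):
        elected = sign(sum(d[i] for d in deltas))  # majority direction
        merged.append(b + sum(d[i] for d in deltas if sign(d[i]) == elected))
    return merged
```

With `density` < 1 the result is stochastic (entries are dropped at random), which is why the `density` values in the YAML above control how sparse each donor's contribution is before the TIES sign election.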