---
base_model:
- Qwen/Qwen2.5-Coder-7B
- Qwen/Qwen2.5-Math-7B
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-7B
library_name: transformers
tags:
- mergekit
- merge
---
# nthehai01/Qwen2.5-7B-Instruct-Math-Code-task-arithmetic

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
## Performance

| Metric                | Value |
|-----------------------|------:|
| GSM8K (zero-shot)     | 86.20 |
| HellaSwag (zero-shot) | 49.91 |
| MBPP (zero-shot)      | 55.20 |
## Merge Details

### Merge Method

This model was merged using the [Task Arithmetic](https://arxiv.org/abs/2212.04089) merge method, with [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) as the base model.
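In task arithmetic, each fine-tuned model contributes a "task vector" (its weights minus the base model's weights); a weighted sum of these vectors, scaled by `lambda`, is added back onto the base. A minimal sketch of that update on toy tensors (the function name and values are illustrative, not mergekit's actual API; `normalize` mimics the `normalize: 1.0` option by rescaling the weights to sum to 1):

```python
import torch

def task_arithmetic_merge(base, experts, weights, lam, normalize=True):
    """Merge expert checkpoints into a base via task arithmetic."""
    if normalize:
        # Rescale per-expert weights to sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = {}
    for name, base_param in base.items():
        # Task vector = expert parameters minus base parameters.
        task_vector = sum(w * (expert[name] - base_param)
                          for w, expert in zip(weights, experts))
        # Add the combined task vector back, scaled by lambda.
        merged[name] = base_param + lam * task_vector
    return merged

# Toy single-tensor "checkpoints" (illustrative values only).
base = {"w": torch.tensor([1.0, 1.0])}
math_like = {"w": torch.tensor([2.0, 1.0])}   # task vector [1, 0]
code_like = {"w": torch.tensor([1.0, 3.0])}   # task vector [0, 2]
merged = task_arithmetic_merge(base, [math_like, code_like],
                               weights=[0.5, 0.5], lam=1.0)
print(merged["w"])  # tensor([1.5000, 2.0000])
```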
### Models Merged

The following models were included in the merge:

* [Qwen/Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B)
* [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
* [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: Qwen/Qwen2.5-7B
dtype: bfloat16
merge_method: task_arithmetic
parameters:
  lambda: 0.5676097213578511
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-7B
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-Math-7B
    parameters:
      weight: 0.5215841338521604
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-Coder-7B
    parameters:
      weight: 0.13680114132969845
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: 0.8507353075455186
```
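To reproduce the merge, the configuration above can be saved to a file and passed to mergekit's `mergekit-yaml` entry point (the output directory name here is illustrative; exact options may vary across mergekit versions):

```shell
pip install mergekit
# Save the YAML above as config.yaml, then run:
mergekit-yaml config.yaml ./merged-model
```

Note that this downloads all four 7B source checkpoints, so substantial disk space is required.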