Paper: Resolving Interference When Merging Models (arXiv:2306.01708)
This is a merge of pre-trained language models created using mergekit.
Of the Franken merges so far, this one performs very well with the Min-P and Noromaid settings in SillyTavern 2, and it seems even better than the 10.5B version of this model. I uploaded three files that can be imported into SillyTavern. I take no credit for these files and am not sure who the original authors are.
This model was merged using the TIES merge method, with Franken-Maid as the base.
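For readers unfamiliar with TIES, here is a rough sketch of the idea on flat lists of numbers: take each model's difference from the base, keep only the largest-magnitude fraction of it (`density`), pick a sign per parameter, and combine only the entries that agree with that sign. This is a toy illustration, not mergekit's code. The function names `trim` and `ties_merge` are made up for this example, and the per-parameter normalization shown is my assumption of how `normalize: true` behaves.

```python
def trim(delta, density):
    """Keep the top `density` fraction of entries by magnitude; zero the rest."""
    k = int(round(density * len(delta)))
    if k <= 0:
        return [0.0] * len(delta)
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, finetuned, weights, densities, normalize=True):
    """Toy TIES merge over flat parameter lists (hypothetical helper)."""
    # 1. Task vectors: each fine-tuned model minus the base.
    deltas = [[f - b for f, b in zip(ft, base)] for ft in finetuned]
    # 2. Trim each task vector to its density, then scale by the model weight.
    scaled = [
        [w * d for d in trim(delta, rho)]
        for delta, w, rho in zip(deltas, weights, densities)
    ]
    merged = []
    for j, b in enumerate(base):
        column = [s[j] for s in scaled]
        # 3. Elect a sign for this parameter from the summed contributions.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # 4. Combine only the contributions that agree with the elected sign.
        agree = [(c, w) for c, w in zip(column, weights) if c * sign > 0]
        mixed = sum(c for c, _ in agree)
        if normalize:
            # Assumption: divide by the total weight of the agreeing models.
            divisor = sum(w for _, w in agree)
            if abs(divisor) > 1e-8:
                mixed /= divisor
        merged.append(b + mixed)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -2.0, 0.1, 0.0]
model_b = [3.0, 1.0, 0.0, 0.0]
result = ties_merge(base, [model_a, model_b], weights=[1.0, 1.0], densities=[1.0, 1.0])
print(result)
```

In the second parameter the two models disagree on direction, so only the larger (negative) contribution survives. That is the "interference" the method is meant to resolve.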
The following models were included in the merge:
The following YAML configuration was used to produce this model:
models:
  - model: ibm/merlinite-7b
    parameters:
      weight: 1
      density: 1
  - model: Undi95/Toppy-M-7B
    parameters:
      weight: 0.3
  - model: jondurbin/bagel-dpo-7b-v0.4
    parameters:
      weight: 0.2
  - model: senseable/WestLake-7B-v2
    parameters:
      weight: 0.2
  - model: l3utterfly/mistral-7b-v0.1-layla-v4
    parameters:
      weight: 0.2
merge_method: ties
base_model: Franken-Maid
parameters:
  density: 0.4
  int8_mask: true
  normalize: true
dtype: bfloat16
models:
  - model: SanjiWatsuki/Sonya-7B
    parameters:
      weight: 1
      density: 1
  - model: SanjiWatsuki/Loyal-Toppy-Bruins-Maid-7B-DARE
    parameters:
      weight: 0.3
  - model: Azazelle/Half-NSFW_Noromaid-7b
    parameters:
      weight: 0.2
  - model: senseable/WestLake-7B-v2
    parameters:
      weight: 0.2
  - model: l3utterfly/mistral-7b-v0.1-layla-v4
    parameters:
      weight: 0.2
merge_method: ties
base_model: Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-1-7B
parameters:
  density: 0.4
  int8_mask: true
  normalize: true
dtype: bfloat16
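Only the first model in each configuration sets its own `density`; the others presumably fall back to the top-level `density: 0.4`. The sketch below shows that fallback, assuming per-model parameters take precedence over the top-level `parameters` block (my reading of how mergekit resolves them). The `config` dict is a hand-written mirror of the configuration directly above, and `effective_parameters` is a hypothetical helper, not part of mergekit.

```python
# Hand-written mirror of the YAML configuration directly above (illustrative only).
config = {
    "models": [
        {"model": "SanjiWatsuki/Sonya-7B", "parameters": {"weight": 1, "density": 1}},
        {"model": "SanjiWatsuki/Loyal-Toppy-Bruins-Maid-7B-DARE", "parameters": {"weight": 0.3}},
        {"model": "Azazelle/Half-NSFW_Noromaid-7b", "parameters": {"weight": 0.2}},
        {"model": "senseable/WestLake-7B-v2", "parameters": {"weight": 0.2}},
        {"model": "l3utterfly/mistral-7b-v0.1-layla-v4", "parameters": {"weight": 0.2}},
    ],
    "merge_method": "ties",
    "base_model": "Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-1-7B",
    "parameters": {"density": 0.4, "int8_mask": True, "normalize": True},
    "dtype": "bfloat16",
}


def effective_parameters(config, names=("weight", "density")):
    """Resolve each model's parameters, falling back to the top-level block.

    Assumption: a value set under a model overrides the same key set globally.
    """
    global_params = config.get("parameters", {})
    resolved = {}
    for entry in config["models"]:
        local = entry.get("parameters", {})
        resolved[entry["model"]] = {
            name: local.get(name, global_params.get(name)) for name in names
        }
    return resolved


for model, params in effective_parameters(config).items():
    print(f"{model}: weight={params['weight']}, density={params['density']}")
```

Under this assumption, the first listed model keeps all of its difference from the base (density 1), while each of the others keeps only the top 40% by magnitude.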