# gradient_slerp_tongyi_base_attn_0.6_mlp_0.4

This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit).
## Merge Details

### Merge Method

This model was merged using the [SLERP](https://en.wikipedia.org/wiki/Slerp) merge method, with `/home/fractal_admin/shreyas/Reasoning/models/tongyi-deepresearch` as the base model.
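SLERP (spherical linear interpolation) blends the two models' weight tensors along the great arc between them rather than averaging linearly, which better preserves the magnitude of the merged weights. A minimal NumPy sketch of the idea, not mergekit's actual implementation (the function name and eps handling are illustrative assumptions):

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors.

    t=0 returns `a` (the base model's tensor); t=1 returns `b`.
    Illustrative sketch only -- mergekit applies this per tensor.
    """
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    omega = np.arccos(dot)           # angle between the two weight vectors
    if abs(omega) < eps:             # nearly parallel: plain lerp is safer
        return (1 - t) * a + t * b
    so = np.sin(omega)
    return np.sin((1 - t) * omega) / so * a + np.sin(t * omega) / so * b
```

At `t = 0.5` on orthogonal unit vectors this lands on the arc midpoint, keeping unit norm, whereas linear interpolation would shrink the result toward the origin.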
### Models Merged

The following models were included in the merge:
- /home/fractal_admin/shreyas/Reasoning/models/FV-30B-A3B
- /home/fractal_admin/shreyas/Reasoning/models/tongyi-deepresearch
### Configuration

The following YAML configuration was used to produce this model:
```yaml
# models:
#   - model: /home/fractal_admin/shreyas/Reasoning/models/FV-30B-A3B
#   - model: /home/fractal_admin/shreyas/Reasoning/models/tongyi-deepresearch
# merge_method: slerp
# base_model: /home/fractal_admin/shreyas/Reasoning/models/FV-30B-A3B
# parameters:
#   t:
#     - filter: self_attn
#       value: [0, 0.5, 0.3, 0.7, 1]
#     - filter: mlp
#       value: [1, 0.5, 0.7, 0.3, 0]
#     - value: 0.5  # fallback for rest of tensors
# dtype: float16

models:
  - model: /home/fractal_admin/shreyas/Reasoning/models/FV-30B-A3B
  - model: /home/fractal_admin/shreyas/Reasoning/models/tongyi-deepresearch
merge_method: slerp
base_model: /home/fractal_admin/shreyas/Reasoning/models/tongyi-deepresearch
parameters:
  t:
    # Attention = keep more DeepResearch (reasoning patterns live here)
    - filter: self_attn
      value: 0.8
    # MLP = keep more HealthBench (knowledge + domain tuning)
    - filter: mlp
      value: 0.2
    # fallback
    - value: 0.5
dtype: bfloat16

# mergekit-yaml \
#   /home/fractal_admin/shreyas/Reasoning/mergekit/examples/gradient-slerp.yml \
#   /home/fractal_admin/shreyas/Reasoning/mergekit/merges/gradient_slerp_tongyi_base_attn_0.6_mlp_0.4 \
#   --cuda --lazy-unpickle \
#   --allow-crimes
```
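The `filter` rules in the configuration assign a different interpolation factor `t` to each tensor by name: attention tensors get 0.8, MLP tensors get 0.2, and everything else falls back to 0.5. A hypothetical lookup helper sketching that dispatch (this is not mergekit's code; the function name and the substring matching are illustrative assumptions):

```python
def t_for_tensor(name: str) -> float:
    """Pick the interpolation factor for a tensor, mirroring the config's
    filter order: first matching rule wins, last rule is the fallback."""
    if "self_attn" in name:
        return 0.8  # value for the self_attn filter in the config
    if "mlp" in name:
        return 0.2  # value for the mlp filter in the config
    return 0.5      # fallback for all remaining tensors
```

For example, `model.layers.0.self_attn.q_proj.weight` would merge with `t = 0.8`, while `model.embed_tokens.weight` matches neither filter and uses the 0.5 fallback.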