gradient_slerp_tongyi_base_attn_0.6_mlp_0.4

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the SLERP merge method.
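
A quick intuition for SLERP: instead of averaging weights linearly, it interpolates along the arc between the two weight tensors, treating each flattened tensor as a point on a hypersphere, which tends to preserve weight norms better than a straight lerp. Below is a minimal NumPy sketch of the idea (not mergekit's actual implementation; the near-parallel fallback to linear interpolation is a common convention):

import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherically interpolate between weight tensors a and b.

    t = 0 returns a (the base model), t = 1 returns b."""
    a_flat, b_flat = a.ravel(), b.ravel()
    # Angle between the two tensors, measured on the unit hypersphere.
    dot = np.dot(a_flat, b_flat) / (np.linalg.norm(a_flat) * np.linalg.norm(b_flat))
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    # Nearly parallel tensors: fall back to plain linear interpolation.
    if np.sin(theta) < eps:
        return (1.0 - t) * a + t * b
    # Weighted combination along the great-circle arc.
    w_a = np.sin((1.0 - t) * theta) / np.sin(theta)
    w_b = np.sin(t * theta) / np.sin(theta)
    return (w_a * a_flat + w_b * b_flat).reshape(a.shape)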

Models Merged

The following models were included in the merge:

  • /home/fractal_admin/shreyas/Reasoning/models/FV-30B-A3B
  • /home/fractal_admin/shreyas/Reasoning/models/tongyi-deepresearch

Configuration

The following YAML configuration was used to produce this model (note: the recorded attention/MLP t values of 0.8 and 0.2 differ from the 0.6/0.4 encoded in the model name):


# models:
#   - model: /home/fractal_admin/shreyas/Reasoning/models/FV-30B-A3B
#   - model: /home/fractal_admin/shreyas/Reasoning/models/tongyi-deepresearch
# merge_method: slerp
# base_model: /home/fractal_admin/shreyas/Reasoning/models/FV-30B-A3B
# parameters:
#   t:
#     - filter: self_attn
#       value: [0, 0.5, 0.3, 0.7, 1]
#     - filter: mlp
#       value: [1, 0.5, 0.7, 0.3, 0]
#     - value: 0.5 # fallback for rest of tensors
# dtype: float16

models:
  - model: /home/fractal_admin/shreyas/Reasoning/models/FV-30B-A3B
  - model: /home/fractal_admin/shreyas/Reasoning/models/tongyi-deepresearch

merge_method: slerp
base_model: /home/fractal_admin/shreyas/Reasoning/models/tongyi-deepresearch
parameters:
  t:
    # In mergekit's SLERP, t = 0 keeps the base model (tongyi-deepresearch)
    # and t = 1 keeps the other model (FV-30B-A3B).
    # Attention: t = 0.8 weights toward FV-30B-A3B (HealthBench knowledge + domain tuning)
    - filter: self_attn
      value: 0.8

    # MLP: t = 0.2 weights toward tongyi-deepresearch (DeepResearch reasoning patterns)
    - filter: mlp
      value: 0.2

    # fallback for all remaining tensors
    - value: 0.5

dtype: bfloat16
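
The model name says "gradient_slerp" because t can also be given as a list (as in the commented-out example above), in which case mergekit spreads the anchor values across the layer stack so each layer gets its own blend ratio. A hedged sketch of that mapping, assuming evenly spaced anchors with linear interpolation between them (mergekit's exact anchor placement may differ):

import numpy as np

def layer_t_values(gradient, num_layers):
    # Place the gradient anchors evenly across the layer indices and
    # linearly interpolate a per-layer t value between them.
    anchors = np.linspace(0, num_layers - 1, num=len(gradient))
    return np.interp(np.arange(num_layers), anchors, gradient)

# The commented self_attn gradient [0, 0.5, 0.3, 0.7, 1] over 12 layers:
print(layer_t_values([0, 0.5, 0.3, 0.7, 1], 12).round(2))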


# mergekit-yaml \
# /home/fractal_admin/shreyas/Reasoning/mergekit/examples/gradient-slerp.yml \
# /home/fractal_admin/shreyas/Reasoning/mergekit/merges/gradient_slerp_tongyi_base_attn_0.6_mlp_0.4 \
# --cuda --lazy-unpickle \
# --allow-crimes
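
Assuming the merge produced a standard Safetensors checkpoint at the output path above, it should load like any other causal LM. A minimal usage sketch with transformers (prompt and generation settings are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/home/fractal_admin/shreyas/Reasoning/mergekit/merges/gradient_slerp_tongyi_base_attn_0.6_mlp_0.4"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain the tradeoffs of model merging in two sentences.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))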