This Mira is certainly an A👁️.
Trying out the swcm merge method for tuning: three 4-hour SFT runs, one general run, one on attention only, one on MLP only.
Unfortunately, SFT seems to consistently hurt her creativity, even with good data and continued pretraining; not sure what's going on.
Tried to repair that with DPO, but saw only slight gains there.
A Karcher merge back with some of her most creatively talented versions definitely helped bring it up again, at least ...
This is a merge of pre-trained language models created using mergekit.
This model was merged using the Karcher Mean merge method.
The following models were included in the merge:
* ../Mira-1.24.1-27B-dpo-soup
* ../Mira-v1.24-27B-swcm
* Lambent/Mira-v1.20-27B-dpo
* Lambent/Mira-v1.23.1-27B-dpo
The following YAML configuration was used to produce this model:
```yaml
merge_method: karcher
models:
  - model: ../Mira-1.24.1-27B-dpo-soup
  - model: ../Mira-v1.24-27B-swcm
  - model: Lambent/Mira-v1.20-27B-dpo
  - model: Lambent/Mira-v1.23.1-27B-dpo
dtype: bfloat16
tokenizer_source: ../Mira-v1.24-27B-swcm
```
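The `karcher` method averages weights as a Karcher (Riemannian/Fréchet) mean rather than a plain arithmetic mean. As an illustration only, and not mergekit's actual implementation, here is a minimal sketch of the idea for unit vectors on a sphere; the function name `karcher_mean_sphere` and its fixed-point iteration are assumptions for this sketch:

```python
import math

def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def normalize(a):
    n = norm(a)
    return [x / n for x in a]

def karcher_mean_sphere(points, iters=100, tol=1e-12):
    """Karcher mean of unit vectors via tangent-space averaging (illustrative)."""
    d = len(points[0])
    # start from the normalized Euclidean mean
    mu = normalize([sum(p[i] for p in points) / len(points) for i in range(d)])
    for _ in range(iters):
        tangent = [0.0] * d
        for p in points:
            c = max(-1.0, min(1.0, dot(mu, p)))
            theta = math.acos(c)  # geodesic distance from mu to p
            if theta < 1e-12:
                continue
            # component of p orthogonal to mu, scaled to length theta (log map)
            v = [p[i] - c * mu[i] for i in range(d)]
            vn = norm(v)
            if vn < 1e-12:
                continue
            for i in range(d):
                tangent[i] += theta * v[i] / vn
        tangent = [t / len(points) for t in tangent]
        step = norm(tangent)
        if step < tol:
            break  # fixed point reached: mu is the Karcher mean
        # exp map: move mu along the averaged tangent direction
        mu = normalize([math.cos(step) * mu[i] + math.sin(step) * tangent[i] / step
                        for i in range(d)])
    return mu
```

Intuitively, each point is lifted to the tangent space at the current estimate (log map), the tangent vectors are averaged, and the estimate moves along that average (exp map) until it stops moving; unlike a plain average, the result stays on the manifold.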