This is a merge of pre-trained language models created using mergekit.
This model was merged using the TIES merge method (Resolving Interference When Merging Models, arXiv:2306.01708), with saishf/Neural-SOVLish-Devil-8B-L3 as the base.
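For readers unfamiliar with TIES, the three steps of the method (trim each task vector, elect a sign per parameter, merge only the entries that agree with that sign) can be sketched on plain Python lists. This is a toy illustration under stated assumptions, not mergekit's implementation: the function names `trim` and `ties_merge` are hypothetical, tensors are flat lists of floats, and the handling of `weights` and `normalize` only approximates how a weighted variant might behave.

```python
def trim(delta, density):
    """Keep the top `density` fraction of entries by magnitude; zero the rest."""
    k = int(round(density * len(delta)))
    if k <= 0:
        return [0.0] * len(delta)
    # Indices of the k largest-magnitude entries.
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, finetuned, densities, weights, normalize=True):
    """Toy TIES merge of several fine-tuned parameter lists onto `base`."""
    # Step 0: task vectors are the differences from the base parameters.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    # Step 1: trim each task vector, then scale it by its weight.
    weighted = [
        [w * d for d in trim(delta, density)]
        for delta, density, w in zip(deltas, densities, weights)
    ]
    merged = []
    for i, b in enumerate(base):
        column = [wd[i] for wd in weighted]
        # Step 2: elect the sign with the larger total mass for this parameter.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Step 3: combine only the entries whose sign matches the elected one.
        agreeing = [(v, w) for v, w in zip(column, weights) if v * sign > 0]
        update = sum(v for v, _ in agreeing)
        if normalize and agreeing:
            update /= sum(w for _, w in agreeing)
        merged.append(b + update)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -2.0, 0.1, 3.0]
model_b = [-0.5, -1.0, 0.2, 4.0]
print(ties_merge(base, [model_a, model_b], [0.5, 0.5], [1.0, 1.0]))
print(ties_merge(base, [model_a, model_b], [0.5, 0.5], [1.0, 1.0], normalize=False))
```

With `normalize=False`, the surviving weighted entries are summed rather than averaged, so the merged update can be larger than any single model's contribution.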
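The following models were included in the merge:
- Casual-Autopsy/L3-Umbral-Mind-RP-v3.0-8B
- NousResearch/Hermes-3-Llama-3.1-8B
- saishf/Neural-SOVLish-Devil-8B-L3 (also the base model)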
The following YAML configuration was used to produce this model:
models:
  - model: Casual-Autopsy/L3-Umbral-Mind-RP-v3.0-8B
    parameters:
      density: 0.5
      weight: 0.4
      enhanced_attention: true
      abstract_attention: true
      deep_cognitive_focus: true
      dynamic_attention_allocation: true
      significance_threshold: 0.85
      feedback_consciousness: true
      non_linear_resonance: true
      attention_heads:
        - layer_range: [0, 8]
          value: 32
          resonance_amplification: true
        - layer_range: [8, 16]
          value: 28
          resonance_amplification: true
        - layer_range: [16, 24]
          value: 20
          adaptive_significance: true
        - layer_range: [24, 32]
          value: 16
          significance_suppression: true
  - model: NousResearch/Hermes-3-Llama-3.1-8B
    parameters:
      density: 0.4
      weight: 0.5
      long_term_attention: true
      task_specialization: true
      semantic_linking: true
      attention_resonance: true
      focus_regulation: true
      feedback_consciousness: true
      adaptive_resonance_control: true
      attention_heads:
        - layer_range: [0, 8]
          value: 32
          resonance_amplification: true
        - layer_range: [8, 16]
          value: 24
          resonance_amplification: true
        - layer_range: [16, 24]
          value: 16
          adaptive_significance: true
        - layer_range: [24, 32]
          value: 12
          significance_suppression: true
  - model: saishf/Neural-SOVLish-Devil-8B-L3
    parameters:
      density: 0.3
      weight: 0.5
      enhanced_attention: true
      abstract_attention: true
      deep_cognitive_focus: true
      dynamic_attention_allocation: true
      significance_threshold: 0.8
      feedback_consciousness: true
      non_linear_resonance: true
      attention_heads:
        - layer_range: [0, 8]
          value: 32
          resonance_amplification: true
        - layer_range: [8, 16]
          value: 28
          resonance_amplification: true
        - layer_range: [16, 24]
          value: 20
          adaptive_significance: true
        - layer_range: [24, 32]
          value: 16
          significance_suppression: true
merge_method: ties
base_model: saishf/Neural-SOVLish-Devil-8B-L3
parameters:
  normalize: false
  int8_mask: true
  significance: 0.85
  optimal_attention_threshold: 0.9
dtype: bfloat16
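To make the numbers in this configuration concrete, the sketch below tabulates the `density` and `weight` values copied from it. Everything else is an assumption for illustration: the helper `summarize` and the `tensor_size` of 1000 are hypothetical, and the sketch deliberately reads only `density` and `weight`, which, as far as we are aware, are the per-model parameters the mergekit documentation describes for TIES. Note that the base model also appears in the model list; its task vector relative to itself is zero by definition, so its entry would add nothing to a TIES update regardless of its weight.

```python
# density and weight values copied from the configuration above
MERGE_INPUTS = {
    "Casual-Autopsy/L3-Umbral-Mind-RP-v3.0-8B": {"density": 0.5, "weight": 0.4},
    "NousResearch/Hermes-3-Llama-3.1-8B": {"density": 0.4, "weight": 0.5},
    "saishf/Neural-SOVLish-Devil-8B-L3": {"density": 0.3, "weight": 0.5},
}
BASE_MODEL = "saishf/Neural-SOVLish-Devil-8B-L3"


def summarize(inputs, base_model, tensor_size=1000):
    """Per model: how many of `tensor_size` task-vector entries survive trimming.

    `tensor_size` is a hypothetical tensor length chosen for easy arithmetic.
    """
    rows = []
    for name, params in inputs.items():
        rows.append({
            "model": name,
            "is_base": name == base_model,
            "kept_entries": round(params["density"] * tensor_size),
            "weight": params["weight"],
        })
    return rows


rows = summarize(MERGE_INPUTS, BASE_MODEL)
for row in rows:
    print(row)

# With normalize: false, the weights are used as given rather than rescaled,
# so it is worth knowing what they add up to.
total_weight = sum(r["weight"] for r in rows)
non_base_weight = sum(r["weight"] for r in rows if not r["is_base"])
print("sum of all weights:", round(total_weight, 3))
print("sum of non-base weights:", round(non_base_weight, 3))
```

Because `normalize` is `false`, the two non-base task vectors are combined with their stated weights rather than being rescaled to sum to one.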