Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
Paper: [arXiv:2312.06795](https://arxiv.org/abs/2312.06795)
This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit).

This model was merged using the [Model Breadcrumbs with TIES](https://arxiv.org/abs/2312.06795) merge method, with Qwen/Qwen3-0.6B as the base.
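As a rough illustration of what Breadcrumbs + TIES does to each specialist's weights, the sketch below (NumPy, all function names hypothetical) masks each task vector to its middle magnitude band (dropping the top `beta` fraction of outliers and the bottom `alpha` fraction of negligible deltas, per the paper), then resolves sign conflicts TIES-style by electing a majority sign per parameter. This is a simplified reading of the method, not mergekit's actual implementation.

```python
import numpy as np

def breadcrumbs_mask(delta, beta=0.1, alpha=0.1):
    """Keep the middle band of a task vector: drop the top `beta`
    fraction (outliers) and the bottom `alpha` fraction (negligible
    changes) by absolute magnitude."""
    mag = np.abs(delta)
    lo = np.quantile(mag, alpha)        # below this, deltas are dropped
    hi = np.quantile(mag, 1.0 - beta)   # above this, deltas are dropped
    return (mag >= lo) & (mag <= hi)

def merge_breadcrumbs_ties(base, finetuned, weights, beta=0.1, alpha=0.1):
    """Sketch of Breadcrumbs + TIES: mask each task vector, elect a
    per-parameter majority sign, then average the agreeing deltas."""
    deltas = [(ft - base) * breadcrumbs_mask(ft - base, beta, alpha)
              for ft in finetuned]
    weighted = [w * d for w, d in zip(weights, deltas)]
    elected = np.sign(sum(weighted))          # majority sign per parameter
    merged = np.zeros_like(base)
    counts = np.zeros_like(base)
    for wd in weighted:
        agree = np.sign(wd) == elected        # keep only sign-agreeing deltas
        merged += np.where(agree, wd, 0.0)
        counts += agree.astype(base.dtype)
    return base + merged / np.maximum(counts, 1.0)
```

With `beta: 0.1` and `alpha: 0.1` as in the config below, each specialist contributes roughly the middle 80% of its deltas by magnitude.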
The following models were included in the merge:

- [suayptalha/Qwen3-0.6B-Code-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Code-Expert)
- [suayptalha/Qwen3-0.6B-Math-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Math-Expert)
- [Redhanuman/Shadow-0.7B](https://huggingface.co/Redhanuman/Shadow-0.7B)
- [yarin-shaked/Qwen3-Codeforces-GRPO](https://huggingface.co/yarin-shaked/Qwen3-Codeforces-GRPO)
The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Qwen/Qwen3-0.6B # base
  - model: suayptalha/Qwen3-0.6B-Code-Expert
    parameters:
      weight: 0.3 # general coding
  - model: suayptalha/Qwen3-0.6B-Math-Expert
    parameters:
      weight: 0.3 # math specialist
  - model: Redhanuman/Shadow-0.7B
    parameters:
      weight: 0.35 # bumped up - reasoning/CoT powerhouse
  - model: yarin-shaked/Qwen3-Codeforces-GRPO
    parameters:
      weight: 0.25 # competitive programming
merge_method: breadcrumbs_ties
base_model: Qwen/Qwen3-0.6B
parameters:
  density: 0.6 # TIES trim
  beta: 0.1 # Breadcrumbs outliers
  alpha: 0.1 # Breadcrumbs negligible
  t_ie: false # skip TIES norm (specialists)
  normalize: true # L2 normalize deltas
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer:
  source: union # safe vocab merge
```
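Assuming the configuration above is saved as `config.yaml` (the filename is an assumption), the merge can be reproduced with mergekit's `mergekit-yaml` command-line entry point:

```shell
# install mergekit from PyPI
pip install mergekit

# run the merge; the merged model is written to ./merged
mergekit-yaml config.yaml ./merged
```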