AraSwap: Layer-Level Capability Composition for Arabic LLM Adaptation

Three separately trained LoRA experts for Qwen3-8B, composed via layer-selective merging.

Adapters

Adapter           Data                              LoRA r   Steps
arabic_expert/    Arabic Wikipedia + Arabic MMLU    64       1000
math_expert/      GSM8K + MATH (English)            64       1000
cultural_expert/  ACVA + Arabic cultural subjects   32       1000
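For reference, the LoRA r column is each adapter's rank: a LoRA adapter learns a per-weight delta ΔW = B·A whose rank is at most r. A minimal numpy sketch of that factorization (the dimensions are illustrative, not Qwen3-8B's):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 128, 256, 64          # toy sizes; r=64 matches the Arabic/math experts

A = rng.standard_normal((r, d_in))     # LoRA "A": down-projection to rank r
B = rng.standard_normal((d_out, r))    # LoRA "B": up-projection back to d_out
delta_W = B @ A                        # the weight delta applied on top of the base W

assert delta_W.shape == (d_out, d_in)
assert np.linalg.matrix_rank(delta_W) <= r
```

The rank bound is what makes each expert a small artifact relative to the 8B base model.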

Method

A Frobenius-norm analysis of each expert's per-layer weight delta identifies where each expert specializes. Under a strict 15% dominance criterion, 4 of Qwen3-8B's 36 layers are selected for swapping:

  • Layers 1, 9, 16: Arabic Language Expert
  • Layer 23: Arabic Cultural Expert
  • Remaining 32 layers: base model (reasoning preserved)
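The selection step above can be sketched as follows. The helper name and the exact form of the dominance test are assumptions; the card only states the 15% threshold:

```python
def select_swap_layers(expert_norms, margin=0.15):
    """Assign a layer to an expert only if that expert's Frobenius norm for the
    layer exceeds the runner-up expert's by at least `margin` (here 15%).
    Unassigned layers keep the base model's weights.

    expert_norms: dict mapping expert name -> list of per-layer ΔW Frobenius norms.
    """
    assignments = {}
    n_layers = len(next(iter(expert_norms.values())))
    for layer in range(n_layers):
        scores = {name: norms[layer] for name, norms in expert_norms.items()}
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        if best[1] >= (1 + margin) * runner_up[1]:
            assignments[layer] = best[0]
    return assignments

# Toy norms: only layer 0 shows a clear (>15%) dominance.
norms = {"arabic": [2.0, 1.0], "cultural": [1.0, 1.5], "math": [0.5, 1.4]}
print(select_swap_layers(norms))  # {0: 'arabic'}
```

A strict margin like this is what keeps most layers on the base model, preserving its reasoning ability.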

Results

The merged AraSwap model scores 62.1% average on OALL (evaluated 4-bit, 0-shot).

Key finding: on AraTrust the merged model reaches 85.6%, above the 85.2% baseline (+0.4pp).

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach one expert adapter by subfolder
# ("arabic_expert", "math_expert", or "cultural_expert").
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base, "mariklolik228/AraSwap-Qwen3-8B-Adapters", subfolder="arabic_expert")
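The layer swap itself is not shown in the card; below is a hypothetical sketch of composing a merged state dict from base and expert weights, assuming HF-style `model.layers.<i>.` key prefixes:

```python
def swap_layers(base_sd, expert_sds, assignments):
    """Return a merged state dict: for each (layer, expert) in `assignments`,
    take that whole transformer block's tensors from the expert's state dict;
    every other key keeps the base model's weights."""
    merged = dict(base_sd)
    for layer, expert in assignments.items():
        prefix = f"model.layers.{layer}."
        for key in base_sd:
            if key.startswith(prefix):
                merged[key] = expert_sds[expert][key]
    return merged

# Layer assignment from the card: 1, 9, 16 -> Arabic expert; 23 -> cultural expert.
assignments = {1: "arabic_expert", 9: "arabic_expert", 16: "arabic_expert",
               23: "cultural_expert"}
```

Because the swap is per whole layer, the result is a single model with no routing overhead at inference time.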

Citation

@article{kashirskiy2026araswap,
  title={AraSwap: Layer-Level Capability Composition for Arabic LLM Adaptation},
  author={Kashirskiy, Mark and Lipinski, Artiom and Makarov, Ilya},
  year={2026}
}