# AraSwap: Layer-Level Capability Composition for Arabic LLM Adaptation

Three separately trained LoRA experts for Qwen3-8B, composed via layer-selective merging.
## Adapters
| Adapter | Data | LoRA r | Steps |
|---|---|---|---|
| `arabic_expert/` | Arabic Wikipedia + Arabic MMLU | 64 | 1000 |
| `math_expert/` | GSM8K + MATH (English) | 64 | 1000 |
| `cultural_expert/` | ACVA + Arabic cultural subjects | 32 | 1000 |
## Method
A per-layer Frobenius-norm analysis of each expert's weight delta identifies where each expert specializes. Under a strict 15% dominance criterion (an expert's norm must exceed every other expert's by at least 15%), 4 of the 36 layers are selected for swapping:
- Layers 1, 9, 16: Arabic Language Expert
- Layer 23: Arabic Cultural Expert
- Remaining 32 layers: base model (reasoning preserved)
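The selection step above can be sketched in plain NumPy. The function name, the `deltas` layout, and the margin handling are illustrative assumptions, not the authors' released code: a layer is assigned to an expert only if that expert's per-layer Frobenius norm beats every other expert's by at least 15%; otherwise the base model keeps the layer.

```python
import numpy as np

def select_layers(deltas, margin=0.15):
    """Assign each layer to the expert whose weight delta dominates.

    deltas: {expert_name: [delta_matrix_per_layer, ...]}
    Returns {layer_index: expert_name} for dominated layers only;
    any layer absent from the result keeps the base model's weights.
    """
    experts = list(deltas)
    n_layers = len(deltas[experts[0]])
    assignment = {}
    for layer in range(n_layers):
        # Frobenius norm of each expert's delta at this layer
        norms = {e: np.linalg.norm(deltas[e][layer]) for e in experts}
        best = max(norms, key=norms.get)
        runner_up = max(v for e, v in norms.items() if e != best)
        if norms[best] >= (1 + margin) * runner_up:  # strict 15% dominance
            assignment[layer] = best
    return assignment
```

With two experts and two layers, only a layer where one delta clearly dominates gets reassigned; a near-tie stays with the base model.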
## Results
- AraSwap merged model, OALL average: 62.1% (4-bit, 0-shot)
- Key finding: AraTrust 85.6% vs. baseline 85.2% (+0.4 pp)
## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
# subfolder selects one of: arabic_expert, math_expert, cultural_expert
model = PeftModel.from_pretrained(
    base,
    "mariklolik228/AraSwap-Qwen3-8B-Adapters",
    subfolder="arabic_expert",
)
```
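The snippet above loads a single expert; the layer-swapped composite described in Method is built by combining layers from several experts. A minimal sketch of that composition step, operating on plain per-layer weight arrays (all names and shapes here are illustrative; real use would walk the model's state dict):

```python
import numpy as np

def compose(base_layers, expert_deltas, assignment):
    """Build merged per-layer weights.

    base_layers: list of base-model weight matrices, one per layer
    expert_deltas: {expert_name: [delta_matrix_per_layer, ...]}
    assignment: {layer_index: expert_name} from the selection step
    Selected layers get base + expert delta; all others keep the
    base weights unchanged (reasoning preserved).
    """
    merged = [w.copy() for w in base_layers]
    for layer, expert in assignment.items():
        merged[layer] = base_layers[layer] + expert_deltas[expert][layer]
    return merged
```

Copying first and then overwriting only the assigned layers keeps the base model untouched everywhere the dominance criterion did not fire.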
## Citation

```bibtex
@article{kashirskiy2026araswap,
  title={AraSwap: Layer-Level Capability Composition for Arabic LLM Adaptation},
  author={Kashirskiy, Mark and Lipinski, Artiom and Makarov, Ilya},
  year={2026}
}
```