# AraSwap: Layer-Level Capability Composition for Arabic LLM Adaptation

Three separately trained LoRA experts for Qwen3-8B, composed via layer-selective merging.
## Adapters
| Adapter | Data | LoRA r | Steps |
|---|---|---|---|
| `arabic_expert/` | Arabic Wikipedia + Arabic MMLU | 64 | 1000 |
| `math_expert/` | GSM8K + MATH (English) | 64 | 1000 |
| `cultural_expert/` | ACVA + Arabic cultural subjects | 32 | 1000 |
## Method
A per-layer Frobenius-norm analysis of each expert's weight delta identifies where each expert specializes. Under a strict 15% dominance criterion (an expert's norm must exceed every other expert's by at least 15%), 4 of the 36 layers are selected for swapping:
- Layers 1, 9, 16: Arabic Language Expert
- Layer 23: Arabic Cultural Expert
- Remaining 32 layers: base model (reasoning preserved)
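The selection step above can be sketched in plain NumPy. The function name, the `deltas` layout, and the margin handling are illustrative assumptions, not the authors' released code: a layer is assigned to an expert only if that expert's per-layer Frobenius norm beats every other expert's by at least 15%; otherwise the base model keeps the layer.

```python
import numpy as np

def select_layers(deltas, margin=0.15):
    """Assign each layer to the expert whose weight delta dominates.

    deltas: {expert_name: [delta_matrix_per_layer, ...]}
    Returns {layer_index: expert_name} for dominated layers only;
    any layer absent from the result keeps the base model's weights.
    """
    experts = list(deltas)
    n_layers = len(deltas[experts[0]])
    assignment = {}
    for layer in range(n_layers):
        # Frobenius norm of each expert's delta at this layer
        norms = {e: np.linalg.norm(deltas[e][layer]) for e in experts}
        best = max(norms, key=norms.get)
        runner_up = max(v for e, v in norms.items() if e != best)
        if norms[best] >= (1 + margin) * runner_up:  # strict 15% dominance
            assignment[layer] = best
    return assignment
```

With two experts and two layers, only a layer where one delta clearly dominates gets reassigned; a near-tie stays with the base model.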
## Results
- AraSwap merged model, OALL average: 62.1% (4-bit, 0-shot)
- Key finding: AraTrust 85.6% vs. baseline 85.2% (+0.4 pp)
## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
# subfolder selects one of: arabic_expert, math_expert, cultural_expert
model = PeftModel.from_pretrained(
    base,
    "mariklolik228/AraSwap-Qwen3-8B-Adapters",
    subfolder="arabic_expert",
)
```
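The snippet above loads a single expert; the layer-swapped composite described in Method is built by combining layers from several experts. A minimal sketch of that composition step, operating on plain per-layer weight arrays (all names and shapes here are illustrative; real use would walk the model's state dict):

```python
import numpy as np

def compose(base_layers, expert_deltas, assignment):
    """Build merged per-layer weights.

    base_layers: list of base-model weight matrices, one per layer
    expert_deltas: {expert_name: [delta_matrix_per_layer, ...]}
    assignment: {layer_index: expert_name} from the selection step
    Selected layers get base + expert delta; all others keep the
    base weights unchanged (reasoning preserved).
    """
    merged = [w.copy() for w in base_layers]
    for layer, expert in assignment.items():
        merged[layer] = base_layers[layer] + expert_deltas[expert][layer]
    return merged
```

Copying first and then overwriting only the assigned layers keeps the base model untouched everywhere the dominance criterion did not fire.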
## Citation

```bibtex
@article{kashirskiy2026araswap,
  title={AraSwap: Layer-Level Capability Composition for Arabic LLM Adaptation},
  author={Kashirskiy, Mark and Lipinski, Artiom and Makarov, Ilya},
  year={2026}
}
```