AraSteer: Activation Steering + GRPO for Arabic

This repository contains two LoRA adapters for Qwen3-8B, trained with Group Relative Policy Optimization (GRPO) to improve Arabic language generation.
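
For context, GRPO replaces PPO's learned value baseline with a group statistic: for each prompt, a group of G completions is sampled and every completion's reward is normalized against the group's mean and standard deviation. A minimal sketch of that advantage computation (the clipped policy-gradient update built on top of it is omitted; eps is an illustrative stabilizer, not a value from this training run):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """GRPO advantage: normalize each completion's reward against the
    mean and std of its own group of G samples for the same prompt.

    rewards: tensor of shape (num_prompts, G).
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# One prompt, G=4 sampled completions scored by the reward function.
rewards = torch.tensor([[0.2, 0.5, 0.9, 0.4]])
print(group_relative_advantages(rewards))
```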

Adapters

grpo_a/

  • Method: Raw GRPO (200 steps, r=8, 21.8M params)
  • Reward improvement: +4.8% relative over 200 steps

grpo_b/

  • Method: CLAS-warm-started GRPO (200 steps, r=16, 43.6M params)
  • CLAS config: alpha=1.25, steering the top-4 Arabic-specific layers {34, 33, 32, 0} (see the sketch below)
  • Reward improvement: +9.1% relative; converges 15.9% faster than grpo_a at step 50
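
How CLAS extracts its steering vectors is described in the paper; the steering mechanism itself is standard activation steering, where a vector scaled by alpha is added to the hidden states of selected layers. A rough sketch using PyTorch forward hooks with the alpha and layer indices from the config above; the steering vectors here are placeholders, and the model.model.layers path assumes the Qwen3 layout in transformers:

```python
import torch

ALPHA = 1.25
LAYERS = [34, 33, 32, 0]  # top-4 Arabic-specific layers from the config above

def add_steering_hooks(model, steering_vectors):
    """Register hooks that add ALPHA * v to each chosen layer's hidden states.

    steering_vectors: dict mapping layer index -> (hidden_size,) tensor.
    Assumes decoder layers live at model.model.layers, as in
    transformers' Qwen3ForCausalLM.
    """
    handles = []
    for idx in LAYERS:
        vec = steering_vectors[idx]

        def hook(module, inputs, output, vec=vec):
            # Decoder layers may return a tuple (hidden_states, ...).
            hidden = output[0] if isinstance(output, tuple) else output
            hidden = hidden + ALPHA * vec.to(hidden.device, hidden.dtype)
            return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

        handles.append(model.model.layers[idx].register_forward_hook(hook))
    return handles  # call h.remove() on each handle to disable steering
```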

Usage
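
The adapters load on top of the base model with 🤗 PEFT. A minimal sketch, assuming the adapter weights live in the grpo_b/ subfolder of this repository (swap in grpo_a for the other adapter):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

# Load the GRPO-trained LoRA adapter; subfolder="grpo_b" is an assumption
# based on the adapter layout listed above.
model = PeftModel.from_pretrained(
    model,
    "mariklolik228/AraSteer-Qwen3-8B-GRPO",
    subfolder="grpo_b",
)

prompt = "اكتب فقرة قصيرة عن أهمية اللغة العربية."  # "Write a short paragraph on the importance of Arabic."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```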

Paper

AraSteer: Bimodal Neuron Specialization and Activation Steering for Arabic in Multilingual LLMs
