# AraSteer: Activation Steering + GRPO for Arabic

Two LoRA adapters trained with Group Relative Policy Optimization (GRPO) on Qwen3-8B to improve Arabic language generation.
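For context, GRPO scores each sampled completion against the statistics of its own sampling group rather than against a learned value function. A minimal sketch of that group-relative advantage computation (illustrative only; the tensor names and shapes are assumptions, not the training code used for these adapters):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize rewards within each sampling group (GRPO-style).

    rewards: [num_prompts, group_size] -- one reward per sampled completion.
    Returns advantages of the same shape: (r - group mean) / group std.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[0.1, 0.4, 0.3, 0.9],
                        [0.5, 0.5, 0.6, 0.2]])
print(group_relative_advantages(rewards))
```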
## Adapters

### `grpo_a/`
- Method: raw GRPO, no CLAS warm start (200 steps, LoRA rank r=8, 21.8M trainable params)
- Reward improvement: +4.8% relative over 200 steps
### `grpo_b/`
- Method: CLAS-warm-started GRPO (200 steps, LoRA rank r=16, 43.6M trainable params)
- CLAS config: alpha=1.25, top-4 Arabic-specific layers {34, 33, 32, 0} (see the steering sketch after this list)
- Reward improvement: +9.1% relative, with 15.9% faster convergence than GRPO-A at step 50
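As a rough illustration of what layer-wise activation steering at those layers can look like, here is a minimal sketch using PyTorch forward hooks. The steering vectors and helper names are placeholders, and the way CLAS actually derives its directions is described in the paper, not here; only alpha=1.25 and the layer set {34, 33, 32, 0} come from the config above.

```python
import torch
from transformers import AutoModelForCausalLM

STEER_LAYERS = [34, 33, 32, 0]  # top-4 Arabic-specific layers (from the CLAS config)
ALPHA = 1.25                    # steering strength (from the CLAS config)

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype=torch.bfloat16)
hidden = model.config.hidden_size
# Hypothetical: real steering directions would come from the CLAS procedure;
# random vectors stand in for them here.
steer_vecs = {layer: torch.randn(hidden) for layer in STEER_LAYERS}

def make_hook(vec: torch.Tensor):
    def hook(module, inputs, output):
        # Decoder layers return a tuple; the first element is the hidden states.
        hs = output[0] if isinstance(output, tuple) else output
        hs = hs + ALPHA * vec.to(device=hs.device, dtype=hs.dtype)
        if isinstance(output, tuple):
            return (hs,) + output[1:]
        return hs
    return hook

handles = [model.model.layers[layer].register_forward_hook(make_hook(steer_vecs[layer]))
           for layer in STEER_LAYERS]
# ... generate with steering active, then clean up:
# for h in handles: h.remove()
```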
## Usage
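A minimal loading sketch with `transformers` and `peft`. The adapter path below is an assumption; point it at the `grpo_a/` or `grpo_b/` directory of this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen3-8B"
ADAPTER = "path/to/arasteer/grpo_b"  # hypothetical path; use grpo_a/ or grpo_b/

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)

messages = [{"role": "user", "content": "اكتب فقرة قصيرة عن أهمية اللغة العربية."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```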
## Paper

AraSteer: Bimodal Neuron Specialization and Activation Steering for Arabic in Multilingual LLMs