ExOPD-Arabic: Extrapolated On-Policy Distillation for Arabic
Paper: ExOPD-Arabic: Extrapolated On-Policy Distillation from a 235B Teacher to a Specialized 8B Arabic Student
Authors: Mark Kashirskiy, Artiom Lipinski, Ilya Makarov
Description
ExOPD-Arabic is a LoRA adapter for Qwen3-8B-Base, trained via Extrapolated On-Policy Distillation (ExOPD) with λ=1.25 reward extrapolation from a Qwen3-235B-A22B teacher. It targets strong Arabic performance through three rounds of iterative on-policy preference optimization.
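To make the role of λ concrete, here is a small illustrative sketch. It assumes the extrapolated reward is the per-token log-probability gap between the teacher and a reference model, scaled by λ. That form, the function name `extrapolated_reward`, and the toy log-probabilities are assumptions for illustration only; the exact objective is defined in the paper.

```python
from typing import List


def extrapolated_reward(
    teacher_logp: List[float],
    ref_logp: List[float],
    lam: float = 1.25,
) -> List[float]:
    """Per-token reward lam * (log p_teacher - log p_ref).

    ASSUMPTION: this reward form is an illustrative reading of
    "reward extrapolation", not the paper's verified objective.
    lam = 1.0 uses the plain teacher/reference gap as the reward;
    lam > 1.0 scales the gap up, pushing the student past the teacher's signal.
    """
    if len(teacher_logp) != len(ref_logp):
        raise ValueError("teacher_logp and ref_logp must have the same length")
    return [lam * (t - r) for t, r in zip(teacher_logp, ref_logp)]


# Toy log-probabilities for a two-token student sample (made-up values).
rewards = extrapolated_reward([-1.0, -2.0], [-2.0, -2.0])
print(rewards)
```

Under this assumed form, tokens that the teacher prefers more strongly than the reference receive a reward amplified by 25% relative to λ=1.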
Usage
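A minimal sketch of loading the adapter on top of the base model with `transformers` and `peft`. The base model ID `Qwen/Qwen3-8B-Base` is assumed to be the Hub name of the base model named in the description. `ADAPTER_ID` is a hypothetical placeholder, since this card does not state the adapter's repository ID. Replace it with the actual repo ID or a local path. The helper names `load_model` and `generate` are likewise illustrative.

```python
BASE_MODEL_ID = "Qwen/Qwen3-8B-Base"
# ASSUMPTION: hypothetical placeholder; substitute the real adapter repo or local path.
ADAPTER_ID = "your-org/ExOPD-Arabic"


def load_model(adapter_id: str = ADAPTER_ID, base_id: str = BASE_MODEL_ID):
    """Load the base model and attach the LoRA adapter (downloads weights)."""
    # Heavy imports are kept local so the module can be imported cheaply.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the `accelerate` package
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation and return only the newly produced text."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (not executed here because it downloads the 8B base model):
# tokenizer, model = load_model()
# print(generate(tokenizer, model, "ما هي عاصمة مصر؟"))
```

Because the adapter was trained on a base (non-chat) model, the sketch passes a plain text prompt rather than applying a chat template. Whether a specific prompt format is expected is not stated in this card.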
Evaluation (OALL Suite)
See the paper for full benchmark results.