ExOPD-Arabic: Extrapolated On-Policy Distillation for Arabic

Paper: ExOPD-Arabic: Extrapolated On-Policy Distillation from a 235B Teacher to a Specialized 8B Arabic Student

Authors: Mark Kashirskiy, Artiom Lipinski, Ilya Makarov

Description

ExOPD-Arabic is a LoRA adapter for Qwen3-8B-Base, trained via Extrapolated On-Policy Distillation (ExOPD) with λ=1.25 reward extrapolation from a Qwen3-235B-A22B teacher. It achieves strong Arabic performance through three rounds of iterative on-policy preference optimization.
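The card does not spell out the ExOPD objective, so the following is only an illustrative sketch of what "λ=1.25 reward extrapolation" could mean under one common reading: the teacher-minus-student signal is extrapolated past the teacher, with λ > 1 overshooting the teacher score. The function name and scores below are hypothetical, not taken from the paper.

```python
def extrapolate(student_score: float, teacher_score: float, lam: float = 1.25) -> float:
    """Extrapolate a reward signal beyond the teacher (sketch, not the paper's exact objective).

    lam = 1.0 recovers the teacher score; lam = 1.25 overshoots the
    teacher-minus-student gap by 25%.
    """
    return student_score + lam * (teacher_score - student_score)

# Hypothetical scores: student 0.2, teacher 0.6 -> 0.2 + 1.25 * 0.4 = 0.7
print(extrapolate(0.2, 0.6))
```

With λ > 1 the extrapolated target sits beyond the teacher, which is presumably what lets the student exceed plain on-policy distillation toward the teacher.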

Usage

Evaluation (OALL Suite)

The model was evaluated on the OALL (Open Arabic LLM Leaderboard) benchmark suite; see the paper for full results.


Model tree for mariklolik228/ExOPD-Arabic-Qwen3-8B

This model is a LoRA adapter finetuned from Qwen/Qwen3-8B.