# AraFusion: Arabic Masked Diffusion Language Model

Trained with MDLM on FineWeb-2 Arabic subsets. Supports dialect-conditioned generation via Classifier-Free Guidance (CFG).
## Model details

| Property | Value |
|---|---|
| Architecture | DiT (BERT-base scale: 12L / 12H / 768d) |
| Vocab | 96,000 (MorphBPE, see AraFusion/arafusion-morphBPE) |
| Sequence length | 512 tokens |
| Diffusion | Linear noise schedule, 1,000 sampling steps |
| Dialects | MSA (`[MSA]`), Najdi (`[NAJDI]`), Egyptian (`[EGYPT]`) |
| CFG p_uncond | 0.10 |
| Training steps | 200,000 (pre-train) |
| Precision | BF16 |
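Dialect conditioning works through classifier-free guidance: during training the dialect tag is dropped with probability p_uncond = 0.10, and at sampling time the conditional and unconditional logits are mixed. A minimal sketch of the mixing step (pure NumPy; `cfg_logits` and `guidance_scale` are illustrative names, not the training code):

```python
import numpy as np

def cfg_logits(cond_logits: np.ndarray,
               uncond_logits: np.ndarray,
               guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: push logits toward the conditional
    prediction; guidance_scale = 1.0 recovers the pure conditional."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)

# Toy example: one denoiser call with the dialect tag, one with the
# tag dropped, then mix.
cond = np.array([2.0, 0.5, -1.0])
uncond = np.array([1.0, 1.0, 0.0])
mixed = cfg_logits(cond, uncond, guidance_scale=2.0)  # [3.0, 0.0, -2.0]
```

A scale above 1.0 exaggerates whatever distinguishes the dialect-conditioned prediction from the unconditional one.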
## Data

Trained on three FineWeb-2 Arabic subsets:

| Subset | Dialect | Split |
|---|---|---|
| arb_Arab | Modern Standard Arabic | train (10 %) |
| ars_Arab | Najdi / Saudi Arabic | train (full) |
| arz_Arab | Egyptian Arabic | train (full) |

See AraFusion/arafusion-arabic-raw for the raw text and
AraFusion/arafusion-arabic-packed for the packed training sequences.
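The split column above maps directly onto the `datasets` slicing syntax. A small sketch (the FineWeb-2 repo id below is an assumption; only the split-string helper is concrete):

```python
def split_spec(percent=None):
    """Build a Hugging Face `datasets` split string:
    'train' for the full split, 'train[:10%]' for a 10 % slice."""
    return "train" if percent is None else f"train[:{percent}%]"

# (subset, percent) pairs mirroring the table above; None = full train split.
SUBSETS = {"arb_Arab": 10, "ars_Arab": None, "arz_Arab": None}

# Then, for each subset (assuming the public FineWeb-2 repo id):
#   from datasets import load_dataset
#   ds = load_dataset("HuggingFaceFW/fineweb-2", name=subset,
#                     split=split_spec(percent))
```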
## Checkpoints

Training job: morphbpe-641m-51498

| Path in repo |
|---|
| checkpoints/0-1000-v1.ckpt |
| checkpoints/0-1000-v2.ckpt |
| checkpoints/0-1000-v3.ckpt |
| checkpoints/0-1000.ckpt |
| checkpoints/0-10000-v1.ckpt |
| checkpoints/0-10000-v2.ckpt |
| checkpoints/0-47000.ckpt |
| checkpoints/1-88000.ckpt |
| checkpoints/1-90000.ckpt |
| checkpoints/1-92000.ckpt |
| checkpoints/1-94000.ckpt |
| checkpoints/2-100000.ckpt |
| checkpoints/2-96000.ckpt |
| checkpoints/2-98000.ckpt |
| checkpoints/3-102000.ckpt |
| checkpoints/3-104000.ckpt |
| checkpoints/3-106000.ckpt |
| checkpoints/3-108000.ckpt |
| checkpoints/3-110000.ckpt |
| checkpoints/3-112000.ckpt |
| checkpoints/3-114000.ckpt |
| checkpoints/3-116000.ckpt |
| checkpoints/3-118000.ckpt |
| checkpoints/3-120000.ckpt |
| checkpoints/3-122000.ckpt |
| checkpoints/3-124000.ckpt |
| checkpoints/3-126000.ckpt |
| checkpoints/3-128000.ckpt |
| checkpoints/3-130000.ckpt |
| checkpoints/3-132000.ckpt |
| checkpoints/3-134000.ckpt |
| checkpoints/3-136000.ckpt |
| checkpoints/3-138000.ckpt |
| checkpoints/3-140000.ckpt |
| checkpoints/3-142000.ckpt |
| checkpoints/4-156000.ckpt |
| checkpoints/4-158000.ckpt |
| checkpoints/4-160000.ckpt |
| checkpoints/4-162000.ckpt |
| checkpoints/4-164000.ckpt |
| checkpoints/4-166000.ckpt |
| checkpoints/4-168000.ckpt |
| checkpoints/4-170000.ckpt |
| checkpoints/4-172000.ckpt |
| checkpoints/4-174000.ckpt |
| checkpoints/4-176000.ckpt |
| checkpoints/4-178000.ckpt |
| checkpoints/5-180000.ckpt |
| checkpoints/5-182000.ckpt |
| checkpoints/5-184000.ckpt |
| checkpoints/5-186000.ckpt |
| checkpoints/5-188000.ckpt |
| checkpoints/5-190000.ckpt |
| checkpoints/5-192000.ckpt |
| checkpoints/5-194000.ckpt |
| checkpoints/5-196000.ckpt |
| checkpoints/5-198000.ckpt |
| checkpoints/best.ckpt |
| checkpoints/last-v1.ckpt |
| checkpoints/last.ckpt |
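The step-named files appear to follow PyTorch Lightning's default `{epoch}-{step}.ckpt` pattern, with `-vN` suffixes for name collisions. A small helper (illustrative, not part of the repo) to pick the highest-step checkpoint from such a listing:

```python
import re
from pathlib import PurePosixPath

# Matches "{epoch}-{step}.ckpt" with an optional "-vN" version suffix.
_PAT = re.compile(r"^(\d+)-(\d+)(?:-v\d+)?\.ckpt$")

def latest_checkpoint(paths):
    """Return the path with the highest training step, ignoring the
    best.ckpt / last.ckpt aliases; None if no step-named file exists."""
    def step(p):
        m = _PAT.match(PurePosixPath(p).name)
        return int(m.group(2)) if m else -1
    best = max(paths, key=step, default=None)
    return best if best is not None and step(best) >= 0 else None

paths = ["checkpoints/0-1000.ckpt", "checkpoints/5-198000.ckpt",
         "checkpoints/best.ckpt", "checkpoints/last.ckpt"]
# latest_checkpoint(paths) -> "checkpoints/5-198000.ckpt"
```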
## Usage

```python
# Requires the MDLM library from https://github.com/kuleshov-group/mdlm
from transformers import PreTrainedTokenizerFast

tok = PreTrainedTokenizerFast.from_pretrained("AraFusion/arafusion-morphBPE")

# Load a checkpoint and run a sampling step; see the MDLM docs for the full API.
```
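MDLM generates by reverse diffusion over masks: start from an all-`[MASK]` sequence and reveal tokens step by step; under a linear schedule, each still-masked position is revealed at step t with probability 1/t, so the expected masked fraction decays linearly and hits zero at the last step. A toy, model-free sketch of that loop (NumPy; the random draw stands in for the DiT denoiser, and the mask id is illustrative):

```python
import numpy as np

MASK = 0  # illustrative mask token id; the real id comes from the tokenizer

def sample(seq_len=16, steps=32, vocab=100, seed=0):
    """Toy MDLM-style ancestral sampler under a linear noise schedule.

    Starts fully masked; at step t each masked position is revealed with
    probability 1/t, so everything is unmasked by t = 1. Random token ids
    stand in for the denoiser's predictions.
    """
    rng = np.random.default_rng(seed)
    x = np.full(seq_len, MASK)
    for t in range(steps, 0, -1):
        masked = x == MASK
        if not masked.any():
            break
        # Stand-in denoiser: random ids in 1..vocab-1 (never MASK).
        pred = rng.integers(1, vocab, size=masked.sum())
        reveal = rng.random(masked.sum()) < 1.0 / t
        x[np.flatnonzero(masked)[reveal]] = pred[reveal]
    return x
```

With the real model, `pred` would come from the denoiser's logits (mixed via CFG for dialect control), but the unmasking schedule is the same.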
## Citation

```bibtex
@misc{arafusion2026,
  title = {AraFusion: Dialect-Conditioned Arabic Masked Diffusion Language Model},
  year  = {2026},
  url   = {https://huggingface.co/AraFusion}
}
```