# AraFusion: Arabic Masked Diffusion Language Model

Trained with the MDLM masked-diffusion objective on FineWeb-2 Arabic subsets. Supports dialect-conditioned generation via classifier-free guidance (CFG).

## Model details

| | |
| --- | --- |
| Architecture | DiT (BERT-base scale: 12 layers, 12 heads, 768 hidden dim) |
| Vocabulary | 96,000 (MorphBPE, see `AraFusion/arafusion-morphBPE`) |
| Sequence length | 512 tokens |
| Diffusion | Linear noise schedule, 1,000 sampling steps |
| Dialects | MSA (`[MSA]`), Najdi (`[NAJDI]`), Egyptian (`[EGYPT]`) |
| CFG `p_uncond` | 0.10 |
| Training steps | 200,000 (pre-train) |
| Precision | BF16 |
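Dialect conditioning uses classifier-free guidance: during training the dialect tag is dropped with probability `p_uncond = 0.10`, and at sampling time the conditional and unconditional predictions are combined. A minimal NumPy sketch of the guidance step (the function name `cfg_logits` and the toy values are illustrative, not part of this repo's API):

```python
import numpy as np

def cfg_logits(cond_logits, uncond_logits, guidance_scale):
    # Classifier-free guidance: move from the unconditional prediction
    # toward (and, for scale > 1, past) the dialect-conditioned one.
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)

# Toy example over a 4-token vocabulary.
cond = np.array([2.0, 0.5, 0.1, 0.0])    # logits given e.g. [EGYPT]
uncond = np.array([1.0, 1.0, 1.0, 1.0])  # logits with the tag dropped
guided = cfg_logits(cond, uncond, guidance_scale=2.0)
```

With `guidance_scale = 1.0` this reduces to the conditional logits; larger scales sharpen the dialect signal at some cost to diversity.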

## Data

Trained on three FineWeb-2 Arabic subsets:

| Subset | Dialect | Split |
| --- | --- | --- |
| `arb_Arab` | Modern Standard Arabic | train (10 %) |
| `ars_Arab` | Najdi / Saudi Arabic | train (full) |
| `arz_Arab` | Egyptian Arabic | train (full) |

See AraFusion/arafusion-arabic-raw for the raw text and AraFusion/arafusion-arabic-packed for the packed training sequences.
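Packed training data typically means tokenized documents concatenated into fixed-length windows matching the model's 512-token context. A minimal sketch of that idea (the helper `pack_sequences` and the `eos_id` separator are illustrative assumptions, not the actual packing code behind `arafusion-arabic-packed`):

```python
def pack_sequences(docs, seq_len=512, eos_id=2):
    """Concatenate tokenized documents (EOS-separated) and slice the
    stream into fixed-length training windows; the tail remainder
    shorter than seq_len is dropped."""
    stream = []
    for ids in docs:
        stream.extend(ids)
        stream.append(eos_id)
    return [stream[i:i + seq_len]
            for i in range(0, len(stream) - seq_len + 1, seq_len)]
```

Packing keeps every batch position filled, which matters for diffusion training since the loss is computed over all (masked) positions rather than a single next token.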

## Checkpoints

Training job: `morphbpe-641m-51498`

Paths in repo:
checkpoints/0-1000-v1.ckpt
checkpoints/0-1000-v2.ckpt
checkpoints/0-1000-v3.ckpt
checkpoints/0-1000.ckpt
checkpoints/0-10000-v1.ckpt
checkpoints/0-10000-v2.ckpt
checkpoints/0-47000.ckpt
checkpoints/1-88000.ckpt
checkpoints/1-90000.ckpt
checkpoints/1-92000.ckpt
checkpoints/1-94000.ckpt
checkpoints/2-100000.ckpt
checkpoints/2-96000.ckpt
checkpoints/2-98000.ckpt
checkpoints/3-102000.ckpt
checkpoints/3-104000.ckpt
checkpoints/3-106000.ckpt
checkpoints/3-108000.ckpt
checkpoints/3-110000.ckpt
checkpoints/3-112000.ckpt
checkpoints/3-114000.ckpt
checkpoints/3-116000.ckpt
checkpoints/3-118000.ckpt
checkpoints/3-120000.ckpt
checkpoints/3-122000.ckpt
checkpoints/3-124000.ckpt
checkpoints/3-126000.ckpt
checkpoints/3-128000.ckpt
checkpoints/3-130000.ckpt
checkpoints/3-132000.ckpt
checkpoints/3-134000.ckpt
checkpoints/3-136000.ckpt
checkpoints/3-138000.ckpt
checkpoints/3-140000.ckpt
checkpoints/3-142000.ckpt
checkpoints/4-156000.ckpt
checkpoints/4-158000.ckpt
checkpoints/4-160000.ckpt
checkpoints/4-162000.ckpt
checkpoints/4-164000.ckpt
checkpoints/4-166000.ckpt
checkpoints/4-168000.ckpt
checkpoints/4-170000.ckpt
checkpoints/4-172000.ckpt
checkpoints/4-174000.ckpt
checkpoints/4-176000.ckpt
checkpoints/4-178000.ckpt
checkpoints/5-180000.ckpt
checkpoints/5-182000.ckpt
checkpoints/5-184000.ckpt
checkpoints/5-186000.ckpt
checkpoints/5-188000.ckpt
checkpoints/5-190000.ckpt
checkpoints/5-192000.ckpt
checkpoints/5-194000.ckpt
checkpoints/5-196000.ckpt
checkpoints/5-198000.ckpt
checkpoints/best.ckpt
checkpoints/last-v1.ckpt
checkpoints/last.ckpt

## Usage

```python
# Requires the MDLM library: https://github.com/kuleshov-group/mdlm
from transformers import PreTrainedTokenizerFast

tok = PreTrainedTokenizerFast.from_pretrained("AraFusion/arafusion-morphBPE")

# Load a checkpoint and run sampling; see the MDLM docs for the full API.
```
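For intuition, masked-diffusion sampling starts from a fully masked sequence and iteratively reveals positions with the denoiser's predictions. The sketch below illustrates that loop under a linear schedule; `MASK_ID`, `denoise_fn`, and the reveal heuristic are illustrative assumptions, not the MDLM library's actual API:

```python
import numpy as np

MASK_ID = 3  # hypothetical [MASK] token id; the real id depends on the MorphBPE vocab

def sample_mdlm(denoise_fn, seq_len=512, steps=1000, seed=0):
    """Ancestral-sampling sketch for a masked diffusion LM."""
    rng = np.random.default_rng(seed)
    x = np.full(seq_len, MASK_ID, dtype=np.int64)
    for t in range(steps):
        masked = np.flatnonzero(x == MASK_ID)
        if masked.size == 0:
            break
        # Linear schedule: reveal a roughly equal share of the
        # remaining masked positions at every step.
        n_reveal = max(1, masked.size // (steps - t))
        reveal = rng.choice(masked, size=n_reveal, replace=False)
        preds = denoise_fn(x)  # per-position token predictions (argmax)
        x[reveal] = preds[reveal]
    return x
```

In the real model, `denoise_fn` would be a forward pass through the DiT denoiser (optionally CFG-combined with a dialect tag); with 1,000 steps and 512 positions, fewer than one token is finalized per step on average.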

## Citation

```bibtex
@misc{arafusion2026,
  title = {AraFusion: Dialect-Conditioned Arabic Masked Diffusion Language Model},
  year  = {2026},
  url   = {https://huggingface.co/AraFusion}
}
```