ModernBERT-Diffusion-Pretrained-20260119

A ModernBERT-large model pretrained as a diffusion language model on high-quality web text.

Model Description

This model extends ModernBERT with diffusion-style training:

  • Variable masking ratio: 15-80% of tokens masked per sample
  • Parallel prediction: All masked tokens predicted simultaneously
  • Iterative refinement: Generate text by progressively unmasking
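The variable masking ratio can be sketched as follows. This is a minimal, self-contained illustration, not the actual training code; the mask token id and the `-100` label convention (ignored by Hugging Face loss functions) are assumptions:

```python
import random

MASK_ID = 50284  # illustrative [MASK] token id (assumption)

def mask_tokens(token_ids, min_ratio=0.15, max_ratio=0.80, seed=None):
    """Mask a randomly chosen fraction of tokens, sampled per sample.

    Returns (masked_ids, labels), where labels holds the original token
    at masked positions and -100 elsewhere (ignored by the loss).
    """
    rng = random.Random(seed)
    ratio = rng.uniform(min_ratio, max_ratio)  # one ratio per sample
    n_mask = max(1, round(ratio * len(token_ids)))
    positions = rng.sample(range(len(token_ids)), n_mask)
    masked = list(token_ids)
    labels = [-100] * len(token_ids)
    for p in positions:
        labels[p] = masked[p]
        masked[p] = MASK_ID
    return masked, labels

ids = list(range(100, 120))
masked, labels = mask_tokens(ids, seed=0)
```

Because every masked position is predicted in one forward pass, a single sample trains the model at whichever ratio was drawn, covering the 15-80% range over the course of training.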

Training Details

Parameter            Value
Base model           answerdotai/ModernBERT-large
Training data        Dolma3 Common Crawl (2M high-quality samples)
Training steps       5,000
Batch size           16 (effective)
Max sequence length  8,192 tokens
Masking ratio        15-80% (variable)
Hardware             H100 80GB

Evaluation

  • Perplexity: 2.89 (at 15% masking ratio)
  • Reference: a well-trained MLM typically achieves a perplexity of 10-50
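Masked-LM perplexity is the exponential of the mean cross-entropy over the masked positions only. A minimal sketch of the metric (toy probabilities, not the actual evaluation code):

```python
import math

def masked_perplexity(probs_of_targets):
    """probs_of_targets: model probability assigned to the true token
    at each masked position. Perplexity = exp(mean negative log-likelihood)."""
    nll = [-math.log(p) for p in probs_of_targets]
    return math.exp(sum(nll) / len(nll))

# A model that puts ~0.35 probability on each true token scores ~2.86,
# close to the 2.89 reported above
print(masked_perplexity([0.35, 0.35, 0.35]))
```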

Usage

As a Masked Language Model

from transformers import pipeline

model_id = "Ayushnangia/ModernBERT-Diffusion-Pretrained-20260119"
fill_mask = pipeline("fill-mask", model=model_id)

# Single-mask prediction: candidates are returned sorted by score
result = fill_mask("The capital of France is [MASK].")
print(result[0]["token_str"])  # Paris

For Diffusion-Style Generation

For text generation via iterative unmasking, fine-tune on instruction data first.
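The iterative-unmasking loop starts from an all-mask sequence and, at each step, commits the positions the model is most confident about. A schematic sketch with a stand-in `predict` function (the real loop would call the fine-tuned model and use its logits as confidences):

```python
MASK = "[MASK]"

def generate(predict, length, steps=4):
    """Progressively unmask: each step, predict all masked positions
    and commit only the most confident slice, until none remain."""
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        preds = predict(seq)  # {pos: (token, confidence)} for masked positions
        keep = max(1, len(masked) // (steps - step))
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:keep]
        for i in best:
            seq[i] = preds[i][0]
    return seq

# Stand-in model: predicts token "t<i>" at position i with confidence 1/(i+1)
demo = lambda seq: {i: (f"t{i}", 1.0 / (i + 1))
                    for i, t in enumerate(seq) if t == MASK}
out = generate(demo, length=6, steps=3)
```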

Intended Use

This is a pretrained checkpoint intended as a foundation for:

  • Instruction fine-tuning (SFT)
  • Domain adaptation
  • Research into diffusion language models

Limitations

  • Pretrained only; not optimized for instruction following
  • Best results require fine-tuning on downstream tasks
  • Limited to 8,192 token context

Citation

@misc{modernbert-diffusion,
  author = {Ayush Nangia},
  title = {ModernBERT Diffusion Language Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Ayushnangia/ModernBERT-Diffusion-Pretrained-20260119}
}