# ModernBERT-Diffusion-Pretrained-20260119
A ModernBERT-large model pretrained as a diffusion language model on high-quality web text.
## Model Description
This model extends ModernBERT with diffusion-style training:
- Variable masking ratio: 15-80% of tokens masked per sample
- Parallel prediction: All masked tokens predicted simultaneously
- Iterative refinement: Generate text by progressively unmasking
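The first two points can be sketched as a masking step: draw a ratio uniformly from [0.15, 0.80] per sample, mask that fraction of positions, and make all of them prediction targets at once. This is a minimal illustration, not the training code; `MASK_ID` and the function name are hypothetical.

```python
import random

MASK_ID = 50284  # hypothetical [MASK] token id, for illustration only

def mask_for_diffusion(token_ids, rng=random, lo=0.15, hi=0.80):
    """Mask a variable fraction of tokens, diffusion-style.

    A masking ratio is drawn uniformly from [lo, hi] for each sample,
    then that fraction of positions is replaced with MASK_ID. Every
    masked position becomes a prediction target simultaneously.
    """
    ratio = rng.uniform(lo, hi)
    n_mask = max(1, round(ratio * len(token_ids)))
    positions = rng.sample(range(len(token_ids)), n_mask)
    masked = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 = position ignored by the loss
    for p in positions:
        labels[p] = masked[p]  # original token is the target
        masked[p] = MASK_ID
    return masked, labels

masked, labels = mask_for_diffusion(list(range(100)))
```

With a 100-token sample, between 15 and 80 positions end up masked, and the labels mark exactly those positions.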
## Training Details
| Parameter | Value |
|---|---|
| Base model | answerdotai/ModernBERT-large |
| Training data | Dolma3 Common Crawl (2M high-quality samples) |
| Training steps | 5000 |
| Batch size | 16 (effective) |
| Max sequence length | 8,192 tokens |
| Masking ratio | 15-80% (variable) |
| Hardware | H100 80GB |
## Evaluation
- Masked-token perplexity: 2.89 (measured at a 15% masking ratio)
- For reference, a well-trained MLM typically achieves a perplexity of 10-50
## Usage
### As a Masked Language Model
```python
from transformers import pipeline

model_id = "Ayushnangia/ModernBERT-Diffusion-Pretrained-20260119"
fill_mask = pipeline("fill-mask", model=model_id)

# Single-mask prediction
result = fill_mask("The capital of France is [MASK].")
print(result[0]["token_str"])  # Paris
```
### For Diffusion-Style Generation
For text generation via iterative unmasking, fine-tune on instruction data first.
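The decoding loop itself is simple to sketch: start from a fully masked sequence and, over several steps, commit the positions the model is most confident about while re-masking the rest. The checkpoint does not ship a generation loop, so everything below is a hypothetical illustration; `predict` stands in for a real forward pass and the schedule is one common choice, not the only one.

```python
import math

MASK = "[MASK]"

def iterative_unmask(tokens, predict, steps=4):
    """Sketch of diffusion-style decoding via progressive unmasking.

    `predict(tokens)` is a stand-in for a model forward pass: it must
    return {position: (token, confidence)} for each masked position.
    At each step, roughly 1/(steps - step) of the remaining masked
    positions are filled, most confident first.
    """
    tokens = list(tokens)
    for step in range(steps):
        masked_pos = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked_pos:
            break
        preds = predict(tokens)
        k = max(1, math.ceil(len(masked_pos) / (steps - step)))
        best = sorted(masked_pos, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = preds[i][0]  # commit the confident prediction
    return tokens

# Toy predictor: proposes the position index as the token, with
# confidence equal to the index (so later positions unmask first).
toy = lambda toks: {i: (str(i), i) for i, t in enumerate(toks) if t == MASK}
out = iterative_unmask([MASK] * 8, toy, steps=3)
```

With a real model, `predict` would run the fill-mask head and return the argmax token and its probability at each masked position.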
## Intended Use
This is a pretrained checkpoint intended as a foundation for:
- Instruction fine-tuning (SFT)
- Domain adaptation
- Research into diffusion language models
## Limitations
- Pretrained only; not optimized for instruction following
- Best results require fine-tuning on downstream tasks
- Context limited to 8,192 tokens
## Citation
```bibtex
@misc{modernbert-diffusion,
  author    = {Ayush Nangia},
  title     = {ModernBERT Diffusion Language Model},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Ayushnangia/ModernBERT-Diffusion-Pretrained-20260119}
}
```