Dutch ModernBERT 1024h-22L (FineWeb2)

A ModernBERT model pretrained on the Dutch portion of the FineWeb2 dataset. The model has 22 layers and a hidden size of 1024, for roughly 230M parameters.

Model Details

  • Architecture: ModernBERT (Answer.AI/LightOn)
  • Layers: 22
  • Hidden size: 1024
  • Attention heads: 16
  • Intermediate size: 1536
  • Vocab size: 32,128
  • Parameters: ~230M
  • Tokenizer: yhavinga/dutch-llama-tokenizer (SentencePiece, Dutch-optimized)
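The ~230M figure can be sanity-checked from the numbers above. A rough back-of-envelope sketch, assuming a GeGLU feed-forward block as in the ModernBERT architecture (up-projection of hidden → 2 × intermediate), bias-free linear layers, tied input/output embeddings, and ignoring norm parameters (these assumptions are not stated in this card):

```python
# Back-of-envelope parameter count from the architecture numbers above.
# Assumptions (not from the model card): GeGLU feed-forward, bias-free
# linears, tied embeddings, norm/rotary parameters ignored.
hidden, layers, intermediate, vocab = 1024, 22, 1536, 32_128

embeddings = vocab * hidden                # token embedding matrix (tied with the LM head)
attention = 4 * hidden * hidden            # Wqkv (3h^2) + output projection (h^2)
mlp = hidden * (2 * intermediate) + intermediate * hidden  # GeGLU up + down projections
total = embeddings + layers * (attention + mlp)

print(f"{total / 1e6:.0f}M")  # prints "229M", close to the reported ~230M
```

The dominant terms are the 22 transformer layers (~196M) plus the 32,128 × 1024 embedding matrix (~33M).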

Training

  • Dataset: FineWeb2 Dutch
  • Steps: 2,000,000
  • Precision: bfloat16
  • Framework: JAX/Flax
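The card does not describe the masking scheme, but masked-language-model pretraining of this kind typically corrupts a fraction of input tokens dynamically at each step. A minimal illustrative sketch of the standard BERT-style 80/10/10 dynamic masking (the actual masking rate and special-token handling used for this model are not stated here and are assumptions):

```python
import numpy as np

def mask_tokens(input_ids, mask_token_id, vocab_size, mask_rate=0.15, rng=None):
    """Illustrative dynamic MLM masking; rate and details are assumptions.

    Standard BERT scheme: of the selected positions, 80% become the mask
    token, 10% a random token, 10% stay unchanged. Labels are -100
    (ignored by the loss) everywhere except at selected positions.
    """
    rng = rng or np.random.default_rng(0)
    input_ids = np.asarray(input_ids)
    labels = np.full_like(input_ids, -100)

    # Choose positions to predict.
    selected = rng.random(input_ids.shape) < mask_rate
    labels[selected] = input_ids[selected]

    # Split the selected positions 80/10/10.
    roll = rng.random(input_ids.shape)
    masked = selected & (roll < 0.8)
    random_tok = selected & (roll >= 0.8) & (roll < 0.9)

    corrupted = input_ids.copy()
    corrupted[masked] = mask_token_id
    corrupted[random_tok] = rng.integers(0, vocab_size, size=int(random_tok.sum()))
    return corrupted, labels
```

Because the corruption is resampled every step ("dynamic" masking), the model sees a different masked view of each document across the 2M training steps.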

Usage

import torch
from transformers import AutoTokenizer, ModernBertForMaskedLM

model = ModernBertForMaskedLM.from_pretrained("yhavinga/dmbert-1024h-22l-fineweb2-2000000")
tokenizer = AutoTokenizer.from_pretrained("yhavinga/dmbert-1024h-22l-fineweb2-2000000")

# Masked language modeling: predict the masked token
inputs = tokenizer("Amsterdam is de <mask> van Nederland.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode the most likely token at the mask position
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))

Citation

@misc{dmbert_1024h_fineweb2,
  title={Dutch ModernBERT 1024h-22L FineWeb2},
  author={Yeb Havinga},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/yhavinga/dmbert-1024h-22l-fineweb2-2000000}
}