---
title: README
emoji: 📖
colorFrom: pink
colorTo: green
sdk: static
pinned: false
---

**Welcome to Boldt!**

**Boldt** is a family of German language models developed by the **Chair of Machine Learning @ Humboldt-Universität zu Berlin**. This organization hosts our **models, datasets, and research artifacts** related to the Boldt project.

Feel free to explore, download, and experiment with our latest releases! 🚀

## 🤖 The Boldt Model Family

Our models are trained on *Dense-Core*, our high-quality German subset of FineWeb-2, using a multi-epoch recipe that repeats this high-signal data rather than diluting it with lower-quality text.

| Model | Parameters | Context Window (tokens) | Description |
| :--- | :--- | :--- | :--- |
| [**Boldt-DC-350M**](https://huggingface.co/Boldt/Boldt-DC-350M) | 350M | 2048 | Ultra-lightweight base model for constrained environments. |
| [**Boldt-DC-1B**](https://huggingface.co/Boldt/Boldt-DC-1B) | 1B | 2048 | Highly optimized 1B base model with top-tier German performance. |
| [**Boldt-1B**](https://huggingface.co/Boldt/Boldt-1B) | 1B | 4096 | Extended context window, trained on an additional 6B tokens of high-quality German news data. |
| [**Boldt-1B-IT-Preview**](https://huggingface.co/Boldt/Boldt-1B-IT-Preview) | 1B | 4096 | Experimental instruction-tuned model. |
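
To try a model, the checkpoints should load through the standard `transformers` causal-LM API. The snippet below is a minimal sketch rather than an official quickstart: it assumes the repos are plain causal-LM checkpoints, and the model ID, dtype, prompt, and generation settings are only illustrative.

```python
# Minimal sketch: load a Boldt checkpoint with Hugging Face transformers.
# Assumes a standard causal-LM repo; see each model card for specifics.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Boldt/Boldt-1B"  # any model ID from the table above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# The Boldt base models are plain language models: prompt them with text
# to continue, not with instructions.
prompt = "Die Hauptstadt von Deutschland ist"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For **Boldt-1B-IT-Preview**, format conversations with the tokenizer's chat template (`tokenizer.apply_chat_template`) if the model card provides one, rather than using raw continuation prompts.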

## 📊 Comparison

Boldt-1B compares favorably against other similarly sized models on German LLM benchmarks:

|  |
|
|
It is even competitive with many larger (2B-parameter) models. See our paper for the full evaluation.

## 📚 Research & Artifacts
* **Paper:** [Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling (arXiv 2026)](https://arxiv.org/abs/2604.28075)
* **Evaluation Suite:** [Modernized German Benchmarks](https://huggingface.co/collections/Boldt/german-llm-benchmarks)
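
The benchmarks in the evaluation suite can be run against the Boldt models with standard tooling. As a minimal sketch, assuming the tasks are packaged for [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) (the task name below is a hypothetical placeholder; substitute the names from the collection):

```python
# Hedged sketch: evaluate a Boldt checkpoint with EleutherAI's lm-eval.
# "german_benchmark_task" is a placeholder, not a real task name.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                              # Hugging Face transformers backend
    model_args="pretrained=Boldt/Boldt-1B",  # any model from the table above
    tasks=["german_benchmark_task"],         # replace with real task names
    num_fewshot=0,
)
print(results["results"])
```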