Commit 3308b6d (verified), committed by alanakbik · 1 parent: 05e66cc

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -12,3 +12,18 @@ pinned: false
 **Boldt** is a family of German language models developed by the **Chair of Machine Learning @ Humboldt-Universität zu Berlin**. This organization hosts our **models, datasets, and research artifacts** related to the Boldt project.
 
 Feel free to explore, download, and experiment with our latest releases! 🚀
+
+## 🌟 The Boldt Model Family
+
+Our models are trained on the German *Dense-Core* subset of FineWeb-2, utilizing a multi-epoch training recipe on high-quality data.
+
+| Model | Parameters | Context Window | Description |
+| :--- | :--- | :--- | :--- |
+| [**Boldt-DC-350M**](https://huggingface.co/Boldt/Boldt-DC-350M) | 350M | 2048 | Ultra-lightweight base model for constrained environments. |
+| [**Boldt-DC-1B**](https://huggingface.co/Boldt/Boldt-DC-1B) | 1B | 2048 | Highly optimized 1B base model with top-tier German performance. |
+| [**Boldt-1B**](https://huggingface.co/Boldt/Boldt-1B) | 1B | 4096 | Extended context and vocabulary, augmented with 6B tokens of high-quality news. |
+| [**Boldt-1B-IT-Preview**](https://huggingface.co/Boldt/Boldt-1B-IT-Preview) | 1B | 4096 | Instruction-tuned preview model for chat and zero-shot tasks. |
+
+## 📖 Research & Artifacts
+* **Paper:** [Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling (arXiv 2026)](https://arxiv.org/abs/2604.28075)
+* **Evaluation Suite:** [Modernized German Benchmarks](https://huggingface.co/collections/Boldt/german-llm-benchmarks)
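
Note on the added model table: the sketch below shows one way a reader might choose a checkpoint by context window and then try it out. The repository ids and context windows are copied from the table. Everything else is an assumption for illustration: the `BoldtModel` record, the `candidates` and `generate` helpers, and in particular the idea that these checkpoints load through the standard `transformers` Auto classes, which the README does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoldtModel:
    """Hypothetical record mirroring one row of the README table."""
    repo_id: str            # Hugging Face repository id (from the table)
    context_window: int     # context window in tokens (from the table)
    instruction_tuned: bool  # True only for the IT preview model


MODELS = [
    BoldtModel("Boldt/Boldt-DC-350M", 2048, False),
    BoldtModel("Boldt/Boldt-DC-1B", 2048, False),
    BoldtModel("Boldt/Boldt-1B", 4096, False),
    BoldtModel("Boldt/Boldt-1B-IT-Preview", 4096, True),
]


def candidates(min_context: int, instruction_tuned: bool = False) -> list[str]:
    """Return repo ids whose context window is at least `min_context` tokens."""
    return [
        m.repo_id
        for m in MODELS
        if m.context_window >= min_context
        and m.instruction_tuned == instruction_tuned
    ]


def generate(repo_id: str, prompt: str, max_new_tokens: int = 40) -> str:
    """Generate a continuation for `prompt`.

    Assumption: the checkpoint is compatible with the transformers Auto
    classes. The import is local so the table helpers above work without
    transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `candidates(4096)` narrows the base models to the one with the longer context window, and `generate(candidates(4096)[0], "Berlin ist")` would then download that checkpoint and continue the prompt, provided `transformers` and PyTorch are installed and the loading assumption holds.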