view article Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix Nov 3, 2025 • 65
HeartMuLa: A Family of Open Sourced Music Foundation Models Paper • 2601.10547 • Published Jan 15 • 48
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5, 2025 • 61
Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions Paper • 2410.17655 • Published Oct 23, 2024 • 6