Scaling Pedagogical Pre-training: From Optimal Mixing to 10 Billion Tokens
Mar 6