The Synthetic Data Playbook: Generating Trillions of the Finest Tokens. Explore synthetic data experiments on a virtual bookshelf.
Porting nanochat to Transformers: an AI modeling history lesson. Learn about ML and Transformers through nanochat.
Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks. Evaluate multilingual models using FineTasks.
FineVision: Open Data is All You Need. A new open-source dataset for training VLMs.