FineWeb: decanting the web for the finest text data at scale
Read a detailed overview of the FineWeb web-scale text dataset
Embedding Leaderboard
Convert images and queries into structured document text
FireRed / Nanonets / Monkey / Thyme / Typhoon / SmolDocling