[Dataset] Pretrain-corpus
Viewer • Updated • 69.9k • 207k • 390
EssentialAI/essential-web-v1.0
Preview • Updated • 41.4k • 223
Viewer • Updated • 52.5B • 634k • 2.76k
HuggingFaceFW/fineweb-edu
Viewer • Updated • 3.5B • 358k • 1.03k
Viewer • Updated • 4.48B • 96.3k • 784
data-is-better-together/fineweb-c
Viewer • Updated • 88.7k • 1.2k • 60
Viewer • Updated • 170M • 14.3k • 91
Updated • 2.81k • 1.02k
Viewer • Updated • 621M • 14.6k • 87
mlfoundations/dclm-baseline-1.0
Preview • Updated • 125k • 262
Preview • Updated • 34.2k • 90