Hugging Face
vishesh-t27's Collections: Full Pre-Training Datasets, Pre-Training Datasets Mixtures
Pre-Training Datasets Mixtures (updated Jan 31)
HuggingFaceFW/fineweb-edu • Updated Jul 11, 2025 • 3.5B rows • 353k downloads • 1.03k likes
bigcode/the-stack-dedup • Updated Aug 17, 2023 • 237M rows • 16.8k downloads • 392 likes
open-web-math/open-web-math • Updated Oct 17, 2023 • 6.32M rows • 19.2k downloads • 333 likes
HuggingFaceTB/stack-edu • Updated Mar 20, 2025 • 167M rows • 2.33k downloads • 68 likes
mlfoundations/dclm-baseline-1.0 • Updated Jul 22, 2024 • 116k downloads • 263 likes
allenai/c4 • Updated Jan 9, 2024 • 10.4B rows • 680k downloads • 554 likes
HuggingFaceTB/finemath • Updated Feb 6, 2025 • 48.3M rows • 15.7k downloads • 359 likes
uonlp/CulturaX • Updated Dec 16, 2024 • 7.18B rows • 21.8k downloads • 611 likes
allenai/dolma3_dolmino_mix-100B-1025 • Updated Jan 5 • 14.1M rows • 6.53k downloads • 9 likes