vishesh-t27's Collections

Pre-Training Datasets Mixtures

updated Jan 31

  • HuggingFaceFW/fineweb-edu

    Viewer • Updated Jul 11, 2025 • 3.5B rows • 353k downloads • 1.03k likes

  • bigcode/the-stack-dedup

    Viewer • Updated Aug 17, 2023 • 237M rows • 16.8k downloads • 392 likes

  • open-web-math/open-web-math

    Viewer • Updated Oct 17, 2023 • 6.32M rows • 19.2k downloads • 333 likes

  • HuggingFaceTB/stack-edu

    Viewer • Updated Mar 20, 2025 • 167M rows • 2.33k downloads • 68 likes

  • mlfoundations/dclm-baseline-1.0

    Preview • Updated Jul 22, 2024 • 116k downloads • 263 likes

  • allenai/c4

    Viewer • Updated Jan 9, 2024 • 10.4B rows • 680k downloads • 554 likes

  • HuggingFaceTB/finemath

    Viewer • Updated Feb 6, 2025 • 48.3M rows • 15.7k downloads • 359 likes

  • uonlp/CulturaX

    Viewer • Updated Dec 16, 2024 • 7.18B rows • 21.8k downloads • 611 likes

  • allenai/dolma3_dolmino_mix-100B-1025

    Viewer • Updated Jan 5 • 14.1M rows • 6.53k downloads • 9 likes