Pretraining Datasets
wikimedia/wikipedia • Updated Jan 9, 2024 • 61.6M • 104k • 1.19k
togethercomputer/RedPajama-Data-V2 • Updated Nov 21, 2024 • 4.23k • 400
Skywork/SkyPile-150B • Updated Dec 7, 2023 • 1.76M • 27.7k • 404
Awesome Instruction Tuning Dataset
Open-Orca/OpenOrca • Updated Feb 19, 2025 • 2.94M • 22k • 1.52k
glaiveai/glaive-code-assistant • Updated Sep 27, 2023 • 136k • 516 • 100
silk-road/alpaca-data-gpt4-chinese • Updated May 23, 2023 • 52k • 1.03k • 103
anon8231489123/ShareGPT_Vicuna_unfiltered • Updated Apr 12, 2023 • 144k • 855