Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 28 items
Data Science and Technology Towards AGI Part I: Tiered Data Management Paper • 2602.09003 • Published Feb 9, 2026
Essential-Web v1.0: 24T tokens of organized web data Paper • 2506.14111 • Published Jun 17, 2025
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published Jul 22, 2025
RAVine: Reality-Aligned Evaluation for Agentic Search Paper • 2507.16725 • Published Jul 22, 2025
MiniCPM4 Collection MiniCPM4: Ultra-Efficient LLMs on End Devices • 30 items
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data Paper • 2505.05427 • Published May 8, 2025