- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- Evaluating Large Language Models Trained on Code
  Paper • 2107.03374 • Published • 10
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 7
Collections
Collections including paper arxiv:2502.16982
- CAME: Confidence-guided Adaptive Memory Efficient Optimization
  Paper • 2307.02047 • Published • 2
- Practical Efficiency of Muon for Pretraining
  Paper • 2505.02222 • Published • 41
- AdaMuon: Adaptive Muon Optimizer
  Paper • 2507.11005 • Published • 2
- Muon is Scalable for LLM Training
  Paper • 2502.16982 • Published • 12
- Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
  Paper • 2501.18512 • Published • 29
- DiLoCo: Distributed Low-Communication Training of Language Models
  Paper • 2311.08105 • Published • 16
- Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
  Paper • 2503.09799 • Published • 15
- Muon is Scalable for LLM Training
  Paper • 2502.16982 • Published • 12
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 154
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 447
- Muon is Scalable for LLM Training
  Paper • 2502.16982 • Published • 12
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  Paper • 2406.17557 • Published • 102