Collections
Collections including paper arxiv:2306.13649
- Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
  Paper • 2603.04597 • Published • 210
- SII-Enigma/Llama3.2-8B-Ins-AMPO
  Text Generation • 8B • Updated • 48
- Understanding R1-Zero-Like Training: A Critical Perspective
  Paper • 2503.20783 • Published • 59
- Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs
  Paper • 2509.25779 • Published • 19

- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 25
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  Paper • 2406.17557 • Published • 102
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 55
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 134