Collections
Collections including paper arxiv:2501.12599
Collection:
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 120
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 146

Collection:
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
  Paper • 2501.12599 • Published • 128
- Teaching Language Models to Critique via Reinforcement Learning
  Paper • 2502.03492 • Published • 24
- NatureLM: Deciphering the Language of Nature for Scientific Discovery
  Paper • 2502.07527 • Published • 20
- MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents
  Paper • 2502.05957 • Published • 16

Collection:
- Learning to Reason without External Rewards
  Paper • 2505.19590 • Published • 31
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty
  Paper • 2502.18581 • Published
- Training Large Language Models to Reason in a Continuous Latent Space
  Paper • 2412.06769 • Published • 94
- Fractured Chain-of-Thought Reasoning
  Paper • 2505.12992 • Published • 23

Collection:
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 258
- Demystifying Long Chain-of-Thought Reasoning in LLMs
  Paper • 2502.03373 • Published • 58
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
  Paper • 2501.12599 • Published • 128
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 125

Collection:
- Qwen Technical Report
  Paper • 2309.16609 • Published • 38
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 9
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 171
- Qwen2-Audio Technical Report
  Paper • 2407.10759 • Published • 64