REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper
• 2501.03262
• Published • 104
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published • 302
Towards Best Practices for Open Datasets for LLM Training
Paper
• 2501.08365
• Published • 62
Qwen2.5-1M Technical Report
Paper
• 2501.15383
• Published • 72
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published • 258
Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Paper
• 2502.06635
• Published • 6
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Paper
• 2503.00808
• Published • 57
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper
• 2503.22230
• Published • 45
WorldPM: Scaling Human Preference Modeling
Paper
• 2505.10527
• Published • 34
Paper
• 2505.09388
• Published • 339
Model Merging in Pre-training of Large Language Models
Paper
• 2505.12082
• Published • 39
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper
• 2505.17667
• Published • 88