paper chasing
Language Models are Few-Shot Learners
• arXiv:2005.14165
Evaluating Large Language Models Trained on Code
• arXiv:2107.03374
Training language models to follow instructions with human feedback
• arXiv:2203.02155
GPT-4 Technical Report
• arXiv:2303.08774
GPT-4o System Card
• arXiv:2410.21276
OpenAI o1 System Card
• arXiv:2412.16720
gpt-oss-120b & gpt-oss-20b Model Card
• arXiv:2508.10925
Gemma 2: Improving Open Language Models at a Practical Size
• arXiv:2408.00118
Gemma 3 Technical Report
• arXiv:2503.19786
Gemini: A Family of Highly Capable Multimodal Models
• arXiv:2312.11805
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
• arXiv:2403.05530
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
• arXiv:2507.06261
LLaMA: Open and Efficient Foundation Language Models
• arXiv:2302.13971
Llama 2: Open Foundation and Fine-Tuned Chat Models
• arXiv:2307.09288
The Llama 3 Herd of Models
• arXiv:2407.21783
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
• arXiv:2401.02954
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
• arXiv:2401.06066
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
• arXiv:2501.12948
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
• arXiv:2405.04434
DeepSeek-V3 Technical Report
• arXiv:2412.19437
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
• arXiv:2512.02556
Qwen3 Technical Report
• arXiv:2505.09388
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
• arXiv:2508.06471
Training Compute-Optimal Large Language Models
• arXiv:2203.15556
Emergent Abilities of Large Language Models
• arXiv:2206.07682
Muon is Scalable for LLM Training
• arXiv:2502.16982