LLM Papers
• Attention Is All You Need (arXiv:1706.03762)
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
• DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (arXiv:1910.01108)
• Language Models are Few-Shot Learners (arXiv:2005.14165)
• Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv:2201.11903)
• Training language models to follow instructions with human feedback (arXiv:2203.02155)
• PaLM: Scaling Language Modeling with Pathways (arXiv:2204.02311)
• The Flan Collection: Designing Data and Methods for Effective Instruction Tuning (arXiv:2301.13688)
• LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971)
• GPT-4 Technical Report (arXiv:2303.08774)
• PaLM 2 Technical Report (arXiv:2305.10403)
• Tree of Thoughts: Deliberate Problem Solving with Large Language Models (arXiv:2305.10601)
• Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
• Attention Is Not All You Need Anymore (arXiv:2308.07661)
• Mistral 7B (arXiv:2310.06825)
• Gemini: A Family of Highly Capable Multimodal Models (arXiv:2312.11805)
• Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context (arXiv:2403.05530)
• Gemma: Open Models Based on Gemini Research and Technology (arXiv:2403.08295)
• OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (arXiv:2404.14619)
• Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems (arXiv:2407.01370)
• OpenDevin: An Open Platform for AI Software Developers as Generalist Agents (arXiv:2407.16741)
• The Llama 3 Herd of Models (arXiv:2407.21783)
• The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (arXiv:2408.06292)
• Qwen2.5-Coder Technical Report (arXiv:2409.12186)
• GPT-4o System Card (arXiv:2410.21276)
• DeepSeek-V3 Technical Report (arXiv:2412.19437)
• Evolving Deeper LLM Thinking (arXiv:2501.09891)
• SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (arXiv:2502.02737)
• DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948)
• MLGym: A New Framework and Benchmark for Advancing AI Research Agents (arXiv:2502.14499)