🚀 Spinning Up in LLMs
Lost in the Middle: How Language Models Use Long Contexts
Paper
• 2307.03172
• Published • 44
Efficient Estimation of Word Representations in Vector Space
Paper
• 1301.3781
• Published • 8
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper
• 1810.04805
• Published • 26
Attention Is All You Need
Paper
• 1706.03762
• Published • 121
Language Models are Few-Shot Learners
Paper
• 2005.14165
• Published • 20
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
• 2307.09288
• Published • 251
Emergent Abilities of Large Language Models
Paper
• 2206.07682
• Published • 3
Scaling Laws for Neural Language Models
Paper
• 2001.08361
• Published • 10
Are Emergent Abilities of Large Language Models a Mirage?
Paper
• 2304.15004
• Published • 8
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper
• 2201.11903
• Published • 15
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper
• 2306.05685
• Published • 42
Training Compute-Optimal Large Language Models
Paper
• 2203.15556
• Published • 11
Neural Machine Translation of Rare Words with Subword Units
Paper
• 1508.07909
• Published • 4
Jamba: A Hybrid Transformer-Mamba Language Model
Paper
• 2403.19887
• Published • 112
Paper
• 2401.04088
• Published • 160
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper
• 2404.02258
• Published • 108
Textbooks Are All You Need
Paper
• 2306.11644
• Published • 155
Rho-1: Not All Tokens Are What You Need
Paper
• 2404.07965
• Published • 94
Large Language Models Struggle to Learn Long-Tail Knowledge
Paper
• 2211.08411
• Published • 3
Large Language Models are Zero-Shot Reasoners
Paper
• 2205.11916
• Published • 3