-
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 10 -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 24 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7
Collections
Discover the best community collections!
Collections including paper arxiv:2303.08774
-
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 377 -
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 154 -
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Paper • 2409.12122 • Published • 4 -
Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published • 217
-
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 251
-
Long-form factuality in large language models
Paper • 2403.18802 • Published • 26 -
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20 -
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Paper • 2310.12321 • Published • 1
-
Neural Machine Translation by Jointly Learning to Align and Translate
Paper • 1409.0473 • Published • 7 -
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26 -
Hierarchical Reasoning Model
Paper • 2506.21734 • Published • 50
-
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 22 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20
-
Large Language Model Alignment: A Survey
Paper • 2309.15025 • Published • 2 -
Aligning Large Language Models with Human: A Survey
Paper • 2307.12966 • Published • 1 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64 -
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
Paper • 2310.05344 • Published • 1
-
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 10 -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 24 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7
-
Neural Machine Translation by Jointly Learning to Align and Translate
Paper • 1409.0473 • Published • 7 -
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26 -
Hierarchical Reasoning Model
Paper • 2506.21734 • Published • 50
-
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 377 -
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 154 -
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Paper • 2409.12122 • Published • 4 -
Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published • 217
-
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 251
-
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 22 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20
-
Long-form factuality in large language models
Paper • 2403.18802 • Published • 26 -
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20 -
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Paper • 2310.12321 • Published • 1
-
Large Language Model Alignment: A Survey
Paper • 2309.15025 • Published • 2 -
Aligning Large Language Models with Human: A Survey
Paper • 2307.12966 • Published • 1 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64 -
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
Paper • 2310.05344 • Published • 1