LLM_architectures
• Nemotron-4 15B Technical Report (arXiv:2402.16819)
• Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
• RWKV: Reinventing RNNs for the Transformer Era (arXiv:2305.13048)
• Reformer: The Efficient Transformer (arXiv:2001.04451)
• Attention Is All You Need (arXiv:1706.03762)
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
• Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv:1910.10683)
• GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (arXiv:2112.06905)
• UL2: Unifying Language Learning Paradigms (arXiv:2205.05131)
• BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100)
• The Flan Collection: Designing Data and Methods for Effective Instruction Tuning (arXiv:2301.13688)
• Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
• Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
• Textbooks Are All You Need (arXiv:2306.11644)
• Mistral 7B (arXiv:2310.06825)
• SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
• Gemini: A Family of Highly Capable Multimodal Models (arXiv:2312.11805)
• Mixtral of Experts (arXiv:2401.04088)
• The Falcon Series of Open Language Models (arXiv:2311.16867)
• Gemma: Open Models Based on Gemini Research and Technology (arXiv:2403.08295)
• Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887)
• ReALM: Reference Resolution As Language Modeling (arXiv:2403.20329)
• Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892)
• RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (arXiv:2404.07839)
• Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
• Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
• Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219)
• You Only Cache Once: Decoder-Decoder Architectures for Language Models (arXiv:2405.05254)
• TransformerFAM: Feedback attention is working memory (arXiv:2404.09173)
• ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation (arXiv:2303.08302)
• Kolmogorov-Arnold Transformer (arXiv:2409.10594)
• Fast Inference from Transformers via Speculative Decoding (arXiv:2211.17192)
• Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (arXiv:2502.06781)
• Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (arXiv:2502.05171)