Finished Reading
Self-Play Preference Optimization for Language Model Alignment (arXiv:2405.00675)
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv:2205.14135)
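The core trick shared by the FlashAttention papers in this list is the online (streaming) softmax: attention is computed block by block with a running max and normalizer, so the full score matrix is never materialized. An illustrative single-query numpy version, not the paper's tiled CUDA kernel:

```python
import numpy as np

def online_softmax_attention(q, K, V, block=2):
    """Streaming attention for one query vector q: process K/V in
    blocks, rescaling running accumulators whenever the running max
    changes, so memory stays O(block) instead of O(seq_len)."""
    m = -np.inf                      # running max of scores
    l = 0.0                          # running softmax normalizer
    acc = np.zeros(V.shape[1])       # running weighted sum of values
    for start in range(0, K.shape[0], block):
        s = K[start:start + block] @ q        # scores for this block
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)             # rescale old accumulators
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[start:start + block]
        m = m_new
    return acc / l

# The streaming result matches naive full-softmax attention exactly.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(6, 4)), rng.normal(size=(6, 3))
w = np.exp(K @ q - (K @ q).max())
naive = (w / w.sum()) @ V
assert np.allclose(online_softmax_attention(q, K, V), naive)
```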
Attention Is All You Need (arXiv:1706.03762)
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (arXiv:2307.08691)
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (arXiv:2407.08608)
Efficient Transformers: A Survey (arXiv:2009.06732)
Linformer: Self-Attention with Linear Complexity (arXiv:2006.04768)
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
RoFormer: Enhanced Transformer with Rotary Position Embedding (arXiv:2104.09864)
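A minimal numpy sketch of the rotary position embedding idea from the RoFormer paper above: consecutive feature pairs are rotated by position-dependent angles, so query-key dot products depend only on relative position. This is an illustration of the mechanism, not a drop-in for any model's implementation.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotary position embedding: rotate consecutive feature pairs
    of x (shape (seq, d), d even) by angles that grow with position."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)    # (d/2,) per-pair frequencies
    theta = positions[:, None] * inv_freq[None, :]  # (seq, d/2) rotation angles
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin            # 2D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Key property: the score depends only on the relative offset (here 2).
q = np.random.default_rng(1).normal(size=(1, 8))
k = np.random.default_rng(2).normal(size=(1, 8))
a = rope(q, np.array([3]))[0] @ rope(k, np.array([5]))[0]
b = rope(q, np.array([10]))[0] @ rope(k, np.array([12]))[0]
assert np.isclose(a, b)
```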
BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
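The 1.58-bit paper's absmean quantizer maps each weight to the ternary set {-1, 0, +1}: scale by the mean absolute weight, round, and clip. A small sketch of that weight-quantization step (the surrounding training recipe is not shown):

```python
import numpy as np

def absmean_ternary(W, eps=1e-6):
    """Absmean quantization: scale W by the mean absolute weight,
    then round each entry to the nearest value in {-1, 0, +1}."""
    gamma = np.abs(W).mean()                        # per-tensor scale
    Wq = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return Wq, gamma                                # dequantize as Wq * gamma

W = np.array([[0.9, -0.05, -1.2],
              [0.02, 0.4, -0.6]])
Wq, gamma = absmean_ternary(W)
assert set(np.unique(Wq)) <= {-1.0, 0.0, 1.0}       # every weight is ternary
```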
LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971)
Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
Training Compute-Optimal Large Language Models (arXiv:2203.15556)
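The Chinchilla paper above is often summarized by the rule of thumb of roughly 20 training tokens per parameter; the exact coefficients are fit in the paper. A back-of-envelope helper under that assumption:

```python
def chinchilla_tokens(n_params, tokens_per_param=20):
    """Rough compute-optimal training-token budget, using the
    ~20-tokens-per-parameter rule of thumb (an approximation,
    not the paper's fitted scaling law)."""
    return n_params * tokens_per_param

# A 70B-parameter model would want on the order of 1.4T tokens.
assert chinchilla_tokens(70e9) == 1.4e12
```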
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (arXiv:2305.13245)
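The GQA paper converts a multi-head checkpoint by mean-pooling the key and value projection heads within each group. A sketch of that pooling step (the brief uptraining phase the paper also uses is omitted):

```python
import numpy as np

def pool_kv_heads(Wkv, n_groups):
    """Mean-pool key (or value) projection heads into n_groups shared
    heads, as in GQA checkpoint conversion.
    Wkv has shape (n_heads, d_model, d_head)."""
    n_heads, d_model, d_head = Wkv.shape
    assert n_heads % n_groups == 0, "heads must divide evenly into groups"
    grouped = Wkv.reshape(n_groups, n_heads // n_groups, d_model, d_head)
    return grouped.mean(axis=1)                     # one shared head per group

Wk = np.random.default_rng(0).normal(size=(8, 16, 4))
assert pool_kv_heads(Wk, 2).shape == (2, 16, 4)     # 8 heads -> 2 KV groups
```

With `n_groups=1` this recovers multi-query attention; with `n_groups=n_heads` it is a no-op, matching the spectrum the paper describes.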
Accuracy is Not All You Need (arXiv:2407.09141)