Finished Reading
Self-Play Preference Optimization for Language Model Alignment (arXiv:2405.00675)
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv:2205.14135)
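The core trick shared by the FlashAttention papers in this list is the online (streaming) softmax: attention is computed block by block with a running max and normalizer, so the full score matrix is never materialized. An illustrative single-query numpy version, not the paper's tiled CUDA kernel:

```python
import numpy as np

def online_softmax_attention(q, K, V, block=2):
    """Streaming attention for one query vector q: process K/V in
    blocks, rescaling running accumulators whenever the running max
    changes, so memory stays O(block) instead of O(seq_len)."""
    m = -np.inf                      # running max of scores
    l = 0.0                          # running softmax normalizer
    acc = np.zeros(V.shape[1])       # running weighted sum of values
    for start in range(0, K.shape[0], block):
        s = K[start:start + block] @ q        # scores for this block
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)             # rescale old accumulators
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[start:start + block]
        m = m_new
    return acc / l

# The streaming result matches naive full-softmax attention exactly.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(6, 4)), rng.normal(size=(6, 3))
w = np.exp(K @ q - (K @ q).max())
naive = (w / w.sum()) @ V
assert np.allclose(online_softmax_attention(q, K, V), naive)
```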
Attention Is All You Need (arXiv:1706.03762)
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (arXiv:2307.08691)
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (arXiv:2407.08608)
Efficient Transformers: A Survey (arXiv:2009.06732)
Linformer: Self-Attention with Linear Complexity (arXiv:2006.04768)
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
RoFormer: Enhanced Transformer with Rotary Position Embedding (arXiv:2104.09864)
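A minimal numpy sketch of the rotary position embedding idea from the RoFormer paper above: consecutive feature pairs are rotated by position-dependent angles, so query-key dot products depend only on relative position. This is an illustration of the mechanism, not a drop-in for any model's implementation.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotary position embedding: rotate consecutive feature pairs
    of x (shape (seq, d), d even) by angles that grow with position."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)    # (d/2,) per-pair frequencies
    theta = positions[:, None] * inv_freq[None, :]  # (seq, d/2) rotation angles
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin            # 2D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Key property: the score depends only on the relative offset (here 2).
q = np.random.default_rng(1).normal(size=(1, 8))
k = np.random.default_rng(2).normal(size=(1, 8))
a = rope(q, np.array([3]))[0] @ rope(k, np.array([5]))[0]
b = rope(q, np.array([10]))[0] @ rope(k, np.array([12]))[0]
assert np.isclose(a, b)
```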
BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
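The 1.58-bit paper's absmean quantizer maps each weight to the ternary set {-1, 0, +1}: scale by the mean absolute weight, round, and clip. A small sketch of that weight-quantization step (the surrounding training recipe is not shown):

```python
import numpy as np

def absmean_ternary(W, eps=1e-6):
    """Absmean quantization: scale W by the mean absolute weight,
    then round each entry to the nearest value in {-1, 0, +1}."""
    gamma = np.abs(W).mean()                        # per-tensor scale
    Wq = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return Wq, gamma                                # dequantize as Wq * gamma

W = np.array([[0.9, -0.05, -1.2],
              [0.02, 0.4, -0.6]])
Wq, gamma = absmean_ternary(W)
assert set(np.unique(Wq)) <= {-1.0, 0.0, 1.0}       # every weight is ternary
```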
LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971)
Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
Training Compute-Optimal Large Language Models (arXiv:2203.15556)
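The Chinchilla paper above is often summarized by the rule of thumb of roughly 20 training tokens per parameter; the exact coefficients are fit in the paper. A back-of-envelope helper under that assumption:

```python
def chinchilla_tokens(n_params, tokens_per_param=20):
    """Rough compute-optimal training-token budget, using the
    ~20-tokens-per-parameter rule of thumb (an approximation,
    not the paper's fitted scaling law)."""
    return n_params * tokens_per_param

# A 70B-parameter model would want on the order of 1.4T tokens.
assert chinchilla_tokens(70e9) == 1.4e12
```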
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (arXiv:2305.13245)
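The GQA paper converts a multi-head checkpoint by mean-pooling the key and value projection heads within each group. A sketch of that pooling step (the brief uptraining phase the paper also uses is omitted):

```python
import numpy as np

def pool_kv_heads(Wkv, n_groups):
    """Mean-pool key (or value) projection heads into n_groups shared
    heads, as in GQA checkpoint conversion.
    Wkv has shape (n_heads, d_model, d_head)."""
    n_heads, d_model, d_head = Wkv.shape
    assert n_heads % n_groups == 0, "heads must divide evenly into groups"
    grouped = Wkv.reshape(n_groups, n_heads // n_groups, d_model, d_head)
    return grouped.mean(axis=1)                     # one shared head per group

Wk = np.random.default_rng(0).normal(size=(8, 16, 4))
assert pool_kv_heads(Wk, 2).shape == (2, 16, 4)     # 8 heads -> 2 KV groups
```

With `n_groups=1` this recovers multi-query attention; with `n_groups=n_heads` it is a no-op, matching the spectrum the paper describes.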
Accuracy is Not All You Need (arXiv:2407.09141)