Collections
Collections including paper arxiv:2205.13147
- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
  Paper • 2504.07128 • Published • 87
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 85
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 29

- Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
  Paper • 2306.00989 • Published • 1
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 64
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 17
- Matryoshka Representation Learning
  Paper • 2205.13147 • Published • 25

- XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
  Paper • 2404.15420 • Published • 11
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 126
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 259
- How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
  Paper • 2404.14047 • Published • 45