Collections including paper arxiv:2410.20672

Every collection below includes Relaxed Recursive Transformers (arXiv:2410.20672); a minimal sketch of the idea its title names follows the listings.

- Scaling Latent Reasoning via Looped Language Models
  Paper • 2510.25741 • Published • 229
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
  Paper • 2410.20672 • Published • 7
- The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus
  Paper • 2603.20105 • Published • 37

- Learning Video Representations without Natural Videos
  Paper • 2410.24213 • Published • 16
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
  Paper • 2410.23168 • Published • 24
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
  Paper • 2410.20672 • Published • 7

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 29
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 15
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 33

- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 52
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 59
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 24
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 25

- LFM2 Technical Report
  Paper • 2511.23404 • Published • 56
- Artificial Hippocampus Networks for Efficient Long-Context Modeling
  Paper • 2510.07318 • Published • 32
- Scaling Latent Reasoning via Looped Language Models
  Paper • 2510.25741 • Published • 229
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
  Paper • 2410.20672 • Published • 7

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 153
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 14
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 59
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 47
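
Since the shared subject of these collections is Relaxed Recursive Transformers (arXiv:2410.20672), here is a minimal sketch of the idea its title names: loop one shared block several times and relax the strict weight tying with a depth-specific LoRA delta. All class names, dimensions, and design choices below are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch, assuming only the idea named in the title of
# arXiv:2410.20672: a shared block is applied at every loop depth,
# and the strict tying is "relaxed" by a per-depth low-rank delta.
import torch
import torch.nn as nn


class LoRADelta(nn.Module):
    """Low-rank residual update x @ A^T @ B^T (hypothetical helper)."""

    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, d_model) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_model, rank))  # zero-init: starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.A.T @ self.B.T


class RelaxedRecursiveBlock(nn.Module):
    """One shared projection reused at every loop depth; each depth
    adds its own LoRA delta, relaxing the strict parameter sharing."""

    def __init__(self, d_model: int, num_loops: int = 4, rank: int = 8):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)  # weights tied across all loops
        self.norm = nn.LayerNorm(d_model)
        # "Layer-wise" relaxation: one LoRA module per loop depth.
        self.loras = nn.ModuleList(
            [LoRADelta(d_model, rank) for _ in range(num_loops)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for lora in self.loras:  # same shared weights, depth-specific delta
            x = x + torch.relu(self.norm(self.shared(x) + lora(x)))
        return x


if __name__ == "__main__":
    block = RelaxedRecursiveBlock(d_model=64, num_loops=4)
    out = block(torch.randn(2, 10, 64))
    print(out.shape)  # torch.Size([2, 10, 64])
```

In this toy version the added parameters scale only with the LoRA rank times the number of loops, while the shared projection is counted once; swapping a real attention-plus-MLP transformer layer in for `self.shared` follows the same pattern.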