-
I-Con: A Unifying Framework for Representation Learning
Paper • 2504.16929 • Published • 30 -
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens
Paper • 2508.05305 • Published • 48 -
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
Paper • 2511.04217 • Published • 17 -
Large Language Models as Markov Chains
Paper • 2410.02724 • Published • 33
Collections
Collections including paper arxiv:2410.02724
-
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 65 -
Learning To Teach Large Language Models Logical Reasoning
Paper • 2310.09158 • Published • 1 -
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper • 2311.00176 • Published • 9 -
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Paper • 2308.09583 • Published • 7
-
Intelligence at the Edge of Chaos
Paper • 2410.02536 • Published • 6 -
Large Language Models as Markov Chains
Paper • 2410.02724 • Published • 33 -
Learning the Latent Rules of a Game from Data: A Chess Story
Paper • 2410.02426 • Published • 4 -
Quantifying Generalization Complexity for Large Language Models
Paper • 2410.01769 • Published • 13
-
Large Language Models as Markov Chains
Paper • 2410.02724 • Published • 33 -
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
Paper • 2410.02757 • Published • 36 -
LLaVA-Critic: Learning to Evaluate Multimodal Models
Paper • 2410.02712 • Published • 37 -
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Paper • 2410.02367 • Published • 50
-
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Paper • 2409.10516 • Published • 43 -
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Paper • 2409.11242 • Published • 7 -
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
Paper • 2409.11136 • Published • 23 -
On the Diagram of Thought
Paper • 2409.10038 • Published • 13
-
Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning
Paper • 2402.17457 • Published -
Curvature-Informed SGD via General Purpose Lie-Group Preconditioners
Paper • 2402.04553 • Published -
TextGrad: Automatic "Differentiation" via Text
Paper • 2406.07496 • Published • 31 -
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Paper • 2405.14578 • Published • 1
-
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 22 -
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper • 2401.02412 • Published • 38 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 61 -
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 22