- VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
  Paper • 2602.10693 • Published • 220
- Reinforced Attention Learning
  Paper • 2602.04884 • Published • 29
- Learning to Reason in 13 Parameters
  Paper • 2602.04118 • Published • 6
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
  Paper • 2405.17604 • Published • 3
Collections
Collections including paper arxiv:2412.06464
- Provable Benefits of In-Tool Learning for Large Language Models
  Paper • 2508.20755 • Published • 11
- CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
  Paper • 2508.15868 • Published • 3
- Gated Delta Networks: Improving Mamba2 with Delta Rule
  Paper • 2412.06464 • Published • 17
- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 52
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 59
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 24
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 25
- Higher-order Linear Attention
  Paper • 2510.27258 • Published • 15
- RWKV-7 "Goose" with Expressive Dynamic State Evolution
  Paper • 2503.14456 • Published • 154
- xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
  Paper • 2503.13427 • Published • 3
- MoM: Linear Sequence Modeling with Mixture-of-Memories
  Paper • 2502.13685 • Published • 36
- Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  Paper • 2402.04248 • Published • 32
- Scavenging Hyena: Distilling Transformers into Long Convolution Models
  Paper • 2401.17574 • Published • 17
- Scalable Autoregressive Image Generation with Mamba
  Paper • 2408.12245 • Published • 26
- Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
  Paper • 2408.12570 • Published • 32