Collections including paper arxiv:2312.12456

- Diffusion Language Models Know the Answer Before Decoding
  Paper • 2508.19982 • Published • 27
- ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
  Paper • 2512.13586 • Published • 93
- LSRIF: Logic-Structured Reinforcement Learning for Instruction Following
  Paper • 2601.06431 • Published • 12
- Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
  Paper • 2601.09088 • Published • 63

- DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
  Paper • 2408.08152 • Published • 62
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 20
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 57
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 20

- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 61
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
  Paper • 2312.12456 • Published • 45
- Cached Transformers: Improving Transformers with Differentiable Memory Cache
  Paper • 2312.12742 • Published • 13
- Mini-GPTs: Efficient Large Language Models through Contextual Pruning
  Paper • 2312.12682 • Published • 9

- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 61
- Training Transformers with 4-bit Integers
  Paper • 2306.11987 • Published • 23
- FasterViT: Fast Vision Transformers with Hierarchical Attention
  Paper • 2306.06189 • Published • 32
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 22

- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 31
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 28
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
  Paper • 2401.00604 • Published • 6
- LARP: Language-Agent Role Play for Open-World Games
  Paper • 2312.17653 • Published • 33