- Diffusion Language Models Know the Answer Before Decoding
  Paper • 2508.19982 • Published • 27
- ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
  Paper • 2512.13586 • Published • 93
- LSRIF: Logic-Structured Reinforcement Learning for Instruction Following
  Paper • 2601.06431 • Published • 12
- Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
  Paper • 2601.09088 • Published • 63
Collections
Collections including paper arxiv:2602.06717
- Endless Terminals: Scaling RL Environments for Terminal Agents
  Paper • 2601.16443 • Published • 18
- Linear representations in language models can change dramatically over a conversation
  Paper • 2601.20834 • Published • 21
- Scaling Embeddings Outperforms Scaling Experts in Language Models
  Paper • 2601.21204 • Published • 102
- Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
  Paper • 2601.18778 • Published • 42
- DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning
  Paper • 2602.19895 • Published • 14
- SetPO: Set-Level Policy Optimization for Diversity-Preserving LLM Reasoning
  Paper • 2602.01062 • Published
- F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
  Paper • 2602.06717 • Published • 74
- MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning
  Paper • 2601.22582 • Published