유영준's picture

유영준

colin31472

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 6 months ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513

upvoted 3 papers 7 months ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 190

Reasoning with Exploration: An Entropy Perspective

Paper • 2506.14758 • Published Jun 17, 2025 • 30

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 193

upvoted a paper 8 months ago

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Paper • 2410.08146 • Published Oct 10, 2024 • 1

upvoted 6 papers 9 months ago

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Paper • 2410.01679 • Published Oct 2, 2024 • 27

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146

AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19, 2025 • 83

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 320

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28, 2025 • 132

First Return, Entropy-Eliciting Explore

Paper • 2507.07017 • Published Jul 9, 2025 • 24

upvoted 4 papers 10 months ago

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16, 2025 • 274

Magistral

Paper • 2506.10910 • Published Jun 12, 2025 • 68

Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Paper • 2506.09250 • Published Jun 10, 2025 • 27

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 134

upvoted 2 papers 11 months ago

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7, 2025 • 154

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30, 2025 • 146