-
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 282 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 190 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 146 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97
Collections
Discover the best community collections!
Collections including paper arxiv:2505.10554
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Paper • 2505.09694 • Published • 20 -
End-to-End Vision Tokenizer Tuning
Paper • 2505.10562 • Published • 22 -
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 24
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 15.5k • 1.43k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 96 • 17 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper • 2503.21614 • Published • 43 -
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
JudgeLRM: Large Reasoning Models as a Judge
Paper • 2504.00050 • Published • 62 -
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper • 2504.05599 • Published • 86
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 60 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 64
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
Chain-of-Model Learning for Language Model
Paper • 2505.11820 • Published • 121 -
On Path to Multimodal Generalist: General-Level and General-Bench
Paper • 2505.04620 • Published • 83 -
Causal-Copilot: An Autonomous Causal Analysis Agent
Paper • 2504.13263 • Published • 7
-
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 24 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 76 -
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
Scaling Reasoning can Improve Factuality in Large Language Models
Paper • 2505.11140 • Published • 7
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 44 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 85
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 32 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 110 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 107
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 30 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 45 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 282 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 190 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 146 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
Chain-of-Model Learning for Language Model
Paper • 2505.11820 • Published • 121 -
On Path to Multimodal Generalist: General-Level and General-Bench
Paper • 2505.04620 • Published • 83 -
Causal-Copilot: An Autonomous Causal Analysis Agent
Paper • 2504.13263 • Published • 7
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Paper • 2505.09694 • Published • 20 -
End-to-End Vision Tokenizer Tuning
Paper • 2505.10562 • Published • 22 -
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 24
-
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 24 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 76 -
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
Scaling Reasoning can Improve Factuality in Large Language Models
Paper • 2505.11140 • Published • 7
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 15.5k • 1.43k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 96 • 17 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 44 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 85
-
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper • 2503.21614 • Published • 43 -
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
JudgeLRM: Large Reasoning Models as a Judge
Paper • 2504.00050 • Published • 62 -
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper • 2504.05599 • Published • 86
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 32 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 110 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 107
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 60 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 64
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 30 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 45 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12