Collections including paper arxiv:2505.17667
-
Snowflake/Arctic-Text2SQL-R1-7B
8B • Updated • 8.12k • 70
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 282
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 265
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 133
-
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper • 2505.17612 • Published • 81
Qwen3 Technical Report
Paper • 2505.09388 • Published • 339
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 191
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 196 • 99
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 44
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 85
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 302
Towards Best Practices for Open Datasets for LLM Training
Paper • 2501.08365 • Published • 62
Qwen2.5-1M Technical Report
Paper • 2501.15383 • Published • 72
-
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 265
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 135
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper • 2506.05176 • Published • 81
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 282
-
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
Tongyi-Zhiwen/QwenLong-L1-32B
Text Generation • 33B • Updated • 326 • 167
mradermacher/QwenLong-L1-32B-GGUF
33B • Updated • 357 • 8
Tongyi-Zhiwen/QwenLong-L1-32B-AWQ
33B • Updated • 9 • 10
-
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 24
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 76
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120
Scaling Reasoning can Improve Factuality in Large Language Models
Paper • 2505.11140 • Published • 7
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 120
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 146
-
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Paper • 2501.04686 • Published • 53
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 129
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
Paper • 2502.19400 • Published • 47
-
Snowflake/Arctic-Text2SQL-R1-7B
8B • Updated • 8.12k • 70 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 282 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 265 -
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 133
-
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 265 -
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 135 -
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper • 2506.05176 • Published • 81 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 282
-
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88 -
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper • 2505.17612 • Published • 81 -
Qwen3 Technical Report
Paper • 2505.09388 • Published • 339 -
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 191
-
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88 -
Tongyi-Zhiwen/QwenLong-L1-32B
Text Generation • 33B • Updated • 326 • • 167 -
mradermacher/QwenLong-L1-32B-GGUF
33B • Updated • 357 • 8 -
Tongyi-Zhiwen/QwenLong-L1-32B-AWQ
33B • Updated • 9 • 10
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 196 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 24 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 76 -
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
Scaling Reasoning can Improve Factuality in Large Language Models
Paper • 2505.11140 • Published • 7
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 44 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 85
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 120 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 146
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 302 -
Towards Best Practices for Open Datasets for LLM Training
Paper • 2501.08365 • Published • 62 -
Qwen2.5-1M Technical Report
Paper • 2501.15383 • Published • 72
-
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Paper • 2501.04686 • Published • 53 -
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41 -
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 129 -
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
Paper • 2502.19400 • Published • 47