-
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 135 -
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
Paper • 2506.14028 • Published • 94 -
UloRL:An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities
Paper • 2507.19766 • Published • 15
Collections
Discover the best community collections!
Collections including paper arxiv:2507.19766
-
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper • 2501.06186 • Published • 65 -
apple/OpenELM
Updated • 1.45k -
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
Text Generation • 2B • Updated • 675k • • 1.49k -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447
-
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 48 -
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper • 2403.04732 • Published • 21 -
Common 7B Language Models Already Possess Strong Math Capabilities
Paper • 2403.04706 • Published • 18 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 45
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 196 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 30 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 45 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 135 -
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
Paper • 2506.14028 • Published • 94 -
UloRL:An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities
Paper • 2507.19766 • Published • 15
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 196 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper • 2501.06186 • Published • 65 -
apple/OpenELM
Updated • 1.45k -
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
Text Generation • 2B • Updated • 675k • • 1.49k -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 30 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 45 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 48 -
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper • 2403.04732 • Published • 21 -
Common 7B Language Models Already Possess Strong Math Capabilities
Paper • 2403.04706 • Published • 18 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 45
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4