- The Path Not Taken: RLVR Provably Learns Off the Principals
  Paper • 2511.08567 • Published • 35
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 48
- When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
  Paper • 2510.04849 • Published • 117
Collections
Discover the best community collections!
Collections including paper arxiv:2510.14901
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 48
- VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
  Paper • 2505.23359 • Published • 38
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Paper • 2506.02397 • Published • 36
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 146

- RL makes MLLMs see better than SFT
  Paper • 2510.16333 • Published • 49
- Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
  Paper • 2510.16888 • Published • 22
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 48
- Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
  Paper • 2510.21583 • Published • 31

- Magistral
  Paper • 2506.10910 • Published • 68
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
  Paper • 2506.15882 • Published • 2
- MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
  Paper • 2507.14683 • Published • 136
- The Invisible Leash: Why RLVR May Not Escape Its Origin
  Paper • 2507.14843 • Published • 85

- Emu3.5: Native Multimodal Models are World Learners
  Paper • 2510.26583 • Published • 114
- RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
  Paper • 2510.20479 • Published • 12
- A Definition of AGI
  Paper • 2510.18212 • Published • 36
- Video-As-Prompt: Unified Semantic Control for Video Generation
  Paper • 2510.20888 • Published • 50

- ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
  Paper • 2510.22037 • Published • 22
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 513
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 550
- Scaling Language-Centric Omnimodal Representation Learning
  Paper • 2510.11693 • Published • 107

- Training-Free Group Relative Policy Optimization
  Paper • 2510.08191 • Published • 46
- A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
  Paper • 2510.15444 • Published • 151
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 48
- WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
  Paper • 2504.15785 • Published • 22

- ARE: Scaling Up Agent Environments and Evaluations
  Paper • 2509.17158 • Published • 36
- ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
  Paper • 2510.08551 • Published • 34
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
  Paper • 2510.04212 • Published • 26
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
  Paper • 2510.12693 • Published • 28

- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 282
- SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
  Paper • 2506.02096 • Published • 52
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Paper • 2506.02397 • Published • 36
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 146