- The Path Not Taken: RLVR Provably Learns Off the Principals
  Paper • 2511.08567 • Published • 35
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 48
- When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
  Paper • 2510.04849 • Published • 117
Collections
Discover the best community collections!
Collections including paper arxiv:2510.14901
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 48
- VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
  Paper • 2505.23359 • Published • 38
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Paper • 2506.02397 • Published • 36
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 146

- RL makes MLLMs see better than SFT
  Paper • 2510.16333 • Published • 49
- Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
  Paper • 2510.16888 • Published • 22
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 48
- Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
  Paper • 2510.21583 • Published • 31

- Magistral
  Paper • 2506.10910 • Published • 68
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
  Paper • 2506.15882 • Published • 2
- MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
  Paper • 2507.14683 • Published • 136
- The Invisible Leash: Why RLVR May Not Escape Its Origin
  Paper • 2507.14843 • Published • 85

- Emu3.5: Native Multimodal Models are World Learners
  Paper • 2510.26583 • Published • 114
- RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
  Paper • 2510.20479 • Published • 12
- A Definition of AGI
  Paper • 2510.18212 • Published • 36
- Video-As-Prompt: Unified Semantic Control for Video Generation
  Paper • 2510.20888 • Published • 50

- ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
  Paper • 2510.22037 • Published • 22
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 513
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 550
- Scaling Language-Centric Omnimodal Representation Learning
  Paper • 2510.11693 • Published • 107

- Training-Free Group Relative Policy Optimization
  Paper • 2510.08191 • Published • 46
- A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
  Paper • 2510.15444 • Published • 151
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 48
- WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
  Paper • 2504.15785 • Published • 22

- ARE: Scaling Up Agent Environments and Evaluations
  Paper • 2509.17158 • Published • 36
- ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
  Paper • 2510.08551 • Published • 34
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
  Paper • 2510.04212 • Published • 26
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
  Paper • 2510.12693 • Published • 28

- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 282
- SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
  Paper • 2506.02096 • Published • 52
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Paper • 2506.02397 • Published • 36
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 146