-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 220 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 29 -
Learning to Reason in 13 Parameters
Paper • 2602.04118 • Published • 6 -
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Paper • 2405.17604 • Published • 3
Collections
Discover the best community collections!
Collections including paper arxiv:2512.20856
-
Test-Time Scaling with Reflective Generative Model
Paper • 2507.01951 • Published • 108 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 67 -
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
Paper • 2510.18855 • Published • 73 -
INTELLECT-3: Technical Report
Paper • 2512.16144 • Published • 20
-
Beyond Imitation: Reinforcement Learning for Active Latent Planning
Paper • 2601.21598 • Published • 10 -
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper • 2601.18778 • Published • 42 -
Self-Hinting Language Models Enhance Reinforcement Learning
Paper • 2602.03143 • Published • 31 -
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
Paper • 2602.12099 • Published • 61
-
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Paper • 2512.02835 • Published • 10 -
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper • 2512.05044 • Published • 17 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 17 -
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Paper • 2512.05343 • Published • 25
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 7 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 24 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 15 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 220 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 29 -
Learning to Reason in 13 Parameters
Paper • 2602.04118 • Published • 6 -
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Paper • 2405.17604 • Published • 3
-
Beyond Imitation: Reinforcement Learning for Active Latent Planning
Paper • 2601.21598 • Published • 10 -
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper • 2601.18778 • Published • 42 -
Self-Hinting Language Models Enhance Reinforcement Learning
Paper • 2602.03143 • Published • 31 -
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
Paper • 2602.12099 • Published • 61
-
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Paper • 2512.02835 • Published • 10 -
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper • 2512.05044 • Published • 17 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 17 -
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Paper • 2512.05343 • Published • 25
-
Test-Time Scaling with Reflective Generative Model
Paper • 2507.01951 • Published • 108 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 67 -
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
Paper • 2510.18855 • Published • 73 -
INTELLECT-3: Technical Report
Paper • 2512.16144 • Published • 20
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 7 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 24 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 15 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69