HF Daily
Open Data Synthesis For Deep Research
Paper
• 2509.00375
• Published • 72
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL
Training
Paper
• 2509.03403
• Published • 23
LMEnt: A Suite for Analyzing Knowledge in Language Models from
Pretraining Data to Representations
Paper
• 2509.03405
• Published • 24
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement
Fine-Tuning of LLMs
Paper
• 2509.00930
• Published • 5
Drivelology: Challenging LLMs with Interpreting Nonsense with Depth
Paper
• 2509.03867
• Published • 213
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published • 76
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow
Real Instructions?
Paper
• 2509.04292
• Published • 58
Delta Activations: A Representation for Finetuned Large Language Models
Paper
• 2509.04442
• Published • 7
Why Language Models Hallucinate
Paper
• 2509.04664
• Published • 199
Set Block Decoding is a Language Model Inference Accelerator
Paper
• 2509.04185
• Published • 54
Bootstrapping Task Spaces for Self-Improvement
Paper
• 2509.04575
• Published • 6
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs
Paper
• 2509.04013
• Published • 4
Reverse-Engineered Reasoning for Open-Ended Generation
Paper
• 2509.06160
• Published • 151
Revolutionizing Reinforcement Learning Framework for Diffusion Large
Language Models
Paper
• 2509.06949
• Published • 57
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Paper
• 2509.06733
• Published • 32
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM
Step-Provers
Paper
• 2509.06493
• Published • 12
SFR-DeepResearch: Towards Effective Reinforcement Learning for
Autonomously Reasoning Single Agents
Paper
• 2509.06283
• Published • 17
Test-Time Scaling in Reasoning Models Is Not Effective for
Knowledge-Intensive Tasks Yet
Paper
• 2509.06861
• Published • 9
R²AI: Towards Resistant and Resilient AI in an
Evolving World
Paper
• 2509.06786
• Published • 3
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published • 105
Sharing is Caring: Efficient LM Post-Training with Collective RL
Experience Sharing
Paper
• 2509.08721
• Published • 664
Staying in the Sweet Spot: Responsive Reasoning Evolution via
Capability-Adaptive Hint Scaffolding
Paper
• 2509.06923
• Published • 22
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Paper
• 2509.03646
• Published • 33
ΔL Normalization: Rethink Loss Aggregation in RLVR
Paper
• 2509.07558
• Published • 7
From Noise to Narrative: Tracing the Origins of Hallucinations in
Transformers
Paper
• 2509.06938
• Published • 5
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published • 193
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning
in Large Language Models
Paper
• 2509.09675
• Published • 28
The Majority is not always right: RL training for solution aggregation
Paper
• 2509.06870
• Published • 15
Statistical Methods in Generative AI
Paper
• 2509.07054
• Published • 11
MachineLearningLM: Continued Pretraining Language Models on Millions of
Synthetic Tabular Prediction Tasks Scales In-Context ML
Paper
• 2509.06806
• Published • 63
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in
LLMs
Paper
• 2509.09677
• Published • 37
Paper
• 2509.10147
• Published • 27
Single-stream Policy Optimization
Paper
• 2509.13232
• Published • 36
EconProver: Towards More Economical Test-Time Scaling for Automated
Theorem Proving
Paper
• 2509.12603
• Published • 9
Towards General Agentic Intelligence via Environment Scaling
Paper
• 2509.13311
• Published • 72
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via
Machine Unlearning
Paper
• 2509.13755
• Published • 19
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical
Reasoning
Paper
• 2509.13761
• Published • 16
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper
• 2509.15207
• Published • 118
Reasoning over Boundaries: Enhancing Specification Alignment via
Test-time Deliberation
Paper
• 2509.14760
• Published • 53
Evolving Language Models without Labels: Majority Drives Selection,
Novelty Promotes Variation
Paper
• 2509.15194
• Published • 33
Latent Zoning Network: A Unified Principle for Generative Modeling,
Representation Learning, and Classification
Paper
• 2509.15591
• Published • 45
LIMI: Less is More for Agency
Paper
• 2509.17567
• Published • 104
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric
Reasoning
Paper
• 2509.17437
• Published • 17
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Paper
• 2509.16117
• Published • 23
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from
Token and Parameter Levels
Paper
• 2509.16596
• Published • 14
Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning
Paper
• 2509.18083
• Published • 5
Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with
LLMs
Paper
• 2509.17998
• Published • 1
Reinforcement Learning on Pre-Training Data
Paper
• 2509.19249
• Published • 67
MAPO: Mixed Advantage Policy Optimization
Paper
• 2509.18849
• Published • 27
What Characterizes Effective Reasoning? Revisiting Length, Review, and
Structure of CoT
Paper
• 2509.19284
• Published • 23
SIM-CoT: Supervised Implicit Chain-of-Thought
Paper
• 2509.20317
• Published • 42
EmbeddingGemma: Powerful and Lightweight Text Representations
Paper
• 2509.20354
• Published • 48
Video models are zero-shot learners and reasoners
Paper
• 2509.20328
• Published • 100
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just
What They Say
Paper
• 2509.21164
• Published • 9
VCRL: Variance-based Curriculum Reinforcement Learning for Large
Language Models
Paper
• 2509.19803
• Published • 122
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
Paper
• 2509.21320
• Published • 101
Tree Search for LLM Agent Reinforcement Learning
Paper
• 2509.21240
• Published • 92
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy
Optimization in Reinforcement Learning
Paper
• 2509.20712
• Published • 19
Thinking Augmented Pre-training
Paper
• 2509.20186
• Published • 24
ScaleDiff: Scaling Difficult Problems for Advanced Mathematical
Reasoning
Paper
• 2509.21070
• Published • 9
EPO: Entropy-regularized Policy Optimization for LLM Agents
Reinforcement Learning
Paper
• 2509.22576
• Published • 137
Quantile Advantage Estimation for Entropy-Safe Reasoning
Paper
• 2509.22611
• Published • 120
Variational Reasoning for Language Models
Paper
• 2509.22637
• Published • 69
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
Paper
• 2509.22638
• Published • 70
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM
Reinforcement Learning via Entropy-Guided Advantage Shaping
Paper
• 2509.21880
• Published • 53
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model
Reasoning
Paper
• 2509.19894
• Published • 34
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion
Models
Paper
• 2509.22300
• Published • 3
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable
Sparse-Linear Attention
Paper
• 2509.24006
• Published • 119
Multiplayer Nash Preference Optimization
Paper
• 2509.23102
• Published • 62
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach
for LLM Reasoning in RLVR
Paper
• 2509.23808
• Published • 47
Sequential Diffusion Language Models
Paper
• 2509.24007
• Published • 47
When Does Reasoning Matter? A Controlled Study of Reasoning's
Contribution to Model Performance
Paper
• 2509.22193
• Published • 38
SparseD: Sparse Attention for Diffusion Language Models
Paper
• 2509.24014
• Published • 31
Random Policy Valuation is Enough for LLM Reasoning with Verifiable
Rewards
Paper
• 2509.24981
• Published • 29
The Era of Real-World Human Interaction: RL from User Conversations
Paper
• 2509.25137
• Published • 19
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference
Learning
Paper
• 2509.23285
• Published • 14
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient
Chain-of-Thought Training
Paper
• 2509.24494
• Published • 11
The Dragon Hatchling: The Missing Link between the Transformer and
Models of the Brain
Paper
• 2509.26507
• Published • 550
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Paper
• 2509.25760
• Published • 55
Thinking-Free Policy Initialization Makes Distilled Reasoning Models
More Effective and Efficient Reasoners
Paper
• 2509.26226
• Published • 34
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During
Post Training
Paper
• 2509.25758
• Published • 23
Mem-α: Learning Memory Construction via Reinforcement Learning
Paper
• 2509.25911
• Published • 15
Attention as a Compass: Efficient Exploration for Process-Supervised RL
in Reasoning Models
Paper
• 2509.26628
• Published • 17
InfoAgent: Advancing Autonomous Information-Seeking Agents
Paper
• 2509.25189
• Published • 14
Benefits and Pitfalls of Reinforcement Learning for Language Model
Planning: A Theoretical Perspective
Paper
• 2509.22613
• Published • 10
Specialization after Generalization: Towards Understanding Test-Time
Training in Foundation Models
Paper
• 2509.24510
• Published • 5
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with
Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published • 148
GEM: A Gym for Agentic LLMs
Paper
• 2510.01051
• Published • 91
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget
Allocation
Paper
• 2509.25849
• Published • 48
It Takes Two: Your GRPO Is Secretly DPO
Paper
• 2510.00977
• Published • 32
ACON: Optimizing Context Compression for Long-horizon LLM Agents
Paper
• 2510.00615
• Published • 35
BroRL: Scaling Reinforcement Learning via Broadened Exploration
Paper
• 2510.01180
• Published • 20
Making, not Taking, the Best of N
Paper
• 2510.00931
• Published • 11
CurES: From Gradient Analysis to Efficient Curriculum Learning for
Reasoning LLMs
Paper
• 2510.01037
• Published • 2
LongCodeZip: Compress Long Context for Code Language Models
Paper
• 2510.00446
• Published • 108
ExGRPO: Learning to Reason from Experience
Paper
• 2510.02245
• Published • 83
Interactive Training: Feedback-Driven Neural Network Optimization
Paper
• 2510.02297
• Published • 43
RLP: Reinforcement as a Pretraining Objective
Paper
• 2510.01265
• Published • 45
Aristotle: IMO-level Automated Theorem Proving
Paper
• 2510.01346
• Published • 17
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning
Problems
Paper
• 2510.02263
• Published • 9
Paper
• 2510.01141
• Published • 123
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
• 2510.00938
• Published • 60
Self-Improvement in Multimodal Large Language Models: A Survey
Paper
• 2510.02665
• Published • 21
Continuously Augmented Discrete Diffusion model for Categorical
Generative Modeling
Paper
• 2510.01329
• Published • 6
Pretraining with hierarchical memories: separating long-tail and common
knowledge
Paper
• 2510.02375
• Published • 6
A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning
Paper
• 2510.01132
• Published • 6
Agentic Context Engineering: Evolving Contexts for Self-Improving
Language Models
Paper
• 2510.04618
• Published • 131
Paper2Video: Automatic Video Generation from Scientific Papers
Paper
• 2510.05096
• Published • 120
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual
Information
Paper
• 2510.03632
• Published • 42
Hybrid Architectures for Language Models: Systematic Analysis and Design
Insights
Paper
• 2510.04800
• Published • 37
Front-Loading Reasoning: The Synergy between Pretraining and
Post-Training Data
Paper
• 2510.03264
• Published • 25
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published • 513
In-the-Flow Agentic System Optimization for Effective Planning and Tool
Use
Paper
• 2510.05592
• Published • 110
MixReasoning: Switching Modes to Think
Paper
• 2510.06052
• Published • 23
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model
Reasoning
Paper
• 2510.04081
• Published • 23
Cache-to-Cache: Direct Semantic Communication Between Large Language
Models
Paper
• 2510.03215
• Published • 99
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal
Generation and Understanding
Paper
• 2510.06308
• Published • 55
Ming-UniVision: Joint Image Understanding and Generation with a Unified
Continuous Tokenizer
Paper
• 2510.06590
• Published • 77
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published • 31
Agent Learning via Early Experience
Paper
• 2510.08558
• Published • 276
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement
Learning
Paper
• 2510.03259
• Published • 57
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
Paper
• 2510.07499
• Published • 49
Low-probability Tokens Sustain Exploration in Reinforcement Learning
with Verifiable Reward
Paper
• 2510.03222
• Published • 76
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning
for LLMs
Paper
• 2510.11696
• Published • 182
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published • 170
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
Paper
• 2510.10201
• Published • 36
Demystifying Reinforcement Learning in Agentic Reasoning
Paper
• 2510.11701
• Published • 33
Don't Just Fine-tune the Agent, Tune the Environment
Paper
• 2510.10197
• Published • 30
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic
Tasks
Paper
• 2510.12635
• Published • 17
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm
Enables Fine-Grained Policy Optimization
Paper
• 2510.13554
• Published • 58
Stronger Together: On-Policy Reinforcement Learning for Collaborative
LLMs
Paper
• 2510.11062
• Published • 29
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate
Reasoning
Paper
• 2510.10494
• Published • 2
Agentic Entropy-Balanced Policy Optimization
Paper
• 2510.14545
• Published • 108
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Paper
• 2510.14943
• Published • 40
Information Gain-based Policy Optimization: A Simple and Effective
Approach for Multi-Turn LLM Agents
Paper
• 2510.14967
• Published • 34
LLMs Can Get "Brain Rot"!
Paper
• 2510.13928
• Published • 23
LLM-guided Hierarchical Retrieval
Paper
• 2510.13217
• Published • 21
Large Language Models Do NOT Really Know What They Don't Know
Paper
• 2510.09033
• Published • 17