HF Daily
Open Data Synthesis For Deep Research
Paper
• 2509.00375
• Published • 72
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL
Training
Paper
• 2509.03403
• Published • 23
LMEnt: A Suite for Analyzing Knowledge in Language Models from
Pretraining Data to Representations
Paper
• 2509.03405
• Published • 24
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement
Fine-Tuning of LLMs
Paper
• 2509.00930
• Published • 5
Drivelology: Challenging LLMs with Interpreting Nonsense with Depth
Paper
• 2509.03867
• Published • 213
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published • 76
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow
Real Instructions?
Paper
• 2509.04292
• Published • 58
Delta Activations: A Representation for Finetuned Large Language Models
Paper
• 2509.04442
• Published • 7
Why Language Models Hallucinate
Paper
• 2509.04664
• Published • 199
Set Block Decoding is a Language Model Inference Accelerator
Paper
• 2509.04185
• Published • 54
Bootstrapping Task Spaces for Self-Improvement
Paper
• 2509.04575
• Published • 6
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs
Paper
• 2509.04013
• Published • 4
Reverse-Engineered Reasoning for Open-Ended Generation
Paper
• 2509.06160
• Published • 151
Revolutionizing Reinforcement Learning Framework for Diffusion Large
Language Models
Paper
• 2509.06949
• Published • 57
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Paper
• 2509.06733
• Published • 32
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM
Step-Provers
Paper
• 2509.06493
• Published • 12
SFR-DeepResearch: Towards Effective Reinforcement Learning for
Autonomously Reasoning Single Agents
Paper
• 2509.06283
• Published • 17
Test-Time Scaling in Reasoning Models Is Not Effective for
Knowledge-Intensive Tasks Yet
Paper
• 2509.06861
• Published • 9
R²AI: Towards Resistant and Resilient AI in an
Evolving World
Paper
• 2509.06786
• Published • 3
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published • 105
Sharing is Caring: Efficient LM Post-Training with Collective RL
Experience Sharing
Paper
• 2509.08721
• Published • 664
Staying in the Sweet Spot: Responsive Reasoning Evolution via
Capability-Adaptive Hint Scaffolding
Paper
• 2509.06923
• Published • 22
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Paper
• 2509.03646
• Published • 33
ΔL Normalization: Rethink Loss Aggregation in RLVR
Paper
• 2509.07558
• Published • 7
From Noise to Narrative: Tracing the Origins of Hallucinations in
Transformers
Paper
• 2509.06938
• Published • 5
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published • 193
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning
in Large Language Models
Paper
• 2509.09675
• Published • 28
The Majority is not always right: RL training for solution aggregation
Paper
• 2509.06870
• Published • 15
Statistical Methods in Generative AI
Paper
• 2509.07054
• Published • 11
MachineLearningLM: Continued Pretraining Language Models on Millions of
Synthetic Tabular Prediction Tasks Scales In-Context ML
Paper
• 2509.06806
• Published • 63
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in
LLMs
Paper
• 2509.09677
• Published • 37
Paper
• 2509.10147
• Published • 27
Single-stream Policy Optimization
Paper
• 2509.13232
• Published • 36
EconProver: Towards More Economical Test-Time Scaling for Automated
Theorem Proving
Paper
• 2509.12603
• Published • 9
Towards General Agentic Intelligence via Environment Scaling
Paper
• 2509.13311
• Published • 72
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via
Machine Unlearning
Paper
• 2509.13755
• Published • 19
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical
Reasoning
Paper
• 2509.13761
• Published • 16
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper
• 2509.15207
• Published • 118
Reasoning over Boundaries: Enhancing Specification Alignment via
Test-time Deliberation
Paper
• 2509.14760
• Published • 53
Evolving Language Models without Labels: Majority Drives Selection,
Novelty Promotes Variation
Paper
• 2509.15194
• Published • 33
Latent Zoning Network: A Unified Principle for Generative Modeling,
Representation Learning, and Classification
Paper
• 2509.15591
• Published • 45
LIMI: Less is More for Agency
Paper
• 2509.17567
• Published • 104
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric
Reasoning
Paper
• 2509.17437
• Published • 17
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Paper
• 2509.16117
• Published • 23
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from
Token and Parameter Levels
Paper
• 2509.16596
• Published • 14
Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning
Paper
• 2509.18083
• Published • 5
Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with
LLMs
Paper
• 2509.17998
• Published • 1
Reinforcement Learning on Pre-Training Data
Paper
• 2509.19249
• Published • 67
MAPO: Mixed Advantage Policy Optimization
Paper
• 2509.18849
• Published • 27
What Characterizes Effective Reasoning? Revisiting Length, Review, and
Structure of CoT
Paper
• 2509.19284
• Published • 23
SIM-CoT: Supervised Implicit Chain-of-Thought
Paper
• 2509.20317
• Published • 42
EmbeddingGemma: Powerful and Lightweight Text Representations
Paper
• 2509.20354
• Published • 48
Video models are zero-shot learners and reasoners
Paper
• 2509.20328
• Published • 100
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just
What They Say
Paper
• 2509.21164
• Published • 9
VCRL: Variance-based Curriculum Reinforcement Learning for Large
Language Models
Paper
• 2509.19803
• Published • 122
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
Paper
• 2509.21320
• Published • 101
Tree Search for LLM Agent Reinforcement Learning
Paper
• 2509.21240
• Published • 92
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy
Optimization in Reinforcement Learning
Paper
• 2509.20712
• Published • 19
Thinking Augmented Pre-training
Paper
• 2509.20186
• Published • 24
ScaleDiff: Scaling Difficult Problems for Advanced Mathematical
Reasoning
Paper
• 2509.21070
• Published • 9
EPO: Entropy-regularized Policy Optimization for LLM Agents
Reinforcement Learning
Paper
• 2509.22576
• Published • 137
Quantile Advantage Estimation for Entropy-Safe Reasoning
Paper
• 2509.22611
• Published • 120
Variational Reasoning for Language Models
Paper
• 2509.22637
• Published • 69
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
Paper
• 2509.22638
• Published • 70
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM
Reinforcement Learning via Entropy-Guided Advantage Shaping
Paper
• 2509.21880
• Published • 53
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model
Reasoning
Paper
• 2509.19894
• Published • 34
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion
Models
Paper
• 2509.22300
• Published • 3
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable
Sparse-Linear Attention
Paper
• 2509.24006
• Published • 119
Multiplayer Nash Preference Optimization
Paper
• 2509.23102
• Published • 62
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach
for LLM Reasoning in RLVR
Paper
• 2509.23808
• Published • 47
Sequential Diffusion Language Models
Paper
• 2509.24007
• Published • 47
When Does Reasoning Matter? A Controlled Study of Reasoning's
Contribution to Model Performance
Paper
• 2509.22193
• Published • 38
SparseD: Sparse Attention for Diffusion Language Models
Paper
• 2509.24014
• Published • 31
Random Policy Valuation is Enough for LLM Reasoning with Verifiable
Rewards
Paper
• 2509.24981
• Published • 29
The Era of Real-World Human Interaction: RL from User Conversations
Paper
• 2509.25137
• Published • 19
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference
Learning
Paper
• 2509.23285
• Published • 14
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient
Chain-of-Thought Training
Paper
• 2509.24494
• Published • 11
The Dragon Hatchling: The Missing Link between the Transformer and
Models of the Brain
Paper
• 2509.26507
• Published • 550
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Paper
• 2509.25760
• Published • 55
Thinking-Free Policy Initialization Makes Distilled Reasoning Models
More Effective and Efficient Reasoners
Paper
• 2509.26226
• Published • 34
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During
Post Training
Paper
• 2509.25758
• Published • 23
Mem-α: Learning Memory Construction via Reinforcement Learning
Paper
• 2509.25911
• Published • 15
Attention as a Compass: Efficient Exploration for Process-Supervised RL
in Reasoning Models
Paper
• 2509.26628
• Published • 17
InfoAgent: Advancing Autonomous Information-Seeking Agents
Paper
• 2509.25189
• Published • 14
Benefits and Pitfalls of Reinforcement Learning for Language Model
Planning: A Theoretical Perspective
Paper
• 2509.22613
• Published • 10
Specialization after Generalization: Towards Understanding Test-Time
Training in Foundation Models
Paper
• 2509.24510
• Published • 5
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with
Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published • 148
GEM: A Gym for Agentic LLMs
Paper
• 2510.01051
• Published • 91
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget
Allocation
Paper
• 2509.25849
• Published • 48
It Takes Two: Your GRPO Is Secretly DPO
Paper
• 2510.00977
• Published • 32
ACON: Optimizing Context Compression for Long-horizon LLM Agents
Paper
• 2510.00615
• Published • 35
BroRL: Scaling Reinforcement Learning via Broadened Exploration
Paper
• 2510.01180
• Published • 20
Making, not Taking, the Best of N
Paper
• 2510.00931
• Published • 11
CurES: From Gradient Analysis to Efficient Curriculum Learning for
Reasoning LLMs
Paper
• 2510.01037
• Published • 2
LongCodeZip: Compress Long Context for Code Language Models
Paper
• 2510.00446
• Published • 108
ExGRPO: Learning to Reason from Experience
Paper
• 2510.02245
• Published • 83
Interactive Training: Feedback-Driven Neural Network Optimization
Paper
• 2510.02297
• Published • 43
RLP: Reinforcement as a Pretraining Objective
Paper
• 2510.01265
• Published • 45
Aristotle: IMO-level Automated Theorem Proving
Paper
• 2510.01346
• Published • 17
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning
Problems
Paper
• 2510.02263
• Published • 9
Paper
• 2510.01141
• Published • 123
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
• 2510.00938
• Published • 60
Self-Improvement in Multimodal Large Language Models: A Survey
Paper
• 2510.02665
• Published • 21
Continuously Augmented Discrete Diffusion model for Categorical
Generative Modeling
Paper
• 2510.01329
• Published • 6
Pretraining with hierarchical memories: separating long-tail and common
knowledge
Paper
• 2510.02375
• Published • 6
A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning
Paper
• 2510.01132
• Published • 6
Agentic Context Engineering: Evolving Contexts for Self-Improving
Language Models
Paper
• 2510.04618
• Published • 131
Paper2Video: Automatic Video Generation from Scientific Papers
Paper
• 2510.05096
• Published • 120
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual
Information
Paper
• 2510.03632
• Published • 42
Hybrid Architectures for Language Models: Systematic Analysis and Design
Insights
Paper
• 2510.04800
• Published • 37
Front-Loading Reasoning: The Synergy between Pretraining and
Post-Training Data
Paper
• 2510.03264
• Published • 25
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published • 513
In-the-Flow Agentic System Optimization for Effective Planning and Tool
Use
Paper
• 2510.05592
• Published • 110
MixReasoning: Switching Modes to Think
Paper
• 2510.06052
• Published • 23
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model
Reasoning
Paper
• 2510.04081
• Published • 23
Cache-to-Cache: Direct Semantic Communication Between Large Language
Models
Paper
• 2510.03215
• Published • 99
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal
Generation and Understanding
Paper
• 2510.06308
• Published • 55
Ming-UniVision: Joint Image Understanding and Generation with a Unified
Continuous Tokenizer
Paper
• 2510.06590
• Published • 77
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published • 31
Agent Learning via Early Experience
Paper
• 2510.08558
• Published • 276
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement
Learning
Paper
• 2510.03259
• Published • 57
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
Paper
• 2510.07499
• Published • 49
Low-probability Tokens Sustain Exploration in Reinforcement Learning
with Verifiable Reward
Paper
• 2510.03222
• Published • 76
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning
for LLMs
Paper
• 2510.11696
• Published • 182
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published • 170
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
Paper
• 2510.10201
• Published • 36
Demystifying Reinforcement Learning in Agentic Reasoning
Paper
• 2510.11701
• Published • 33
Don't Just Fine-tune the Agent, Tune the Environment
Paper
• 2510.10197
• Published • 30
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic
Tasks
Paper
• 2510.12635
• Published • 17
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm
Enables Fine-Grained Policy Optimization
Paper
• 2510.13554
• Published • 58
Stronger Together: On-Policy Reinforcement Learning for Collaborative
LLMs
Paper
• 2510.11062
• Published • 29
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate
Reasoning
Paper
• 2510.10494
• Published • 2
Agentic Entropy-Balanced Policy Optimization
Paper
• 2510.14545
• Published • 108
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Paper
• 2510.14943
• Published • 40
Information Gain-based Policy Optimization: A Simple and Effective
Approach for Multi-Turn LLM Agents
Paper
• 2510.14967
• Published • 34
LLMs Can Get "Brain Rot"!
Paper
• 2510.13928
• Published • 23
LLM-guided Hierarchical Retrieval
Paper
• 2510.13217
• Published • 21
Large Language Models Do NOT Really Know What They Don't Know
Paper
• 2510.09033
• Published • 17