DailyPapers - a collection by elonming

DailyPapers

Updated 8 days ago

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published Mar 4 • 210
SII-Enigma/Llama3.2-8B-Ins-AMPO

Text Generation • 8B • Updated 27 days ago • 102
Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26, 2025 • 59
Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs

Paper • 2509.25779 • Published Sep 30, 2025 • 19
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes

Paper • 2306.13649 • Published Jun 23, 2023 • 33
Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Paper • 2403.05612 • Published Mar 8, 2024 • 3
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Paper • 2505.11711 • Published May 16, 2025 • 11
Reinforcement Learning via Self-Distillation

Paper • 2601.20802 • Published Jan 28 • 43
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 238
AI Can Learn Scientific Taste

Paper • 2603.14473 • Published Mar 15 • 423
π0.5: a Vision-Language-Action Model with Open-World Generalization

Paper • 2504.16054 • Published Apr 22, 2025 • 4
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs

Paper • 2603.22446 • Published 24 days ago • 10
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Paper • 2603.19835 • Published 28 days ago • 337
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published 25 days ago • 29