T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search Paper • 2603.22341 • Published 24 days ago • 37
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published Jan 13 • 150
IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL Paper • 2603.12151 • Published Mar 12 • 2
MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents Paper • 2603.09827 • Published Mar 10 • 30
MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models Paper • 2602.17602 • Published Feb 19 • 56
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30, 2025 • 146
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published Nov 26, 2025 • 126
Self-Hinting Language Models Enhance Reinforcement Learning Paper • 2602.03143 • Published Feb 3 • 31
Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published Feb 4 • 37
Music Flamingo 🎵: Analyze music and answer questions from audio or YouTube links Space • Running on Zero • 167