ReDit: Reward Dithering for Improved LLM Policy Optimization • Paper • 2506.18631 • Published Jun 23, 2025
UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models • Paper • 2508.07766 • Published Aug 11, 2025
Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective • Paper • 2510.10150 • Published Oct 11, 2025