MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs Paper • 2602.12705 • Published Feb 13 • 68
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning Paper • 2512.24265 • Published Dec 30, 2025 • 4
Cache-to-Cache: Direct Semantic Communication Between Large Language Models Paper • 2510.03215 • Published Oct 3, 2025 • 99
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 137
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11, 2025 • 47