Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe • Paper • 2604.13016 • Published 4 days ago • 76
P1: Mastering Physics Olympiads with Reinforcement Learning • Paper • 2511.13612 • Published Nov 17, 2025 • 134
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer • Paper • 2412.07720 • Published Dec 10, 2024 • 31
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System • Paper • 2410.08115 • Published Oct 10, 2024 • 8