Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment Paper • 2605.20834 • Published 1 day ago • 3
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 9 days ago • 259
gradients-io-tournaments/tournament-tourn_f4f456bc6d050b8b_20260430-04b98654-a18a-49c0-b291-2c623c1cfbc1-5C7vE26G 2B • Updated 20 days ago • 33 • 1
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published Apr 6 • 36
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 629
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 342
Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models Paper • 2603.25750 • Published Mar 20 • 36