Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 43
SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers Paper • 2602.05115 • Published Feb 4 • 20
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models Paper • 2603.28590 • Published 15 days ago • 22