Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
Paper • 2604.08362 • Published • 14
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation