CocoaBench: Evaluating Unified Digital Agents in the Wild Paper • 2604.11201 • Published 3 days ago • 32
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction Paper • 2511.20937 • Published Nov 26, 2025 • 16
RLVE Collection Models for "RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments" - https://arxiv.org/abs/2511.07317 • 3 items • Updated Nov 12, 2025 • 5
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published Sep 29, 2025 • 19