EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Paper • 2507.10548 • Published Jul 14, 2025 • 37
OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks Paper • 2508.05614 • Published Aug 7, 2025 • 20
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Paper • 2507.12806 • Published Jul 17, 2025 • 21
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published Aug 7, 2025 • 65