CU-Benchmarks
visualwebbench/VisualWebBench
rootsautomation/RICO-ScreenQA
rootsautomation/ScreenSpot
osunlp/Multimodal-Mind2Web
xlangai/ubuntu_osworld_file_cache
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper • arXiv:2409.08264
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Paper • arXiv:2405.14573