benchmark - a changliu816 Collection

benchmark

updated 3 days ago

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Paper • 2604.28139 • Published 7 days ago • 38

InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

Paper • 2604.27419 • Published 7 days ago • 13
