changliu816's Collections
benchmark
VisualGeneration
ComputerUseAgent
Agentic
VLM
reasoning
papers

benchmark

updated 3 days ago

  • Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

    Paper • 2604.28139 • Published 7 days ago • 38

  • InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

    Paper • 2604.27419 • Published 7 days ago • 13