Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification Paper • 2603.26648 • Published 16 days ago • 42
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification Paper • 2603.26648 • Published 16 days ago • 42
GPT Can Solve Mathematical Problems Without a Calculator Paper • 2309.03241 • Published Sep 6, 2023 • 19
LVBench: An Extreme Long Video Understanding Benchmark Paper • 2406.08035 • Published Jun 12, 2024 • 2
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025 • 253