Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2604.28139

The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs

Paper • 2506.18403 • Published Jun 23, 2025 • 3
ReCode: Updating Code API Knowledge with Reinforcement Learning

Paper • 2506.20495 • Published Jun 25, 2025 • 10
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Paper • 2507.23348 • Published Jul 31, 2025 • 12
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

Paper • 2509.09614 • Published Sep 11, 2025 • 7

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Paper • 2604.28139 • Published 7 days ago • 38
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

Paper • 2604.27419 • Published 7 days ago • 13

Agent benchmarks

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Paper • 2604.28139 • Published 7 days ago • 38

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 74
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Paper • 2404.02733 • Published Apr 3, 2024 • 22
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

Paper • 2404.02747 • Published Apr 3, 2024 • 13
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Paper • 2404.01367 • Published Apr 1, 2024 • 22

about 22 hours ago

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 69
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper • 2502.06060 • Published Feb 9, 2025 • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 195
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20, 2025 • 100

benchmarks • continual learning

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Paper • 2604.28139 • Published 7 days ago • 38

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Paper • 2604.28139 • Published 7 days ago • 38

The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs

Paper • 2506.18403 • Published Jun 23, 2025 • 3
ReCode: Updating Code API Knowledge with Reinforcement Learning

Paper • 2506.20495 • Published Jun 25, 2025 • 10
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Paper • 2507.23348 • Published Jul 31, 2025 • 12
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

Paper • 2509.09614 • Published Sep 11, 2025 • 7

about 22 hours ago

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 69
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper • 2502.06060 • Published Feb 9, 2025 • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 195
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20, 2025 • 100

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Paper • 2604.28139 • Published 7 days ago • 38
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

Paper • 2604.27419 • Published 7 days ago • 13

benchmarks • continual learning

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Paper • 2604.28139 • Published 7 days ago • 38

Agent benchmarks

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Paper • 2604.28139 • Published 7 days ago • 38

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Paper • 2604.28139 • Published 7 days ago • 38

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 74
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Paper • 2404.02733 • Published Apr 3, 2024 • 22
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

Paper • 2404.02747 • Published Apr 3, 2024 • 13
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Paper • 2404.01367 • Published Apr 1, 2024 • 22

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs