ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents • Paper • 2604.11784 • Published 3 days ago • 125
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models • Paper • 2604.09459 • Published 3 days ago • 11
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents • Paper • 2512.12730 • Published Dec 14, 2025 • 52
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models • Paper • 2512.02556 • Published Dec 2, 2025 • 265
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence • Paper • 2511.18538 • Published Nov 23, 2025 • 304
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation • Paper • 2511.06307 • Published Nov 9, 2025 • 53
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding • Paper • 2510.11498 • Published Oct 13, 2025 • 11
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs • Paper • 2510.10689 • Published Oct 12, 2025 • 47
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling • Paper • 2508.17445 • Published Aug 24, 2025 • 80
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents • Paper • 2508.13186 • Published Aug 14, 2025 • 20
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL • Paper • 2508.13167 • Published Aug 6, 2025 • 129
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators • Paper • 2508.09101 • Published Aug 12, 2025 • 8
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization • Paper • 2507.06181 • Published Jul 8, 2025 • 45
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation • Paper • 2507.04952 • Published Jul 7, 2025 • 11
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models • Paper • 2503.18923 • Published Mar 24, 2025 • 14
A Comprehensive Survey on Long Context Language Modeling • Paper • 2503.17407 • Published Mar 20, 2025 • 49