Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization Paper • 2604.12290 • Published 5 days ago • 16
ContextBench: A Benchmark for Context Retrieval in Coding Agents Paper • 2602.05892 • Published Feb 5 • 4
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published 17 days ago • 140
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? Paper • 2604.03016 • Published 16 days ago • 37