EvoClaw: Evaluating AI Agents on Continuous Software Evolution
Paper • 2603.13428 • Published • 21
ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution
QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs