Exploration and Exploitation Errors Are Measurable for Language Model Agents
Paper • 2604.13151
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks