SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 21 days ago • 29
nick11roberts/co-emerge-overtrained-rw-params37M_maxstep219586-flop_2_56e19_step_219586 56.5M • Updated 25 days ago • 121
nick11roberts/co-emerge-overtrained-rw-params84M_maxstep95981-flop_2_56e19_step_95981 0.1B • Updated 25 days ago • 121
nick11roberts/co-emerge-overtrained-rw-params149M_maxstep58415-flop_2_56e19_step_58415 0.2B • Updated 25 days ago • 125
nick11roberts/co-emerge-overtrained-rw-params9M_maxstep14128-flop_4_00e17_step_14128 17.9M • Updated 28 days ago • 138
nick11roberts/co-emerge-overtrained-rw-params7M_maxstep18165-flop_4_00e17_step_18165 14M • Updated 28 days ago • 138
nick11roberts/co-emerge-overtrained-rw-params22M_maxstep5779-flop_4_00e17_step_5779 37M • Updated 28 days ago • 137