AmanPriyanshu/tool-reasoning-sft-CODING-MEnvData-SWE-Trajectory-data-cleaned-rectified Viewer • Updated Mar 7 • 3.92k • 32 • 1
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models Paper • 2509.24239 • Published Sep 29, 2025 • 5
MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering Paper • 2601.22859 • Published Jan 30 • 18
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60