Can Large Language Models Reinvent Foundational Algorithms? Paper • 2604.05716 • Published 15 days ago
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10, 2025
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning Paper • 2504.00891 • Published Apr 1, 2025
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025