When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers Paper • 2402.10601 • Published Feb 16, 2024 • 1
BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software Paper • 2509.25248 • Published Sep 27, 2025 • 3