Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization Paper • 2604.09574 • Published Feb 24 • 26
CoreCodeBench Collection dataset for CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark • 2 items • Updated May 16, 2025