CocoaBench: Evaluating Unified Digital Agents in the Wild Paper • 2604.11201 • Published 6 days ago
CellMaster: Collaborative Cell Type Annotation in Single-Cell Analysis Paper • 2602.13346 • Published Feb 12
Steer2Edit: From Activation Steering to Component-Level Editing Paper • 2602.09870 • Published Feb 10
scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery Paper • 2602.11609 • Published Feb 12
StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors Paper • 2602.08934 • Published Feb 9
FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights Paper • 2602.02905 • Published Feb 2