MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping Paper • 2604.08364 • Published 7 days ago • 95
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published 9 days ago • 114
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 7 days ago • 253
Open LLM Leaderboard best models ❤️🔥 Collection A daily updated list of models with the best evaluations on the LLM leaderboard • 50 items • Updated Mar 13 • 680