ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
Paper • 2604.05172 • Published • 23
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks