Xiangyi Li PRO

xdotli · https://www.xiangyi.li

Recent Activity

authored a paper 7 days ago

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

upvoted a paper 6 days ago

Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

Paper • 2604.05333 • Published 9 days ago • 21

upvoted a paper 7 days ago

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

Paper • 2604.05172 • Published 10 days ago • 24

upvoted a paper 14 days ago

RubricBench: Aligning Model-Generated Rubrics with Human Standards

Paper • 2603.01562 • Published Mar 2 • 63

upvoted 2 papers about 1 month ago

MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

Paper • 2603.09827 • Published Mar 10 • 30

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published Mar 10 • 82

upvoted 2 papers about 2 months ago

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Paper • 2601.11868 • Published Jan 17 • 35

StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?

Paper • 2510.02209 • Published Oct 2, 2025 • 57

upvoted a collection about 2 months ago

SkillsBench

1 item • Updated Feb 17 • 1

upvoted a paper about 2 months ago

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Paper • 2602.12670 • Published Feb 13 • 59

upvoted 2 papers about 1 year ago

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong

Paper • 2501.09775 • Published Jan 16, 2025 • 32

HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs

Paper • 2503.02003 • Published Mar 3, 2025 • 48