SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published 7 days ago • 276
RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation Paper • 2603.09723 • Published Mar 10 • 7
RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation Paper • 2603.09723 • Published Mar 10 • 7
SAGE: Benchmarking and Improving Retrieval for Deep Research Agents Paper • 2602.05975 • Published Feb 5 • 12
SAGE: Benchmarking and Improving Retrieval for Deep Research Agents Paper • 2602.05975 • Published Feb 5 • 12
SAGE: Benchmarking and Improving Retrieval for Deep Research Agents Paper • 2602.05975 • Published Feb 5 • 12
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain Paper • 2510.15232 • Published Oct 17, 2025 • 6
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain Paper • 2510.15232 • Published Oct 17, 2025 • 6
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain Paper • 2510.15232 • Published Oct 17, 2025 • 6 • 2
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering Paper • 2510.06426 • Published Oct 7, 2025 • 3
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering Paper • 2510.06426 • Published Oct 7, 2025 • 3
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1, 2025 • 46
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1, 2025 • 46