DITING Leaderboard 📊 Space • Explore and compare translation models with interactive radar charts • 2
MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment Paper • 2512.09636 • Published Dec 10, 2025 • 26
DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation Paper • 2510.09116 • Published Oct 10, 2025 • 97
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper • 2510.12276 • Published Oct 14, 2025 • 149
From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models Paper • 2508.13491 • Published Aug 19, 2025 • 59
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation Paper • 2506.14028 • Published Jun 16, 2025 • 94
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications Paper • 2503.20990 • Published Mar 26, 2025 • 19