From Perception to Action: An Interactive Benchmark for Vision Reasoning Paper • 2602.21015 • Published Feb 24 • 23
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models Paper • 2411.00918 • Published Nov 1, 2024 • 9
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs Paper • 2410.01999 • Published Oct 2, 2024 • 10