Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models Paper • 2604.01622 • Published 17 days ago • 7
Evaluating Arabic Large Language Models: A Survey of Benchmarks, Methods, and Gaps Paper • 2510.13430 • Published Oct 15, 2025 • 1
3LM: Bridging Arabic, STEM, and Code through Benchmarking Paper • 2507.15850 • Published Jul 21, 2025 • 6
NeurIPS 2025 E2LM Competition : Early Training Evaluation of Language Models Paper • 2506.07731 • Published Jun 9, 2025 • 2
Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation Paper • 2604.03395 • Published 16 days ago • 2
Contrastive Representation Learning: A Framework and Review Paper • 2010.05113 • Published Oct 10, 2020 • 1
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30, 2025 • 71
Learning to Explore with Parameter-Space Noise: A Deep Dive into Parameter-Space Noise for Reinforcement Learning with Verifiable Rewards Paper • 2602.02555 • Published Jan 30 • 1
HateMirage: An Explainable Multi-Dimensional Dataset for Decoding Faux Hate and Subtle Online Abuse Paper • 2603.02684 • Published Mar 3 • 1
X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework Paper • 2601.03194 • Published Jan 6 • 2
EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI Paper • 2509.11648 • Published Sep 15, 2025 • 2
D-HUMOR: Dark Humor Understanding via Multimodal Open-ended Reasoning Paper • 2509.06771 • Published Sep 8, 2025 • 6