Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30, 2025 • 71
3LM: Bridging Arabic, STEM, and Code through Benchmarking Paper • 2507.15850 • Published Jul 21, 2025 • 6
Evaluating Arabic Large Language Models: A Survey of Benchmarks, Methods, and Gaps Paper • 2510.13430 • Published Oct 15, 2025 • 1
Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation Paper • 2604.03395 • Published 12 days ago • 2