Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks • Paper • 2602.05066 • Published Feb 4 • 1
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures • Paper • 2510.24081 • Published Oct 28, 2025 • 22
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish • Paper • 2407.12402 • Published Jul 17, 2024 • 1
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages • Paper • 2502.11020 • Published Feb 16, 2025 • 8
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines • Paper • 2410.12705 • Published Oct 16, 2024 • 32
aLLMA models • Collection • aLLMA small, base, and large models • 3 items • Updated Jul 20, 2024 • 3
Azerbaijani Datasets from Community • Collection • Datasets for Azerbaijani - azj, azb • 23 items • Updated Mar 2 • 4