CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation Paper • 2505.24456 • Published May 30, 2025
A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge'ez Script Paper • 2507.15142 • Published Jul 20, 2025
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey Paper • 2504.14891 • Published Apr 21, 2025
Open, Closed, or Small Language Models for Text Classification? Paper • 2308.10092 • Published Aug 19, 2023
INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages Paper • 2502.09814 • Published Feb 13, 2025
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark Paper • 2406.05967 • Published Jun 10, 2024
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages Paper • 2411.16508 • Published Nov 25, 2024
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding Paper • 2411.05049 • Published Nov 7, 2024
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset Paper • 2303.03915 • Published Mar 7, 2023
SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval) Paper • 2304.06845 • Published Apr 13, 2023
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages Paper • 2305.06897 • Published May 11, 2023
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages Paper • 2305.13989 • Published May 23, 2023
AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages Paper • 2311.09828 • Published Nov 16, 2023
The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation Paper • 2103.08647 • Published Mar 15, 2021
MasakhaNER: Named Entity Recognition for African Languages Paper • 2103.11811 • Published Mar 22, 2021
NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis Paper • 2201.08277 • Published Jan 20, 2022
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects Paper • 2309.07445 • Published Sep 14, 2023
Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning Paper • 2204.06487 • Published Apr 13, 2022