Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation Paper • 2604.11290 • Published 5 days ago • 2
Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation Paper • 2604.11290 • Published 5 days ago • 2
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues Paper • 2601.17277 • Published Jan 24 • 6
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues Paper • 2601.17277 • Published Jan 24 • 6
Predicting the Order of Upcoming Tokens Improves Language Modeling Paper • 2508.19228 • Published Aug 26, 2025 • 23
Language Surgery in Multilingual Large Language Models Paper • 2506.12450 • Published Jun 14, 2025 • 16
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Paper • 2406.10118 • Published Jun 14, 2024 • 32
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks Paper • 2410.18210 • Published Oct 23, 2024
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29, 2025 • 31
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19, 2025 • 48
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10, 2025 • 101
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10, 2025 • 101
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18, 2025 • 19