Ovis2 Collection Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated Mar 25, 2025 • 66
GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts Paper • 2604.12978 • Published 3 days ago • 5
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations Paper • 2402.17152 • Published Feb 27, 2024 • 6
GlotSuite Collection GlotSuite: Paving the Way for Bringing Generative AI to Underserved Communities • 17 items • Updated 2 days ago • 3
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments Paper • 2404.07982 • Published Apr 11, 2024 • 1
Challenging the Evaluator: LLM Sycophancy Under User Rebuttal Paper • 2509.16533 • Published Sep 20, 2025 • 1
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence Paper • 2603.13398 • Published Mar 11 • 153
Languages identification Collection a variety of pre-trained language identification models • 9 items • Updated Jul 31, 2025 • 2
OLDI and friends Collection This collection groups the datasets that have been featured as part of WMT’s Open Language Data Initiative shared task. • 5 items • Updated 23 days ago • 5
Insights from the ICLR Peer Review and Rebuttal Process Paper • 2511.15462 • Published Nov 19, 2025 • 7
mmBERT: a modern multilingual encoder Collection mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance • 16 items • Updated Sep 9, 2025 • 53
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs Paper • 2510.09871 • Published Oct 10, 2025 • 3
Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs Paper • 2508.10142 • Published Aug 13, 2025 • 3
Hugging Face Changelog Connect Your MCP Client to the Hugging Face Hub Jun 6, 2025 • 114