Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Translation Paper • 2511.17290 • Published Nov 21, 2025 • 1
🇪🇪 Estonian LLM Evaluation Collection A collection of resources for evaluation of LLM capabilities in the Estonian language. • 33 items • Updated Dec 13, 2025 • 5
Multilingual Benchmarks Collection Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets (ACL 2026) • 29 items • Updated 8 days ago • 2
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets Paper • 2602.22207 • Published Feb 25 • 43
Article Community Evals: Because we're done trusting black-box leaderboards over the community Feb 4 • 89
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 44
Jupyter Agent Collection Blog: https://huggingface.co/blog/jupyter-agent-2 • 4 items • Updated Sep 12, 2025 • 3
MamayLM-v1.0-Gemma-3 Collection First Open and Multimodal Ukrainian-focused LLM • 5 items • Updated Oct 8, 2025 • 18
Article Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers Sep 11, 2025 • 186
Apertus LLM Collection Democratizing Open and Compliant LLMs for Global Language Environments: 8B and 70B open-data open-weights models, multilingual in >1000 languages • 4 items • Updated Oct 1, 2025 • 346
Article Announcing UA-Code-Bench: a New Benchmark for Evaluating LLMs on Competitive Programming Tasks in Ukrainian Jul 12, 2025 • 2
Article Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training Aug 8, 2025 • 97
Article FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages Jul 8, 2025 • 35