MICWEN PRO
cesear64
AI & ML interests
None yet
Recent Activity
replied to their post about 1 hour ago
Just published: how we built production Sango (Central African Republic) translation without fine-tuning, parallel corpus, or training compute.
The method — vocabulary-augmented prompting with a 581-entry native-speaker-verified lexicon — generalizes to any of the ~2,000 African languages at the same data-poverty level. Recipe, dataset, and code template all included.
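A minimal sketch of what vocabulary-augmented prompting could look like, assuming the lexicon maps English headwords to target-language glosses. The entries below are dummy placeholders, not real Sango, and the function names `select_entries` and `build_prompt` are hypothetical; the actual recipe and code template are in the blog and dataset linked in this post.

```python
import re

# Placeholder lexicon: English headword -> target-language gloss.
# Dummy values only; the verified 581-entry lexicon lives in the linked dataset.
LEXICON = {
    "water": "<gloss-water>",
    "house": "<gloss-house>",
    "good morning": "<gloss-good-morning>",
}

def select_entries(source, lexicon):
    """Return the entries whose headword occurs in the source as a whole word or phrase."""
    text = source.lower()
    hits = {}
    for head, gloss in lexicon.items():
        if re.search(r"\b" + re.escape(head.lower()) + r"\b", text):
            hits[head] = gloss
    return hits

def build_prompt(source, lexicon, target_language="Sango"):
    """Assemble a translation prompt that carries only the relevant lexicon entries."""
    entries = select_entries(source, lexicon)
    glossary = "\n".join(f"- {head} = {gloss}" for head, gloss in entries.items())
    return (
        f"Translate the English text into {target_language}.\n"
        f"Use these verified vocabulary entries where they apply:\n"
        f"{glossary or '- (no matching entries)'}\n\n"
        f"English: {source}\n"
        f"{target_language}:"
    )

prompt = build_prompt("Good morning, is there water in the house?", LEXICON)
print(prompt)
```

The resulting string would be sent to whichever general-purpose language model is in use. Passing only the matching entries keeps the prompt short, which is one plausible design choice and not necessarily the one used in the published pipeline.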
📄 Blog: https://huggingface.co/blog/MEYNG/sangoai
📦 Dataset: https://huggingface.co/datasets/MEYNG/sango-vocabulary
Would especially value feedback from anyone working on other low-resource African languages. Ewondo, Lingala, and Wolof are next on our roadmap.
published an article about 2 hours ago
Scaling Zero-Resource Vocabulary: A Data Pipeline for Sango
updated a model 1 day ago
MEYNG/nllb-sango-finetuned-600m