Just published: how we built production Sango (Central African Republic) translation without fine-tuning, parallel corpus, or training compute.
The method — vocabulary-augmented prompting with a 581-entry native-speaker-verified lexicon — generalizes to any of the ~2,000 African languages at the same data-poverty level. Recipe, dataset, and code template all included.
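For readers unfamiliar with the idea, here is a minimal sketch of what vocabulary-augmented prompting can look like. It is not the released code template: the `LEXICON` entries are placeholders standing in for the verified lexicon, and `find_entries`, `build_prompt`, and the prompt wording are hypothetical names and choices made for illustration. The sketch looks up which lexicon terms occur in the input and puts only those entries into the prompt as a glossary.

```python
import re

# Placeholder entries (assumption): the real lexicon has 581 verified pairs.
LEXICON = {
    "water": "<sango:water>",
    "thank you": "<sango:thank you>",
    "market": "<sango:market>",
}


def find_entries(text, lexicon):
    """Return (source, target) pairs whose source term occurs in text
    as a whole word or phrase, case-insensitively."""
    lowered = text.lower()
    hits = []
    for term, target in lexicon.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((term, target))
    return hits


def build_prompt(text, lexicon, source_lang="English", target_lang="Sango"):
    """Build a translation prompt that carries only the matching glossary
    entries. The wording of the prompt is an assumption."""
    entries = find_entries(text, lexicon)
    glossary = "\n".join(f"- {src} = {tgt}" for src, tgt in entries)
    return (
        f"Translate from {source_lang} to {target_lang}.\n"
        f"Use these verified glossary entries where they apply:\n"
        f"{glossary or '- (no matching entries)'}\n\n"
        f"Text: {text}\n"
        f"Translation:"
    )


if __name__ == "__main__":
    print(build_prompt("Thank you for the water.", LEXICON))
```

The resulting string would then go to whatever general-purpose model is being used. Including only the matching entries keeps the prompt short, which is one plausible reason a lexicon of a few hundred entries can be enough.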
I'm releasing OpenCS2, an 11 TB dataset of around 5,000 hours of Counter-Strike gameplay recordings:
- HD resolution: 1280×720 · 32 fps
- Per-frame keyboard and mouse input plus world state (player position, velocity, weapon ...)
- HD stereo audio
- All 10 players' perspectives