JamboGPT Bot committed on
Commit cba2f6d · 1 Parent(s): 47e0d62

Add comprehensive language documentation

Files changed (1)
  1. LANGUAGES.md +255 -0
LANGUAGES.md ADDED
@@ -0,0 +1,255 @@
# 🌍 JamboGPT - Supported Languages

JamboGPT now supports **10 languages**: 9 African languages spanning East Africa, West Africa, and the Horn of Africa, plus English.

## 📍 Language Coverage Map

### East Africa (5 Languages)

| Language | Region | Speakers | Code | Model |
|----------|--------|----------|------|-------|
| **Swahili** | Kenya, Tanzania, Uganda | ~100M+ | swh | facebook/mms-tts-swh |
| **Kikuyu** | Central Kenya | ~7M | ki | BrianMwangi/African-Kikuyu-TTS |
| **Luo** | Western Kenya | ~4M | luo | facebook/mms-tts-luo |
| **Luhya** | Western Kenya | ~5M | luy | facebook/mms-tts-luy |
| **Kamba** | Eastern Kenya | ~3M | kam | facebook/mms-tts-kam |

### West Africa (3 Languages)

| Language | Region | Speakers | Code | Model |
|----------|--------|----------|------|-------|
| **Yoruba** | Nigeria | ~45M | yor | facebook/mms-tts-yor |
| **Igbo** | Nigeria | ~27M | ibo | facebook/mms-tts-ibo |
| **Hausa** | Nigeria, Niger | ~90M | hau | facebook/mms-tts-hau |

### East Africa - Horn of Africa (1 Language)

| Language | Region | Speakers | Code | Model |
|----------|--------|----------|------|-------|
| **Amharic** | Ethiopia | ~32M | amh | facebook/mms-tts-amh |

### Global (1 Language)

| Language | Region | Speakers | Code | Model |
|----------|--------|----------|------|-------|
| **English** | Global | 1.5B+ | eng | facebook/mms-tts-eng |

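The Code and Model columns above amount to a lookup table. A minimal sketch of how an application might hold that mapping is shown here; the `Language` class, `LANGUAGES`, `BY_CODE`, and `checkpoint_for` are hypothetical names chosen for illustration and are not taken from the JamboGPT codebase. Only the codes and model IDs come from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    name: str
    code: str        # value of the Code column in the tables above
    checkpoint: str  # Hugging Face model ID from the Model column


# Rows copied from the coverage tables; the data structure itself is illustrative.
LANGUAGES = [
    Language("Swahili", "swh", "facebook/mms-tts-swh"),
    Language("Kikuyu", "ki", "BrianMwangi/African-Kikuyu-TTS"),
    Language("Luo", "luo", "facebook/mms-tts-luo"),
    Language("Luhya", "luy", "facebook/mms-tts-luy"),
    Language("Kamba", "kam", "facebook/mms-tts-kam"),
    Language("Yoruba", "yor", "facebook/mms-tts-yor"),
    Language("Igbo", "ibo", "facebook/mms-tts-ibo"),
    Language("Hausa", "hau", "facebook/mms-tts-hau"),
    Language("Amharic", "amh", "facebook/mms-tts-amh"),
    Language("English", "eng", "facebook/mms-tts-eng"),
]

BY_CODE = {lang.code: lang for lang in LANGUAGES}


def checkpoint_for(code: str) -> str:
    """Return the model ID for a language code, or raise ValueError."""
    try:
        return BY_CODE[code].checkpoint
    except KeyError:
        supported = ", ".join(sorted(BY_CODE))
        raise ValueError(
            f"Unsupported language code {code!r}; expected one of: {supported}"
        ) from None
```

Keeping the table in one place means a request for an unsupported code fails early with a readable message instead of triggering a failed model download.
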
---

## 🎯 Language Details

### Kenyan Languages (5)

#### 1. **Swahili** 🇰🇪 🇹🇿 🇺🇬
- **Speakers**: ~100 million+ (mostly second-language speakers)
- **Regions**: Kenya, Tanzania, Uganda, Democratic Republic of Congo
- **Script**: Latin alphabet
- **Status**: Official language in Kenya and Tanzania
- **Use Cases**: Business, education, media, government
- **Model**: Meta MMS TTS (facebook/mms-tts-swh)

#### 2. **Kikuyu** 🇰🇪
- **Native Speakers**: ~7 million
- **Regions**: Central Kenya (Mount Kenya region)
- **Script**: Latin alphabet
- **Status**: Widely spoken in Kiambu, Murang'a, and Nyeri counties
- **Use Cases**: Local media, education, community engagement
- **Model**: BrianMwangi/African-Kikuyu-TTS (fine-tuned MMS)

#### 3. **Luo** 🇰🇪
- **Native Speakers**: ~4 million
- **Regions**: Western Kenya (Kisumu, Siaya, Homa Bay, Migori)
- **Script**: Latin alphabet
- **Status**: Widely spoken in Nyanza region
- **Use Cases**: Local broadcasting, education, cultural preservation
- **Model**: Meta MMS TTS (facebook/mms-tts-luo)

#### 4. **Luhya** 🇰🇪
- **Native Speakers**: ~5 million
- **Regions**: Western Kenya (Kakamega, Bungoma, Vihiga)
- **Script**: Latin alphabet
- **Status**: Widely spoken in Western region
- **Use Cases**: Local media, education, community services
- **Model**: Meta MMS TTS (facebook/mms-tts-luy)

#### 5. **Kamba** 🇰🇪
- **Native Speakers**: ~3 million
- **Regions**: Eastern Kenya (Machakos, Kitui, Makueni)
- **Script**: Latin alphabet
- **Status**: Widely spoken in Eastern region
- **Use Cases**: Local broadcasting, education, tourism
- **Model**: Meta MMS TTS (facebook/mms-tts-kam)

---

### Nigerian Languages (3)

#### 6. **Yoruba** 🇳🇬
- **Native Speakers**: ~45 million
- **Regions**: Nigeria (Southwest), Benin, Togo
- **Script**: Latin alphabet
- **Status**: Major West African language
- **Use Cases**: Media, entertainment, education, business
- **Model**: Meta MMS TTS (facebook/mms-tts-yor)

#### 7. **Igbo** 🇳🇬
- **Native Speakers**: ~27 million
- **Regions**: Nigeria (Southeast)
- **Script**: Latin alphabet
- **Status**: Major West African language
- **Use Cases**: Media, education, business, cultural content
- **Model**: Meta MMS TTS (facebook/mms-tts-ibo)

#### 8. **Hausa** 🇳🇬 🇳🇪
- **Speakers**: ~90 million (including second-language speakers)
- **Regions**: Nigeria, Niger, and across West Africa
- **Script**: Latin alphabet (also Arabic script)
- **Status**: A lingua franca across much of West Africa
- **Use Cases**: Trade, media, education, cross-border communication
- **Model**: Meta MMS TTS (facebook/mms-tts-hau)

---

### Ethiopian Language (1)

#### 9. **Amharic** 🇪🇹
- **Native Speakers**: ~32 million
- **Regions**: Ethiopia (official language)
- **Script**: Ge'ez script
- **Status**: Official language of Ethiopia
- **Use Cases**: Government, education, media, business
- **Model**: Meta MMS TTS (facebook/mms-tts-amh)

---

### Global Language (1)

#### 10. **English** 🌍
- **Native Speakers**: ~370 million
- **Global Speakers**: 1.5+ billion
- **Regions**: Worldwide
- **Script**: Latin alphabet
- **Status**: International lingua franca
- **Use Cases**: Business, education, technology, global communication
- **Model**: Meta MMS TTS (facebook/mms-tts-eng)

---

## 📊 Statistics

### Total Coverage
- **Total Languages**: 10
- **Total Speakers**: 300+ million across the nine African languages (including second-language speakers)
- **Total Regions**: 4 (East Africa, West Africa, Horn of Africa, Global)
- **Countries**: Kenya, Nigeria, Niger, Tanzania, Uganda, Ethiopia, Benin, Togo, DRC

### Regional Breakdown
- **East Africa**: 5 languages (Swahili, Kikuyu, Luo, Luhya, Kamba)
- **West Africa**: 3 languages (Yoruba, Igbo, Hausa)
- **Horn of Africa**: 1 language (Amharic)
- **Global**: 1 language (English)

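The 300+ million figure above can be reproduced by summing the per-language estimates from the coverage tables. The sketch below does that arithmetic; `SPEAKERS_M` is a hypothetical name, and the values are the rounded table estimates (Swahili's "~100M+" is treated as 100). Because many people speak more than one of these languages, a plain sum counts them more than once, so the result is best read as a rough upper-level estimate rather than a head count.

```python
# Speaker estimates in millions, copied from the coverage tables.
# English is left out so the total covers only the nine African languages.
SPEAKERS_M = {
    "Swahili": 100,
    "Kikuyu": 7,
    "Luo": 4,
    "Luhya": 5,
    "Kamba": 3,
    "Yoruba": 45,
    "Igbo": 27,
    "Hausa": 90,
    "Amharic": 32,
}

total_m = sum(SPEAKERS_M.values())
print(f"{len(SPEAKERS_M)} African languages, ~{total_m}M speakers (with double counting)")
```
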
---

## 🚀 Use Cases by Language

### Education
- **Swahili**: Curriculum content, educational videos
- **Kikuyu, Luo, Luhya, Kamba**: Local language instruction
- **Yoruba, Igbo, Hausa**: Mother tongue education
- **Amharic**: Ethiopian education system
- **English**: International education

### Media & Entertainment
- **Swahili**: News, podcasts, music
- **Kikuyu, Luo, Luhya, Kamba**: Local radio, community content
- **Yoruba, Igbo, Hausa**: Entertainment, music, drama
- **Amharic**: Ethiopian media
- **English**: Global media

### Business & Commerce
- **Swahili**: East African business communication
- **Hausa**: West African trade and commerce
- **English**: International business
- **All others**: Local business and commerce

### Accessibility
- **All languages**: Making technology accessible to speakers
- **Visually impaired users**: Audio content
- **Language learners**: Pronunciation and learning

### Cultural Preservation
- **All languages**: Documenting and preserving languages
- **Digital archives**: Recording language content
- **Community engagement**: Supporting language communities

---

## 🔧 Technical Details

### Model Architecture
All models are based on **Meta's Massively Multilingual Speech (MMS)** text-to-speech checkpoints. Nine languages use the stock `facebook/mms-tts-*` checkpoints; Kikuyu uses a community fine-tuned MMS checkpoint (BrianMwangi/African-Kikuyu-TTS).

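For readers who want to try one of these checkpoints directly, the following is a minimal sketch rather than JamboGPT's own code. It assumes the `transformers` and `torch` packages are installed and relies on the fact that MMS TTS checkpoints are VITS models loadable through `VitsModel`. Whether the Kikuyu fine-tune loads through the same class is an assumption this document does not confirm. `is_stock_mms` and `synthesize` are hypothetical helper names.

```python
def is_stock_mms(checkpoint: str) -> bool:
    """True for Meta's own MMS TTS checkpoints, False for community fine-tunes."""
    return checkpoint.startswith("facebook/mms-tts-")


def synthesize(text: str, checkpoint: str):
    """Return (waveform, sample_rate) for `text` using a VITS-based checkpoint.

    torch and transformers are imported lazily so that is_stock_mms()
    stays usable on machines where they are not installed.
    """
    import torch
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = VitsModel.from_pretrained(checkpoint)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform  # shape: (batch, samples)

    # Assumption: a single input string, so take the first batch item.
    return waveform[0].numpy(), model.config.sampling_rate
```

A call such as `synthesize("Habari ya asubuhi", "facebook/mms-tts-swh")` downloads the checkpoint on first use, so it needs network access and some disk space.
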
### Audio Output
- **Sample Rate**: 16 kHz
- **Bit Depth**: 16-bit
- **Format**: WAV
- **Quality**: Natural-sounding speech synthesis

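The output format listed above (16 kHz, 16-bit WAV) can be produced with Python's standard `wave` module. The sketch below is illustrative only: `write_wav` is a hypothetical helper, the input is assumed to be float samples in the range -1.0 to 1.0, and mono output is an assumption because the document does not state a channel count.

```python
import io
import struct
import wave

SAMPLE_RATE = 16_000  # Hz, from the Audio Output list
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit


def write_wav(samples, target) -> None:
    """Write float samples in [-1.0, 1.0] as 16-bit PCM WAV.

    `target` may be a file path or a writable binary file-like object.
    Values outside the range are clipped rather than wrapped.
    """
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)  # assumption: mono
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(pcm))


# Usage: write three samples to an in-memory buffer instead of a file.
buffer = io.BytesIO()
write_wav([0.0, 0.25, -0.25], buffer)
```

Clipping before conversion avoids integer overflow when a model occasionally emits samples slightly outside the nominal range.
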
### Performance
- **Inference Time**: 2-5 seconds per 100 words (CPU)
- **GPU Acceleration**: <1 second (with CUDA)
- **Max Text Length**: 1000 characters per request

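The 1000-character limit above means longer input has to be split before it is sent for synthesis. The document does not say how JamboGPT handles this, so the following is only one possible approach: `split_text` is a hypothetical helper that breaks on whitespace and falls back to a hard cut for any single word longer than the limit.

```python
MAX_CHARS = 1000  # per-request limit from the Performance list


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break on whitespace where possible. Runs of whitespace are
    collapsed to single spaces as a side effect of str.split().
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        while len(word) > limit:
            # Flush the pending chunk, then hard-cut the oversized word.
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting audio concatenated. Splitting on sentence boundaries would likely sound more natural, but that depends on per-language punctuation rules and is left out of this sketch.
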
---

## 🌟 Future Expansions

Planned languages for future versions:
- **Chichewa** (Malawi, Zambia)
- **Xhosa** (South Africa)
- **Zulu** (South Africa)
- **Twi/Akan** (Ghana)
- **Wolof** (Senegal)
- **Somali** (Somalia)
- **Tigrinya** (Eritrea, Ethiopia)
- **Oromo** (Ethiopia)
- **Afar** (Ethiopia, Eritrea, Djibouti)
- And more...

---

## 📚 Resources

### Language Information
- [Ethnologue](https://www.ethnologue.com/) - Comprehensive language database
- [Wikipedia Language Lists](https://en.wikipedia.org/wiki/Languages_of_Africa) - African languages overview
- [Glottolog](https://glottolog.org/) - Language classification

### Models & Datasets
- [Meta MMS](https://huggingface.co/facebook/mms-tts-swh) - Multilingual speech models
- [Hugging Face Hub](https://huggingface.co/models?language=african-languages) - African language models
- [Masakhane](https://www.masakhane.io/) - African NLP community

### Communities
- [CLEAR Global](https://www.clear-global.org/) - African language AI research
- [Sunbird AI](https://sunbird.ai/) - African language technology
- [Lelapa AI](https://www.lelapa.ai/) - African language models

---

## 🤝 Contributing

Want to add more languages or improve existing ones? We welcome contributions!

- Add new language models
- Improve audio quality
- Contribute datasets
- Report issues
- Suggest improvements

---

**JamboGPT - Making AI accessible in African languages! 🌍**

*Last Updated: May 11, 2026*