# WAXAL MMS-TTS — Luo (luo)

Fine-tuning-ready checkpoint for Luo (luo).
| Item | Value |
|---|---|
| WAXAL dataset config | `google/WaxalNLP` — `luo_tts` |
| Data provider | Loud and Clear |
| WAXAL data license | CC-BY-SA-4.0 |
| Base model | `facebook/mms-tts-ach` |
| Model license | CC-BY-NC 4.0 (MMS base; governs the fine-tuned model) |
## ⚠️ Proxy checkpoint

`facebook/mms-tts-luo` does not exist in MMS-TTS coverage (1,107 languages). This repository fine-tunes from the closest available linguistic donor:
| Item | Value |
|---|---|
| Proxy donor | `facebook/mms-tts-ach` |
| Ranked alternatives | ach, nyn, lug |
| Other WAXAL languages sharing this donor | none |
Proximity rationale: Luo (luo) has no MMS checkpoint. Proxy: ach (Acholi; both are Western Nilotic, Nilo-Saharan — same subgroup with closely related phonology, tone system, and lexicon). Secondary: nyn (Nyankole, Bantu, geographically co-located by Loud and Clear). Tertiary: lug (Luganda, Bantu, same recording provider).
Each recipient language is fine-tuned independently from the same donor base. Donor weights provide acoustic/prosodic warm-start; WAXAL fine-tuning adapts them to the target language.
## What this repository adds

The `facebook/mms-tts-*` Hub checkpoints are inference-only releases that make `run_vits_finetuning.py` crash. This repository applies three patches:
| File | Change |
|---|---|
| `config.json` | `pad_token_id` set to 0 (was `null`) |
| `tokenizer_config.json` | `pad_token` entry added |
| `preprocessor_config.json` | Added: `VitsFeatureExtractor` config from `ylacombe/mms-tts-eng-train` |
Model weights are not stored here. `_name_or_path` in `config.json` points to `facebook/mms-tts-ach`, so `run_vits_finetuning.py` loads weights from that checkpoint at training time.
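The three patches above amount to small JSON edits. A minimal sketch of the first two (plain `json`, no `transformers` needed; the `"<pad>"` token string is an assumption — check the actual tokenizer vocabulary for the value used in this repo):

```python
import json

def patch_config(path="config.json"):
    """Set pad_token_id to 0 so run_vits_finetuning.py can pad batches."""
    with open(path) as f:
        cfg = json.load(f)
    cfg["pad_token_id"] = 0  # was null in the inference-only release
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg

def patch_tokenizer_config(path="tokenizer_config.json"):
    """Add the pad_token entry the training script expects."""
    with open(path) as f:
        cfg = json.load(f)
    # "<pad>" is an assumed placeholder; use the token mapped to id 0 in vocab.json
    cfg.setdefault("pad_token", "<pad>")
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```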
### preprocessor_config.json

Downloaded verbatim from `ylacombe/mms-tts-eng-train`. The values are VITS architecture constants shared by all MMS-TTS languages.
| Field | Value |
|---|---|
| `feature_extractor_type` | `VitsFeatureExtractor` |
| `feature_size` | 80 |
| `hop_length` | 256 |
| `max_wav_value` | 32768.0 |
| `n_fft` | 1024 |
| `padding_side` | right |
| `padding_value` | 0.0 |
| `return_attention_mask` | False |
| `sampling_rate` | 16000 |
| `spec_gain` | 1 |
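These constants fix the spectrogram geometry. For example, assuming a center-padded STFT (the `torch.stft` default used by VITS-style models), `hop_length = 256` determines the frame count as `num_samples // hop_length + 1` — a sketch in plain arithmetic:

```python
def num_frames(num_samples: int, hop_length: int = 256) -> int:
    """Frame count of a center-padded STFT for a given number of samples."""
    return num_samples // hop_length + 1

# one second of audio at the 16 kHz MMS sampling rate
print(num_frames(16000))  # 16000 // 256 + 1 = 63
```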
## Usage in finetune-hf-vits

```json
{
  "model_name_or_path": "rnjema-unima/mms-tts-luo-baseline",
  "feature_extractor_name": "rnjema-unima/mms-tts-luo-baseline",
  "dataset_name": "google/WaxalNLP",
  "dataset_config_name": "luo_tts",
  "audio_column_name": "audio",
  "text_column_name": "text",
  "train_split_name": "train",
  "eval_split_name": "validation"
}
```
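The config above is passed to `run_vits_finetuning.py` as its single JSON argument. A quick pre-flight check that the file parses and carries the expected fields (a sketch; the key list is an assumption drawn from the fields used above, not the script's full schema):

```python
import json

REQUIRED = {"model_name_or_path", "feature_extractor_name",
            "dataset_name", "dataset_config_name",
            "audio_column_name", "text_column_name"}

def validate_config(path: str) -> dict:
    """Load a finetuning JSON config and fail fast on missing keys."""
    with open(path) as f:
        cfg = json.load(f)
    missing = REQUIRED - cfg.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return cfg
```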
## Inference (after fine-tuning)

```python
import torch
import scipy.io.wavfile  # import the subpackage explicitly; `import scipy` alone may not expose it
from transformers import VitsModel, VitsTokenizer

model = VitsModel.from_pretrained("your-org/your-finetuned-model")
tokenizer = VitsTokenizer.from_pretrained("your-org/your-finetuned-model")

inputs = tokenizer("Your text in Luo.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

scipy.io.wavfile.write("output.wav", model.config.sampling_rate,
                       out.waveform.squeeze().numpy())
```
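One caveat: `scipy.io.wavfile.write` stores a float32 array as a 32-bit float WAV, which some players reject. A small sketch for converting to standard 16-bit PCM first, using the `max_wav_value` constant (32768.0) from the feature-extractor config:

```python
import numpy as np

def float_to_pcm16(waveform: np.ndarray, max_wav_value: float = 32768.0) -> np.ndarray:
    """Clip a [-1, 1] float waveform and rescale it to 16-bit PCM samples."""
    clipped = np.clip(waveform, -1.0, 1.0)
    return (clipped * (max_wav_value - 1)).astype(np.int16)

# e.g. scipy.io.wavfile.write("output.wav", 16000, float_to_pcm16(audio))
```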
## Technical details

| Item | Value |
|---|---|
| Architecture | VITS (end-to-end, no separate vocoder) |
| MMS match type | proxy |
| `pad_token_id` | 0 |
| `vocab_size` | 28 |
| `is_uroman` | false |
| `sampling_rate` | 16000 Hz |
## References

The base model was developed by Vineel Pratap et al. at Meta AI. If you use this model, consider citing the MMS paper:

```bibtex
@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
  journal={arXiv},
  year={2023}
}
```