Fine-tuned version of facebook/mms-tts-fra on the Eyaa-Tom dataset for French (fra), as spoken in Togo and Benin.
| Field | Value |
|---|---|
| Language | French |
| ISO 639-3 (MMS) | fra |
| Project ISO code | fra |
| Region | West Africa |
| Family | Romance (Indo-European) |
| Base model | facebook/mms-tts-fra |
| Metric | Value |
|---|---|
| Training samples | 46 |
| Validation samples | 9 |
| Best validation mel-L1 | 3.8487 |
| Uploaded variant | best |
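The card does not say how the mel-L1 figure was computed, so the following is only a minimal sketch of one common reading of the term: the mean absolute difference between the mel spectrogram of the synthesized audio and that of the reference recording. The function name `mel_l1`, the `(n_mels, frames)` layout, and the cropping to the shorter utterance are assumptions made for illustration. They are not taken from the Eyaa-Tom training code, and the reported value may rely on a different scaling or reduction.

```python
import numpy as np


def mel_l1(pred_mel, target_mel):
    """Mean absolute difference between two mel spectrograms.

    Both inputs are array-likes of shape (n_mels, frames). Frame counts can
    differ slightly after synthesis, so both are cropped to the shorter one
    (an assumption of this sketch).
    """
    pred = np.asarray(pred_mel, dtype=np.float64)
    target = np.asarray(target_mel, dtype=np.float64)
    if pred.shape[0] != target.shape[0]:
        raise ValueError("mel spectrograms must have the same number of mel bins")
    frames = min(pred.shape[1], target.shape[1])
    return float(np.mean(np.abs(pred[:, :frames] - target[:, :frames])))
```

Under this reading, lower is better and identical spectrograms score 0. Values are only comparable between runs that use the same mel settings.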
import torch
import torchaudio
from transformers import VitsModel, VitsTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub
model = VitsModel.from_pretrained("Umbaji001/eyaa-tom-mms-tts-fra")
tokenizer = VitsTokenizer.from_pretrained("Umbaji001/eyaa-tom-mms-tts-fra")

inputs = tokenizer("your text here", return_tensors="pt")
with torch.no_grad():
    # The output has shape (batch, samples); take the first item
    waveform = model(**inputs).waveform[0]

# torchaudio.save expects a (channels, samples) tensor
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)
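If torchaudio is not installed, the waveform from the snippet above could also be written with Python's standard library. The sketch below is one way to do that, not part of the original card: the helper name `write_wav` is hypothetical, and it assumes mono audio with float samples in the range [-1.0, 1.0], which it clips and stores as 16-bit PCM.

```python
import struct
import wave


def write_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        # Clip out-of-range values so they cannot overflow the int16 range
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))
```

With the variables from the snippet above, a call might look like `write_wav("output.wav", waveform.tolist(), model.config.sampling_rate)`.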
@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Pratap, Vineel and others},
  journal={arXiv preprint arXiv:2305.13516},
  year={2023}
}
Fine-tuned: 2026-02-25 (Eyaa-Tom project)