Fine-tuned version of facebook/mms-tts-adj on the Eyaa-Tom dataset for Adja (Aja-Gbe, ISO 639-3 ajg). The base checkpoint carries the MMS language code adj and was chosen as the closest available MMS checkpoint to ajg.

| Field | Value |
|---|---|
| Language | Adja |
| MMS checkpoint language code | adj |
| Target language ISO 639-3 | ajg |
| Region | Togo/Benin |
| Family | Gbe (Niger-Congo) |
| Base model | facebook/mms-tts-adj |

| Metric | Value |
|---|---|
| Training samples | 5 |
| Validation samples | 1 |
| Best validation mel-L1 | 3.3801 |
| Uploaded variant | best |
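
The card does not state how the validation mel-L1 was computed. As a rough sketch, assuming it is the mean absolute difference between predicted and reference mel-spectrogram values, the metric can be illustrated as below. The function name `mel_l1` and the toy spectrograms are hypothetical and are not taken from the training code.

```python
from typing import Sequence


def mel_l1(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean absolute difference between two mel spectrograms of equal shape.

    Each spectrogram is a sequence of frames; each frame is a sequence of
    mel-bin values. This is an assumed definition, not the card's own code.
    """
    if len(pred) != len(target):
        raise ValueError("spectrograms must have the same number of frames")
    total = 0.0
    count = 0
    for p_frame, t_frame in zip(pred, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must have the same number of mel bins")
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("spectrograms must not be empty")
    return total / count


# Toy example: 2 frames x 3 mel bins (made-up values, not real model output)
pred = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
target = [[0.5, 1.0, 1.0], [3.0, 6.0, 5.0]]
print(mel_l1(pred, target))
```

Under this definition a lower value means the predicted spectrogram is closer to the reference. With only one validation sample, the reported figure is best read as a rough indicator.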
```python
import torch
import torchaudio
from transformers import VitsModel, VitsTokenizer

# Load the fine-tuned model and its tokenizer from the Hub
model = VitsModel.from_pretrained("Umbaji/eyaa-tom-mms-tts-adj")
tokenizer = VitsTokenizer.from_pretrained("Umbaji/eyaa-tom-mms-tts-adj")

inputs = tokenizer("your text here", return_tensors="pt")
with torch.no_grad():
    waveform = model(**inputs).waveform[0]  # 1-D tensor of audio samples

# torchaudio.save expects a (channels, samples) tensor, hence unsqueeze(0)
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)
```
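
If torchaudio has no working audio backend in your environment, the waveform can also be written with Python's standard-library `wave` module. The following is a minimal sketch, assuming a mono waveform of floats in [-1.0, 1.0]; the helper name `write_wav_pcm16` is hypothetical and not part of this model or of transformers.

```python
import struct
import wave
from typing import Iterable


def write_wav_pcm16(path: str, samples: Iterable[float], sample_rate: int) -> int:
    """Write mono float samples as 16-bit PCM WAV and return the frame count."""
    frames = bytearray()
    n_frames = 0
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # keep values in [-1, 1]
        frames += struct.pack("<h", int(round(clipped * 32767)))  # little-endian int16
        n_frames += 1
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return n_frames


# Assumed usage with the tensors from the snippet above:
# write_wav_pcm16("output.wav", waveform.tolist(), model.config.sampling_rate)
```

Clipping before conversion prevents integer overflow if the model occasionally produces samples slightly outside the nominal range.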
```bibtex
@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Pratap, Vineel and others},
  journal={arXiv preprint arXiv:2305.13516},
  year={2023}
}
```
Fine-tuned: 2026-02-25 (Eyaa-Tom project)