# Twi (Akan) VITS TTS Model
This is a VITS TTS model for Twi (Akan), trained from scratch on the WaxalNLP dataset.
Because it is trained from scratch rather than fine-tuned from a model for another language, it learns Twi phonology directly from the audio data, producing authentic Twi pronunciation.
## Features
- ✅ Natural Twi pronunciation (not English-accented)
- ✅ Multi-speaker support (4 speakers)
- ✅ High-quality 22050Hz audio
- ✅ Handles Twi characters (ɛ, ɔ) correctly
## Usage
```python
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits
from TTS.utils.audio import AudioProcessor
from TTS.tts.utils.text.tokenizer import TTSTokenizer

# Load config
config = VitsConfig()
config.load_json("config.json")

# Load model
ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)
model = Vits(config, ap, tokenizer)
model.load_checkpoint(config, "model.pth", eval=True)

# Generate speech
outputs = model.synthesize(
    text="ɛte sɛn? me ho yɛ.",  # "How are you? I am fine."
    config=config,
    speaker_id=0,
)
wav = outputs["wav"]
```
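The resulting waveform can be written straight to a WAV file with the standard library. A minimal sketch, assuming `wav` is a 1-D sequence of floats in [-1, 1] at 22050 Hz (a dummy sine wave stands in for the model output here):

```python
import math
import struct
import wave

SAMPLE_RATE = 22050

# Stand-in for outputs["wav"]: a 0.5 s, 440 Hz sine wave.
wav = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
       for n in range(SAMPLE_RATE // 2)]

# Convert floats in [-1, 1] to 16-bit PCM and write a mono WAV file.
with wave.open("output.wav", "wb") as f:
    f.setnchannels(1)          # mono
    f.setsampwidth(2)          # 16-bit samples
    f.setframerate(SAMPLE_RATE)
    frames = struct.pack(
        "<%dh" % len(wav),
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in wav),
    )
    f.writeframes(frames)
```

If NumPy and SciPy are available, `scipy.io.wavfile.write("output.wav", 22050, wav)` achieves the same result in one call.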
## Training Details
- Architecture: VITS (Variational Inference TTS)
- Dataset: google/WaxalNLP (twi_tts subset)
- Sample rate: 22050 Hz
- Speakers: 4 unique speakers
- Training steps: 150000
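The details above map onto a Coqui TTS training configuration roughly like the following. This is an illustrative sketch only: field names follow recent Coqui TTS releases and may differ across versions, and the values simply mirror the list above.

```python
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import VitsArgs, VitsAudioConfig

# Illustrative values only, mirroring the training details listed above.
config = VitsConfig(
    audio=VitsAudioConfig(sample_rate=22050),  # 22050 Hz audio
    model_args=VitsArgs(
        num_speakers=4,              # four WaxalNLP speakers
        use_speaker_embedding=True,  # enable the multi-speaker path
    ),
)
# Note: the character set in the config must include the Twi letters ɛ and ɔ.
```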
## Citation
```bibtex
@misc{vits-twi-2024,
  title={Twi (Akan) VITS TTS Model},
  author={zirri23},
  year={2024},
  publisher={Hugging Face},
}
```