Twi (Akan) VITS TTS Model

This is a VITS TTS model for Twi (Akan), trained from scratch on the WaxalNLP dataset.

Unlike models fine-tuned from checkpoints trained on other languages, this model learns Twi phonology directly from the audio data, producing authentic Twi pronunciation.

Features

  • ✅ Natural Twi pronunciation (not English-accented)
  • ✅ Multi-speaker support (4 speakers)
  • ✅ High-quality 22050 Hz audio
  • ✅ Handles Twi characters (ɛ, ɔ) correctly

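Because the model was trained on Twi text, its character set includes the Twi-specific letters ɛ and ɔ. As a quick sanity check before synthesis, input text can be screened against the character set. The set below is a hypothetical illustration; the actual character set is defined in `config.json`:

```python
# Hypothetical Twi character set for illustration only;
# the real one is part of the model's config.json.
TWI_CHARS = set("abcdefghijklmnoprstuwy ɛɔ.,?!")

def unsupported_chars(text: str) -> list[str]:
    """Return any characters not covered by the (assumed) character set."""
    return [ch for ch in text.lower() if ch not in TWI_CHARS]

print(unsupported_chars("ɛte sɛn? me ho yɛ."))  # → []
```

An empty list means every character in the input is covered, so the tokenizer should not drop or mangle anything.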
Usage

from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits
from TTS.utils.audio import AudioProcessor
from TTS.tts.utils.text.tokenizer import TTSTokenizer

# Load config
config = VitsConfig()
config.load_json("config.json")

# Load model
ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)
model = Vits(config, ap, tokenizer)
model.load_checkpoint(config, "model.pth", eval=True)

# Generate speech
outputs = model.synthesize(
    text="ɛte sɛn? me ho yɛ.",  # "How are you? I am fine."
    config=config,
    speaker_id=0,  # one of the 4 speakers (0-3)
)
wav = outputs["wav"]
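The returned `wav` is a float waveform at 22050 Hz. A minimal sketch for writing it to disk as 16-bit PCM using only the standard library (a generated tone stands in for the model output here, since the model itself is not loaded in this snippet):

```python
import math
import wave

def save_wav(samples, path, sample_rate=22050):
    """Write a float waveform (values in [-1, 1]) as mono 16-bit PCM."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)          # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        frames = b"".join(
            int(max(-1.0, min(1.0, s)) * 32767).to_bytes(2, "little", signed=True)
            for s in samples
        )
        f.writeframes(frames)

# Placeholder waveform (one second of a 440 Hz tone) in place of outputs["wav"]
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
save_wav(tone, "output.wav")
```

In practice you would pass `outputs["wav"]` instead of the placeholder tone; libraries such as `soundfile` or `scipy.io.wavfile` do the same job in one call.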

Training Details

  • Architecture: VITS (Variational Inference TTS)
  • Dataset: google/WaxalNLP (twi_tts subset)
  • Sample rate: 22050 Hz
  • Speakers: 4 unique speakers
  • Training steps: 150000
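A training configuration along these lines could be assembled with Coqui TTS. The field names follow Coqui's `VitsConfig`/`VitsArgs`; the values are illustrative assumptions, not the exact settings used for this model (those are in `config.json`):

```python
from TTS.tts.configs.vits_config import VitsConfig, VitsAudioConfig
from TTS.tts.models.vits import VitsArgs

# Illustrative sketch only; actual hyperparameters live in config.json.
config = VitsConfig(
    audio=VitsAudioConfig(sample_rate=22050),   # matches the card's sample rate
    model_args=VitsArgs(
        num_speakers=4,                          # 4 speakers, per the card
        use_speaker_embedding=True,              # assumed multi-speaker setup
    ),
    batch_size=32,                               # assumed, not stated in the card
)
```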

Citation

@misc{vits-twi-2024,
  title={Twi (Akan) VITS TTS Model},
  author={zirri23},
  year={2024},
  publisher={Hugging Face},
}