JALAK - Indonesian Text-to-Speech Model

JALAK is a VITS model fine-tuned for Indonesian text-to-speech (TTS) on the X-lord/Dataset-Text-To-Speech-Indonesia dataset.

Model Details

  • Base Model: Wikidepia VITS
  • Dataset: X-lord/Dataset-Text-To-Speech-Indonesia (4,531 samples, 16.38 hours)
  • Training: 1000 epochs
  • Sample Rate: 22050 Hz
  • Language: Indonesian
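
As a quick sanity check on the dataset figures above, the sketch below derives the average clip length and the corresponding number of audio samples per clip at the stated sample rate. It uses only the numbers from the list; the function name `dataset_stats` is a hypothetical helper, not part of any library.

```python
# Figures copied from the Model Details list.
NUM_CLIPS = 4531
TOTAL_HOURS = 16.38
SAMPLE_RATE = 22050  # Hz


def dataset_stats(num_clips, total_hours, sample_rate):
    """Hypothetical helper: average clip length in seconds and in audio samples."""
    total_seconds = total_hours * 3600
    avg_seconds = total_seconds / num_clips
    avg_audio_samples = avg_seconds * sample_rate
    return avg_seconds, avg_audio_samples


avg_seconds, avg_audio_samples = dataset_stats(NUM_CLIPS, TOTAL_HOURS, SAMPLE_RATE)
print(f"average clip: {avg_seconds:.1f} s, about {avg_audio_samples:,.0f} audio samples")
```

By these figures, the average clip is roughly 13 seconds long.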

Usage

import torch
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits
import soundfile as sf

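# Load the config and switch off the speaker-embedding settings,
# since this example runs inference without a speaker ID
config = VitsConfig()
config.load_json('config.json')
config.speakers_file = None
config.use_speaker_embedding = False
config.num_speakers = 0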

# Load model
model = Vits.init_from_config(config)
model.load_checkpoint(config, 'best_model.pth')
model.eval()

if torch.cuda.is_available():
    model = model.cuda()

# Generate speech
text = "Selamat pagi, bagaimana kabar Anda?"
token_ids = model.tokenizer.text_to_ids(text)
token_ids = torch.LongTensor(token_ids).unsqueeze(0)

if torch.cuda.is_available():
    token_ids = token_ids.cuda()

with torch.no_grad():
    outputs = model.inference(token_ids)
    waveform = outputs['model_outputs'].squeeze().cpu().numpy()

sf.write('output.wav', waveform, 22050)
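
Before writing the file, it can help to check that the generated waveform has a plausible duration and amplitude. The following is a minimal sketch, assuming the waveform is a 1-D sequence of float samples such as the array produced above. `describe_waveform` is a hypothetical helper and not part of Coqui TTS or soundfile.

```python
def describe_waveform(samples, sample_rate=22050):
    """Return (duration_seconds, peak_amplitude) for a 1-D sequence of float samples."""
    n = len(samples)
    if n == 0:
        return 0.0, 0.0
    duration = n / sample_rate
    peak = max(abs(float(s)) for s in samples)
    return duration, peak


# Stand-in for the model output: one second of a constant, quiet signal.
fake_waveform = [0.25] * 22050
duration, peak = describe_waveform(fake_waveform)
print(f"{duration:.2f} s, peak {peak:.2f}")
```

In the usage example, calling `describe_waveform(waveform)` just before `sf.write` would report the same two values for the synthesized audio. A peak well above 1.0 would suggest the audio may clip when saved in an integer PCM format.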

Training Metrics

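| Metric                         | Value |
|--------------------------------|-------|
| Discriminator loss (loss_disc) | 2.56  |
| Generator loss (loss_gen)      | 2.15  |
| Mel loss (loss_mel)            | 15.22 |
| Epochs                         | 1000  |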

License

MIT License

Credits

  • Base model: Wikidepia Indonesian TTS
  • Dataset: X-lord/Dataset-Text-To-Speech-Indonesia
  • Framework: Coqui TTS