Bhili TTS
Text-to-Speech model for Bhili (भीली), an Indo-Aryan language spoken by the Bhil people in western India.
This is a fine-tuned version of ai4bharat/indic-parler-tts, trained on ~2 hours of Bhili Conversational Speech data.
Try out the model here!
Installation
pip install git+https://github.com/huggingface/parler-tts.git
Inference
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned model and the tokenizer for the Bhili text prompt.
model = ParlerTTSForConditionalGeneration.from_pretrained("sanjay73/indic-parler-bhili-tts").to(device)
tokenizer = AutoTokenizer.from_pretrained("sanjay73/indic-parler-bhili-tts")
# The voice description is tokenized separately with the Flan-T5 tokenizer.
description_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")

prompt = "चाला आपुऊ आमी बाजार केरा जाहूं"
description = "A male speaker delivers speech at a moderate speed with a moderate pitch. The recording is of good quality."

desc_ids = description_tokenizer(description, return_tensors="pt").to(device)
prompt_ids = tokenizer(prompt, return_tensors="pt").to(device)

# The description conditions the voice; the prompt is the text to be spoken.
generation = model.generate(
    input_ids=desc_ids.input_ids,
    attention_mask=desc_ids.attention_mask,
    prompt_input_ids=prompt_ids.input_ids,
    prompt_attention_mask=prompt_ids.attention_mask,
)

# Move the waveform to the CPU and save it as a WAV file.
audio = generation.cpu().numpy().squeeze()
sf.write("output.wav", audio, model.config.sampling_rate)
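The inference example above synthesizes a single short sentence. For longer passages, it may help to split the text into sentence-sized chunks and synthesize each one separately, although this README does not state any input length limit. The following is a minimal sketch of such a splitter, not part of the model or of parler-tts: the function name `split_into_chunks`, the `max_chars` default, and the set of sentence-ending marks (the Devanagari danda plus common Latin punctuation) are all assumptions chosen for illustration.

```python
import re

# Split after a sentence-ending mark followed by whitespace.
# The set of marks is an assumption, not something the model requires.
_SENTENCE_END = re.compile(r"(?<=[।॥?!.])\s+")


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence boundaries and pack sentences into chunks.

    A chunk exceeds max_chars only when a single sentence is itself
    longer than max_chars (hypothetical helper, for illustration).
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be used as `prompt` in the inference code above, and the resulting audio arrays joined in order, for example with `numpy.concatenate`, before writing the file.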