No audio output

by sunnyglow - opened Feb 20, 2025

Discussion

sunnyglow

Feb 20, 2025

Hi, I tried the model on my local machine. The final audio output doesn't contain any audio.

AshwinSankar

AI4Bharat org Feb 20, 2025

Would you be able to share the sample code that you're using?

sunnyglow

Feb 20, 2025

Below is the code. Interestingly, the same code works fine on a Linux machine, but not on a Windows machine. The attached audio is the output from the Windows machine.

import soundfile as sf
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "தொண்டை நாட்டுக்கும் சோழ நாட்டுக்கும் இடையில் உள்ள திருமுனைப்பாடி நாட்டின் தென்பகுதியில், தில்லைச் சிற்றம்பலத்துக்கு மேற்கே இரண்டு காததூரத்தில், அலை கடல் போன்ற ஓர் ஏரி விரிந்து பரந்து கிடக்கிறது." # Example text (this sentence is Tamil, not Punjabi)
speaker_id = 18 # PAN_M
style_id = 3 # ALEXA

inputs = tokenizer(text=text, return_tensors="pt").to("cuda")
outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)
sf.write("audio.wav", outputs.waveform.detach().cpu().squeeze(), model.config.sampling_rate)
print(outputs.waveform.shape)
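One way to narrow down a "no audio" result is to inspect the samples before they are written to disk. The sketch below is only an illustration: `describe_waveform` and its `silence_threshold` default are hypothetical names and values, not part of the model or of soundfile, and the synthetic arrays stand in for a real model output.

```python
import numpy as np


def describe_waveform(waveform, silence_threshold=1e-4):
    """Summarize a waveform so silence can be told apart from invalid samples.

    Hypothetical helper for debugging; the threshold is an arbitrary assumption.
    """
    samples = np.asarray(waveform, dtype=np.float64).reshape(-1)
    finite = samples[np.isfinite(samples)]
    # Peak and RMS are computed on finite samples only, so NaN/Inf do not hide them.
    peak = float(np.max(np.abs(finite))) if finite.size else 0.0
    rms = float(np.sqrt(np.mean(finite ** 2))) if finite.size else 0.0
    return {
        "num_samples": int(samples.size),
        "has_nan": bool(np.isnan(samples).any()),
        "has_inf": bool(np.isinf(samples).any()),
        "peak": peak,
        "rms": rms,
        "is_silent": peak < silence_threshold,
    }


# Synthetic stand-ins for a model output: a tone, pure silence, and all-NaN samples.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
silence = np.zeros(sample_rate)
broken = np.full(sample_rate, np.nan)

for name, wav in [("tone", tone), ("silence", silence), ("broken", broken)]:
    print(name, describe_waveform(wav))
```

With the code above, the array to check would be `outputs.waveform.detach().cpu().squeeze().numpy()`. A report with `has_nan` set suggests the problem is in inference rather than in writing the file, while an all-zero waveform and a healthy waveform that plays back silent point in different directions.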

Sureshd3004

about 22 hours ago

import soundfile as sf
from transformers import AutoModel, AutoTokenizer, AutoConfig
from google.colab import userdata
from IPython.display import Audio
import torch

# Retrieve token from secrets

hf_token = userdata.get("HF_TOKEN")
model_id = "ai4bharat/vits_rasa_13"
device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load the configuration and fix missing pad_token_id

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True, token=hf_token)
if not hasattr(config, "pad_token_id"):
    config.pad_token_id = 0

# 2. Load model and tokenizer

model = AutoModel.from_pretrained(model_id, config=config, trust_remote_code=True, token=hf_token).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=hf_token)

# 3. Set up text and parameters (the sample sentence below is Tamil, not Punjabi)

text = "OCI விண்ணப்பக் கட்டணம் திருத்தப்பட்டது: விண்ணப்பதாரர்கள் தெரிந்து கொள்ள வேண்டியவை இதோ"
speaker_id = torch.tensor([18]).to(device) # PAN_M
style_id = torch.tensor([0]).to(device) # ALEXA

# 4. Inference

inputs = tokenizer(text=text, return_tensors="pt").to(device)
with torch.no_grad():
    # Using model forward directly
    outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)

# 5. Save and play

waveform = outputs.waveform.squeeze().cpu().numpy()
sf.write("audio.wav", waveform, model.config.sampling_rate)

print(f"Audio generated with shape: {waveform.shape}")
Audio(waveform, rate=model.config.sampling_rate)

Try this in Colab; it's working for me.
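Since the same script reportedly behaves differently on Linux, Windows, and Colab, comparing the environments side by side may help. The following is a minimal sketch using only the standard library; `environment_report` is a hypothetical helper, and the package list is an assumption based on the imports used in this thread.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("torch", "transformers", "soundfile")):
    """Collect OS, Python, and package versions for comparison across machines."""
    report = {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this on each machine and diffing the output shows whether the failing setup differs in library versions as well as in operating system.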
