Weird reference text also added in audio

#14
by 0xGURU - opened
from omnivoice import OmniVoice
import torch
import torchaudio

# Custom paths
model_path = "F:\\PYTHON_PORTABLE\\OmniVoice"   # local model directory
ref_audio_path = "F:\\PYTHON_PORTABLE\\voice_preview_freya_valley_girl.mp3"       # reference audio path
output_path = "F:\\PYTHON_PORTABLE\\out.wav"          # output file path

# Load the model from local path
model = OmniVoice.from_pretrained(
    model_path,
    device_map="cuda:0",
    dtype=torch.float16
)

# Generate audio
audio = model.generate(
    text="In the early 2000s, the company invested over a billion dollars to develop CUDA, a software platform and API that enabled GPUs to run massively parallel programs for a broad range of compute-intensive applications.",
    ref_audio=ref_audio_path,
    ref_text="Transcription of the reference audio.",
)

# Save output
torchaudio.save(output_path, audio[0], 24000)

(venv3129_OmniVoice) F:\PYTHON_PORTABLE>python OmniVoice.py
F:\PYTHON_PORTABLE\venv3129_OmniVoice\Lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 313/313 [00:00<00:00, 1001.83it/s]
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 527/527 [00:00<00:00, 2108.49it/s]

(venv3129_OmniVoice) F:\PYTHON_PORTABLE>
k2-fsa org

"Transcription of the reference audio." is a placeholder for the actual transcript of the file at ref_audio_path, so replace it with the words spoken in that file. If you prefer not to enter it manually, omit ref_text and pass only text and ref_audio. The model will then automatically load the Whisper ASR model, transcribe ref_audio, and use the result as ref_text.
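The two options above can be sketched as a small guard that drops ref_text when it is missing or still the placeholder, so the call falls back to automatic transcription. This is only a sketch: build_generate_kwargs and PLACEHOLDER are hypothetical names for illustration, not part of OmniVoice, and the final model.generate(**kwargs) line assumes the same model object and arguments as the script in the original post.

```python
# Hypothetical helper for illustration; not part of the OmniVoice API.
PLACEHOLDER = "Transcription of the reference audio."


def build_generate_kwargs(text, ref_audio, ref_text=None):
    """Build keyword arguments for model.generate().

    ref_text is included only when it is a real transcript. If it is
    empty or still the placeholder string, it is left out so that the
    model can transcribe ref_audio itself.
    """
    kwargs = {"text": text, "ref_audio": ref_audio}
    cleaned = (ref_text or "").strip()
    if cleaned and cleaned != PLACEHOLDER:
        kwargs["ref_text"] = cleaned
    return kwargs


kwargs = build_generate_kwargs(
    text="Hello from OmniVoice.",
    ref_audio="F:\\PYTHON_PORTABLE\\voice_preview_freya_valley_girl.mp3",
    ref_text="Transcription of the reference audio.",  # placeholder left in by mistake
)
print(sorted(kwargs))  # "ref_text" is not among the keys

# With the model loaded as in the original script:
# audio = model.generate(**kwargs)
```

Passing a real transcript instead of the placeholder keeps ref_text in the returned dictionary unchanged, apart from surrounding whitespace being stripped.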

0xGURU changed discussion status to closed
