Weird: reference text also added to the audio

#14

by 0xGURU - opened 13 days ago


from omnivoice import OmniVoice
import torch
import torchaudio

# Custom paths
model_path = "F:\\PYTHON_PORTABLE\\OmniVoice"   # local model directory
ref_audio_path = "F:\\PYTHON_PORTABLE\\voice_preview_freya_valley_girl.mp3"       # reference audio path
output_path = "F:\\PYTHON_PORTABLE\\out.wav"          # output file path

# Load the model from local path
model = OmniVoice.from_pretrained(
    model_path,
    device_map="cuda:0",
    dtype=torch.float16
)

# Generate audio
audio = model.generate(
    text="In the early 2000s, the company invested over a billion dollars to develop CUDA, a software platform and API that enabled GPUs to run massively parallel programs for a broad range of compute-intensive applications.",
    ref_audio=ref_audio_path,
    ref_text="Transcription of the reference audio.",
)

# Save output
torchaudio.save(output_path, audio[0], 24000)

(venv3129_OmniVoice) F:\PYTHON_PORTABLE>python OmniVoice.py
F:\PYTHON_PORTABLE\venv3129_OmniVoice\Lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 313/313 [00:00<00:00, 1001.83it/s]
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 527/527 [00:00<00:00, 2108.49it/s]

(venv3129_OmniVoice) F:\PYTHON_PORTABLE>

zhu-han

k2-fsa org 12 days ago

"Transcription of the reference audio." is a placeholder. Replace it with the exact transcription of the speech in the file at ref_audio_path. If you prefer not to enter it manually, omit ref_text and pass only text and ref_audio. The model will then automatically load the Whisper ASR model to transcribe ref_audio and obtain ref_text.
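One way to avoid leaving the placeholder in by accident is to forward ref_text only when a real transcript is available. The sketch below assumes, based on the script in this thread, that model.generate accepts text, ref_audio and ref_text as keyword arguments, and that leaving ref_text out triggers the Whisper fallback described above. build_generate_kwargs and PLACEHOLDER_REF_TEXT are hypothetical names, not part of OmniVoice.

```python
from typing import Optional

# The placeholder string from the example script; it is NOT a real transcript.
PLACEHOLDER_REF_TEXT = "Transcription of the reference audio."


def build_generate_kwargs(text: str, ref_audio: str,
                          ref_text: Optional[str] = None) -> dict:
    """Build keyword arguments for model.generate (hypothetical helper).

    ref_text is forwarded only when it is a non-empty string that differs
    from the placeholder. Otherwise it is left out, so (per the maintainer's
    reply) OmniVoice transcribes ref_audio itself with Whisper ASR.
    """
    kwargs = {"text": text, "ref_audio": ref_audio}
    if ref_text is not None:
        cleaned = ref_text.strip()
        if cleaned and cleaned != PLACEHOLDER_REF_TEXT:
            kwargs["ref_text"] = cleaned
    return kwargs


if __name__ == "__main__":
    # Placeholder left in by mistake: ref_text is dropped, so ASR would be used.
    print(build_generate_kwargs("Hello there.", "ref.mp3", PLACEHOLDER_REF_TEXT))
    # Real transcript supplied: forwarded with surrounding whitespace removed.
    print(build_generate_kwargs("Hello there.", "ref.mp3", " Hi, I'm Freya. "))
```

In the original script this would be used as audio = model.generate(**build_generate_kwargs(text, ref_audio_path, ref_text)), where ref_text is either the real transcript or None.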

0xGURU changed discussion status to closed 12 days ago
