multilingual-tts/F5-TTS-OpenBible-Urdu

luel committed 5 days ago

Commit 78b7eca · verified · 1 Parent(s): 9bcd96f

Update README.md

Files changed (1): README.md (+20 -4)

@@ -52,22 +52,38 @@ pip install git+https://github.com/SWivid/F5-TTS.git
 Download the checkpoint and run inference:
 ```python
+import torch
 from huggingface_hub import hf_hub_download
-from f5_tts.api import F5TTS
+from hydra.utils import get_class
+from omegaconf import OmegaConf
+from f5_tts.infer.utils_infer import infer_process, load_model, load_vocoder, preprocess_ref_audio_text
 repo_id = "multilingual-tts/F5-TTS-OpenBible-Urdu"
 ckpt   = hf_hub_download(repo_id, "model_last.pt")
 vocab  = hf_hub_download(repo_id, "vocab.txt")
 config = hf_hub_download(repo_id, "F5-TTS_OpenBible_Urdu.yaml")
-model = F5TTS(ckpt_file=ckpt, vocab_file=vocab, model_cfg=config)
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model_cfg = OmegaConf.load(config)
+model_cls = get_class(f"f5_tts.model.{model_cfg.model.backbone}")
+vocoder = load_vocoder(vocoder_name="vocos", is_local=False, device=device)
+model   = load_model(
+    model_cls, model_cfg.model.arch, ckpt,
+    mel_spec_type="vocos", vocab_file=vocab, use_ema=True, device=device,
+)
 # Supply your own clean reference clip — 5–10 s, single speaker and its transcription.
 ref_audio = "/path/to/your-urdu-clip.wav"
 ref_text  = "Exact transcription of the clip"
 gen_text  = "..."   # text to synthesise in Urdu
-wav, sr, _ = model.infer(ref_audio=ref_audio, ref_text=ref_text, gen_text=gen_text)
+ref_audio_proc, ref_text_proc = preprocess_ref_audio_text(ref_audio, ref_text)
+wav, sr, _ = infer_process(
+    ref_audio_proc, ref_text_proc, gen_text, model, vocoder,
+    mel_spec_type="vocos", device=device,
+)
 ```
 ## Training data
@@ -101,4 +117,4 @@ Evaluated alongside other Open-Bible TTS systems on character/word error rate
 [open-bible-models](https://github.com/davidguzmanr/open-bible-models) repository
 for the evaluation pipeline and the
 [open-bible-surveys](https://github.com/davidguzmanr/open-bible-surveys) repository
-for the human-listening survey methodology.
+for the human-listening survey methodology.
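
The updated README snippet stops at `wav, sr, _ = infer_process(...)` and does not show how to write the result to disk. The following is a minimal sketch, not part of the commit, assuming `wav` is a one-dimensional sequence of float samples in roughly [-1, 1] and `sr` is an integer sample rate. The helper name `save_wav` is hypothetical; it uses only the Python standard library so it does not add a dependency.

```python
import struct
import wave


def save_wav(path, samples, sample_rate):
    """Write mono float samples to a 16-bit PCM WAV file.

    Assumption: `samples` is a 1-D iterable of floats nominally in [-1, 1];
    values outside that range are clipped rather than wrapped.
    Returns the number of frames written.
    """
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        f.setframerate(int(sample_rate))
        f.writeframes(bytes(frames))
    return len(frames) // 2
```

With the variables from the README snippet in scope, a call such as `save_wav("urdu_out.wav", wav, sr)` would write the synthesised audio, where the output filename is a placeholder. If the `soundfile` package is already installed, it may be a more convenient option for large arrays.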