+---
+language:
+  - hil
+license: cc-by-sa-4.0
+library_name: everyvoice
+tags:
+  - text-to-speech
+  - tts
+  - everyvoice
+  - fastspeech2
+  - open-bible
+  - hiligaynon
+pipeline_tag: text-to-speech
+datasets:
+  - davidguzmanr/open-bible-resources
+inference: false
+---
+# EveryVoice Open Bible — Hiligaynon
+A multispeaker text-to-speech model for **Hiligaynon**, trained from scratch on
+the [Open Bible](https://huggingface.co/datasets/davidguzmanr/open-bible-resources)
+corpus using the [EveryVoice](https://github.com/EveryVoiceTTS/EveryVoice) TTS toolkit
+(FastSpeech2 acoustic model + HiFi-GAN vocoder, 22,050 Hz output).
+The model is conditioned on speaker embeddings learned during training. A speaker
+name from the training set must be supplied at inference time.
+## Files
+| File | Purpose |
+|------|---------|
+| `feature_prediction.ckpt` | Trained FastSpeech2 feature-prediction weights. |
+| `vocoder.ckpt` | HiFi-GAN vocoder checkpoint (optional — can be replaced with a universal vocoder). |
+| `config/` | EveryVoice YAML config files (shared data, text, feature-prediction, spec-to-wav). |
+| `filelist.psv` | Pipe-separated training filelist (`basename\|language\|speaker\|characters\|phones`). |
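
As a rough illustration of the `filelist.psv` layout listed in the table above, the sketch below parses pipe-separated rows with Python's standard `csv` module. It assumes the file begins with a header row naming the five columns. The helper name `read_filelist`, the speaker name `speaker_a`, and the sample rows are hypothetical and are not taken from the actual file.

```python
import csv
import io

# Column names as documented for filelist.psv.
FIELDS = ["basename", "language", "speaker", "characters", "phones"]

def read_filelist(handle):
    """Parse a pipe-separated filelist into a list of dicts.

    Assumes the first row is a header containing the column names.
    """
    # QUOTE_NONE keeps quotation marks in transcripts as literal characters.
    reader = csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"filelist is missing columns: {sorted(missing)}")
    return list(reader)

# Hypothetical sample content; real rows come from filelist.psv.
sample = io.StringIO(
    "basename|language|speaker|characters|phones\n"
    "utt-0001|hil|speaker_a|maayong aga|\n"
    "utt-0002|hil|speaker_a|salamat gid|\n"
)
rows = read_filelist(sample)
print(len(rows), rows[0]["speaker"], rows[0]["characters"])
```

For the real file, open it with `open(path, encoding="utf-8", newline="")` and pass the handle to `read_filelist`.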
+## Intended use
+- Multispeaker TTS for Hiligaynon using one of the training-set speaker voices.
+- Research on multilingual TTS, low-resource TTS evaluation, and listening
+  studies on Open Bible–style read-speech.
+## How to use
+Install EveryVoice:
+```bash
+pip install everyvoice
+```
+Download the checkpoint and run inference:
+```python
+import torch
+from pathlib import Path
+from huggingface_hub import snapshot_download
+from everyvoice.config.type_definitions import DatasetTextRepresentation
+from everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli.synthesize import (
+    get_global_step,
+    synthesize_helper,
+)
+from everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.model import FastSpeech2
+from everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.type_definitions import (
+    SynthesizeOutputFormats,
+)
+from everyvoice.model.vocoder.HiFiGAN_iSTFT_lightning.hfgl.utils import (
+    load_hifigan_from_checkpoint,
+)
+from everyvoice.utils.heavy import get_device_from_accelerator
+# Download a local snapshot of the model repository from the Hugging Face Hub.
+repo_id = "multilingual-tts/EveryVoice-OpenBible-Hiligaynon"
+local = Path(snapshot_download(repo_id))
+ckpt_path = local / "feature_prediction.ckpt"
+vocoder_path = local / "vocoder.ckpt"
+# Use the GPU when one is available; otherwise fall back to the CPU.
+accelerator = "gpu" if torch.cuda.is_available() else "cpu"
+device = get_device_from_accelerator(accelerator)
+# Load the FastSpeech2 acoustic model and switch it to inference mode.
+model = FastSpeech2.load_from_checkpoint(str(ckpt_path)).to(device)
+model.eval()
+global_step = get_global_step(ckpt_path)
+# Load the HiFi-GAN vocoder that converts predicted spectrograms to audio.
+vocoder_ckpt = torch.load(str(vocoder_path), map_location=device, weights_only=True)
+vocoder_model, vocoder_config = load_hifigan_from_checkpoint(vocoder_ckpt, device)
+vocoder_global_step = get_global_step(vocoder_path)
+# List the available speakers, then pick the first speaker and language.
+# Replace `speaker` with any name from the printed list.
+print(f"Available speakers: {list(model.speaker2id.keys())}")
+speaker = next(iter(model.speaker2id.keys()))
+language = next(iter(model.lang2id.keys()))
+filelist_data = [
+    {
+        "basename":         "sample-0",
+        "characters":       "...",   # text to synthesise in Hiligaynon
+        "language":         language,
+        "speaker":          speaker,
+        "duration_control": 1.0,
+    }
+]
+output_dir = Path("everyvoice_output")
+output_dir.mkdir(exist_ok=True)
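+# Text, language, and speaker are taken from filelist_data, so the
+# corresponding top-level arguments are passed as None.
+synthesize_helper(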
+    model=model,
+    texts=None,
+    style_reference=None,
+    language=None,
+    speaker=None,
+    duration_control=1.0,
+    global_step=global_step,
+    output_type=[SynthesizeOutputFormats.wav],
+    text_representation=DatasetTextRepresentation.characters,
+    accelerator=accelerator,
+    devices="auto",
+    device=device,
+    batch_size=1,
+    num_workers=1,
+    filelist=None,
+    filelist_data=filelist_data,
+    output_dir=output_dir,
+    teacher_forcing_directory=None,
+    vocoder_model=vocoder_model,
+    vocoder_config=vocoder_config,
+    vocoder_global_step=vocoder_global_step,
+)
+# Generated WAVs land in output_dir/wav/
+```
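
To sanity-check the synthesised audio, a short script along these lines can report the sample rate and duration of a WAV file using only the standard library. It is a sketch under stated assumptions: the helper name `describe_wav` is hypothetical, the files are assumed to be standard PCM WAVs, and the demonstration writes a silent 22,050 Hz stand-in file to a temporary directory rather than reading real model output.

```python
import tempfile
import wave
from pathlib import Path

EXPECTED_RATE = 22050  # output sample rate stated in this model card

def describe_wav(path):
    """Return (sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate

# Demonstration on a stand-in file: half a second of 16-bit mono silence.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "sample-0.wav"
    with wave.open(str(demo), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(EXPECTED_RATE)
        wav.writeframes(b"\x00\x00" * (EXPECTED_RATE // 2))
    rate, seconds = describe_wav(demo)
    print(f"{demo.name}: {rate} Hz, {seconds:.2f} s")
```

To check real output, loop over `Path("everyvoice_output/wav").glob("*.wav")` and compare each reported rate against `EXPECTED_RATE`.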
+## Training data
+- **Source:** `davidguzmanr/open-bible-resources`, config `Hiligaynon`
+- **Size:** approximately 18,573 utterances
+- **Speakers:** multispeaker; synthesis is restricted to the training-set
+  voices, one of which is selected by name at inference time
+- **Sample rate:** 22,050 Hz
+## Training procedure
+- Acoustic model: FastSpeech2 (non-autoregressive, duration-prediction based).
+- Vocoder: HiFi-GAN (iSTFT variant).
+- Character-level tokenizer built from the training transcripts.
+- Trained with the [EveryVoice](https://github.com/EveryVoiceTTS/EveryVoice) toolkit.
+
+Audio preprocessing and training are reproducible via the upstream
+[open-bible-models](https://github.com/davidguzmanr/open-bible-models) repo.
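
To illustrate what a character-level tokenizer built from transcripts involves, the sketch below collects the distinct characters of a small corpus into a symbol table and encodes text against it. This is not EveryVoice's implementation: the function names, the `<pad>` and `<unk>` symbols, and the sample transcripts are assumptions made for illustration.

```python
def build_vocab(transcripts):
    """Map each distinct character to an integer id (0 = pad, 1 = unknown)."""
    symbols = sorted({ch for line in transcripts for ch in line})
    vocab = {"<pad>": 0, "<unk>": 1}
    for ch in symbols:
        vocab[ch] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert text to ids, falling back to <unk> for unseen characters."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]

# Hypothetical transcripts standing in for the training text.
transcripts = ["maayong aga", "salamat gid"]
vocab = build_vocab(transcripts)
ids = encode("maayo", vocab)
unseen = encode("x", vocab)
print(len(vocab), ids, unseen)
```

Because the symbol table is derived from the training text, characters that never occur in the transcripts have no learned representation, which is why input text should stay within the orthography seen in training.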
+## Evaluation
+This model was evaluated alongside other Open Bible TTS systems on character and
+word error rate (CER/WER, computed on transcripts from Meta's Omnilingual ASR) and
+on UTMOSv2 naturalness scores. See the
+[open-bible-models](https://github.com/davidguzmanr/open-bible-models) repository
+for the evaluation pipeline and the
+[open-bible-surveys](https://github.com/davidguzmanr/open-bible-surveys) repository
+for the human-listening survey methodology.
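
For readers unfamiliar with the error-rate metrics, the sketch below shows the usual definition: the Levenshtein edit distance between the reference text and the ASR transcript, divided by the reference length, counted over words for WER and over characters for CER. It is a generic illustration, not the evaluation code from the open-bible-models repository, and the example strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)

# Hypothetical input text and ASR transcript of the synthesised audio.
reference = "maayong aga sa tanan"
hypothesis = "maayong aga sa tanon"
print(f"WER={wer(reference, hypothesis):.2f} CER={cer(reference, hypothesis):.2f}")
```

Here a single wrong character costs one of four words for WER but only one of twenty characters for CER, which is why the two metrics are usually reported together.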