+---
+language:
+  - ar
+license: cc-by-sa-4.0
+library_name: coqui-tts
+tags:
+  - text-to-speech
+  - tts
+  - vits
+  - open-bible
+  - arabic-standard
+pipeline_tag: text-to-speech
+datasets:
+  - davidguzmanr/open-bible-resources
+inference: false
+---
+# VITS Open Bible — Arabic Standard
+A multispeaker text-to-speech model for **Arabic Standard**, trained from scratch on
+the [Open Bible](https://huggingface.co/datasets/davidguzmanr/open-bible-resources)
+corpus using the [VITS](https://arxiv.org/abs/2106.06103) architecture
+(end-to-end TTS with adversarial learning, 22,050 Hz output) via the
+[Coqui TTS](https://github.com/coqui-ai/TTS) framework.
+Unlike zero-shot TTS models, VITS is conditioned on speaker embeddings learned
+during training. A speaker name from the training set must be supplied at
+inference time.
+## Files
+| File | Purpose |
+|------|---------|
+| `model_last.pth` | Trained model weights. |
+| `config.json` | Coqui TTS model configuration. |
+| `speakers.pth` | Speaker name → ID mapping used to select a learned speaker embedding. |
+## Intended use
+- Multispeaker TTS for Arabic Standard using one of the training-set speaker voices.
+- Research on multilingual TTS, low-resource TTS evaluation, and listening
+  studies on Open Bible–style read-speech.
+## How to use
+Install Coqui TTS:
+```bash
+pip install TTS
+```
+Download the checkpoint and run inference:
+```python
+import torch
+from huggingface_hub import hf_hub_download
+from TTS.tts.utils.speakers import SpeakerManager
+from TTS.utils.synthesizer import Synthesizer
+repo_id  = "multilingual-tts/VITS-OpenBible-Arabic-Standard"
+ckpt     = hf_hub_download(repo_id, "model_last.pth")
+config   = hf_hub_download(repo_id, "config.json")
+speakers = hf_hub_download(repo_id, "speakers.pth")
+use_cuda = torch.cuda.is_available()
+synthesizer = Synthesizer(
+    tts_checkpoint=ckpt,
+    tts_config_path=config,
+    tts_speakers_file=speakers,
+    use_cuda=use_cuda,
+)
+# Coqui's Synthesizer may not inject the speakers file into the model config
+# automatically — restore the SpeakerManager manually when needed.
+if synthesizer.tts_model.speaker_manager is None:
+    synthesizer.tts_model.speaker_manager = SpeakerManager(
+        speaker_id_file_path=speakers
+    )
+# List available speaker names
+print(sorted(synthesizer.tts_model.speaker_manager.speaker_names))
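+wav = synthesizer.tts(
+    text="...",          # text to synthesise in Arabic Standard
+    speaker_name="...",  # one of the speaker names printed above
+    split_sentences=True,
+)
+# Write the waveform to disk ("output.wav" is a placeholder path)
+synthesizer.save_wav(wav, "output.wav")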
+```
+## Training data
+- **Source:** `davidguzmanr/open-bible-resources`, config `Arabic Standard`
+- **Size:** approximately 25,262 utterances
+- **Speakers:** multispeaker; speaker identity is fixed to one of the training-set
+  voices and selected by name at inference time
+- **Sample rate:** 22,050 Hz
+## Training procedure
+- Architecture: VITS (Conditional Variational Autoencoder + adversarial training).
+- Grapheme-level tokenizer, built from the training transcripts.
+- Optimizer: AdamW, learning rate 2e-4.
+- Training budget: 500,000 optimizer updates on 2 GPUs with mixed precision
+  (bf16).
+
+Audio preprocessing and training are reproducible via the upstream
+[open-bible-models](https://github.com/davidguzmanr/open-bible-models) repo.
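As a rough illustration of what a grapheme-level tokenizer built from training transcripts involves, the sketch below collects every distinct character in a set of transcripts and assigns each an integer ID. It is not the Coqui TTS implementation: the helper names `build_grapheme_vocab` and `encode`, the `<PAD>` symbol, and the toy transcripts are all assumptions made for this example.

```python
from typing import Dict, Iterable, List

PAD = "<PAD>"  # hypothetical padding symbol, reserved at index 0


def build_grapheme_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Map every distinct character found in the transcripts to an integer ID."""
    chars = sorted({ch for text in transcripts for ch in text})
    vocab = {PAD: 0}
    for ch in chars:
        vocab[ch] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert text to IDs, skipping characters that were never seen in training."""
    return [vocab[ch] for ch in text if ch in vocab]


# Toy transcripts for illustration only; they are not taken from the corpus.
transcripts = ["سلام", "كتاب"]
vocab = build_grapheme_vocab(transcripts)
ids = encode("سلام", vocab)
```

One consequence of this kind of vocabulary is that characters absent from the training transcripts have no ID, so input text may need to be normalised to the training character set before synthesis.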
+## Evaluation
+The model was evaluated alongside other Open Bible TTS systems using character and
+word error rates (measured with Meta's Omnilingual ASR) and UTMOSv2 naturalness
+scores. See the
+[open-bible-models](https://github.com/davidguzmanr/open-bible-models) repository
+for the evaluation pipeline and the
+[open-bible-surveys](https://github.com/davidguzmanr/open-bible-surveys) repository
+for the human-listening survey methodology.
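For readers unfamiliar with the error-rate metrics, character and word error rates are conventionally defined as the Levenshtein edit distance between an ASR transcript of the synthesised audio and the reference text, divided by the reference length. The sketch below shows that definition only; the function names are hypothetical, and any text normalisation applied by the actual evaluation pipeline is not reproduced here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalised by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over individual characters."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())
```

Because the distance is normalised by the reference length only, both rates can exceed 1.0 when the hypothesis is much longer than the reference.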