davidguzmanr committed · Commit 1284bfa · verified · 1 Parent(s): bf25974

Add README for Tamil

  1. README.md +116 -0
README.md ADDED
@@ -0,0 +1,116 @@
+ ---
+ language:
+ - ta
+ license: cc-by-sa-4.0
+ library_name: coqui-tts
+ tags:
+ - text-to-speech
+ - tts
+ - vits
+ - open-bible
+ - tamil
+ pipeline_tag: text-to-speech
+ datasets:
+ - davidguzmanr/open-bible-resources
+ inference: false
+ ---
+
+ # VITS Open Bible — Tamil
+
+ A multispeaker text-to-speech model for **Tamil**, trained from scratch on
+ the [Open Bible](https://huggingface.co/datasets/davidguzmanr/open-bible-resources)
+ corpus using the [VITS](https://arxiv.org/abs/2106.06103) architecture
+ (end-to-end TTS with adversarial learning, 22,050 Hz output) via the
+ [Coqui TTS](https://github.com/coqui-ai/TTS) framework.
+
+ Unlike zero-shot TTS models, VITS is conditioned on speaker embeddings learned
+ during training. A speaker name from the training set must be supplied at
+ inference time.
+
+ ## Files
+
+ | File | Purpose |
+ |------|---------|
+ | `model_last.pth` | Trained model weights. |
+ | `config.json` | Coqui TTS model configuration. |
+ | `speakers.pth` | Speaker ID → embedding mapping. |
+
+ ## Intended use
+
+ - Multispeaker TTS for Tamil using one of the training-set speaker voices.
+ - Research on multilingual TTS, low-resource TTS evaluation, and listening
+   studies on Open Bible–style read-speech.
+
+ ## How to use
+
+ Install Coqui TTS:
+
+ ```bash
+ pip install TTS
+ ```
+
+ Download the checkpoint and run inference:
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+ from TTS.tts.utils.speakers import SpeakerManager
+ from TTS.utils.synthesizer import Synthesizer
+
+ repo_id = "multilingual-tts/VITS-OpenBible-Tamil"
+ ckpt = hf_hub_download(repo_id, "model_last.pth")
+ config = hf_hub_download(repo_id, "config.json")
+ speakers = hf_hub_download(repo_id, "speakers.pth")
+
+ use_cuda = torch.cuda.is_available()
+ synthesizer = Synthesizer(
+     tts_checkpoint=ckpt,
+     tts_config_path=config,
+     tts_speakers_file=speakers,
+     use_cuda=use_cuda,
+ )
+
+ # Coqui's Synthesizer may not inject the speakers file into the model config
+ # automatically; restore the SpeakerManager manually when needed.
+ if synthesizer.tts_model.speaker_manager is None:
+     synthesizer.tts_model.speaker_manager = SpeakerManager(
+         speaker_id_file_path=speakers
+     )
+
+ # List available speaker names
+ print(sorted(synthesizer.tts_model.speaker_manager.speaker_names))
+
+ wav = synthesizer.tts(
+     text="...",  # text to synthesise in Tamil
+     speaker_name="...",  # one of the speaker names printed above
+     split_sentences=True,
+ )
+ ```
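The snippet above stops once `wav` has been produced. As a minimal sketch of one way to persist it, the helper below writes a sequence of samples to a 16-bit mono WAV file at the model's 22,050 Hz output rate using only the Python standard library. It assumes `wav` is a flat sequence of float samples in the range [-1.0, 1.0]; the name `write_wav` and the sine-wave stand-in are hypothetical and not part of Coqui TTS or of this repository.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(samples, path, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Hypothetical helper: assumes `samples` is a flat sequence of floats.
    Returns the number of samples written.
    """
    frames = bytearray()
    for sample in samples:
        # Clip out-of-range values so they cannot overflow the int16 range.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(samples)


if __name__ == "__main__":
    # Stand-in signal (0.1 s of a 440 Hz tone) in place of real model output.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 22050) for n in range(2205)]
    out_path = os.path.join(tempfile.mkdtemp(), "example.wav")
    write_wav(tone, out_path)
    print("wrote", out_path)
```

With real model output, the call would look like `write_wav(wav, "output.wav")`, provided the assumption about the sample format holds for the installed Coqui TTS version.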
+
+ ## Training data
+
+ - **Source:** `davidguzmanr/open-bible-resources`, config `Tamil`
+ - **Size:** approximately 23,532 utterances
+ - **Speakers:** multispeaker; speaker identity is fixed to one of the training-set
+   voices and selected by name at inference time
+ - **Sample rate:** 22,050 Hz
+
+ ## Training procedure
+
+ - Architecture: VITS (Conditional Variational Autoencoder + adversarial training).
+ - Grapheme-level tokenizer, built from the training transcripts.
+ - Optimizer: AdamW, learning rate 2e-4.
+ - Training budget: 500,000 optimizer updates on 2 GPUs with mixed precision
+   (bf16).
+
+ Audio preprocessing and training are reproducible via the upstream
+ [open-bible-models](https://github.com/davidguzmanr/open-bible-models) repo.
+
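To illustrate what a tokenizer "built from the training transcripts" can look like, the sketch below collects every distinct character in a list of transcripts and maps it to an integer ID. This is an illustrative assumption, not the code used to train this model: the function names, the `<pad>` and `<unk>` symbols, and the choice to split on Unicode code points (rather than full Tamil grapheme clusters) are all hypothetical.

```python
def build_vocab(transcripts, specials=("<pad>", "<unk>")):
    """Map special symbols plus every distinct character to an integer ID.

    Characters are Unicode code points, so a Tamil consonant and its
    vowel sign count as two separate symbols in this sketch.
    """
    chars = sorted({ch for text in transcripts for ch in text})
    symbols = list(specials) + chars
    return {symbol: index for index, symbol in enumerate(symbols)}


def encode(text, vocab):
    """Convert text to a list of IDs, using <unk> for unseen characters."""
    unk_id = vocab["<unk>"]
    return [vocab.get(ch, unk_id) for ch in text]


if __name__ == "__main__":
    # Toy transcripts standing in for the real training text.
    vocab = build_vocab(["ab", "bc"])
    print(vocab)
    print(encode("cab", vocab))
```

Because the vocabulary is derived from the training text alone, characters that never appear in the transcripts fall back to the unknown symbol here; how the real configuration treats such characters is not stated in this README.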
+ ## Evaluation
+
+ Evaluated alongside other Open-Bible TTS systems on character/word error rate
+ (via Meta's Omnilingual ASR) and UTMOSv2 naturalness scores. See the
+ [open-bible-models](https://github.com/davidguzmanr/open-bible-models) repository
+ for the evaluation pipeline and the
+ [open-bible-surveys](https://github.com/davidguzmanr/open-bible-surveys) repository
+ for the human-listening survey methodology.
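
For readers unfamiliar with the error-rate metrics named above, here is a minimal sketch of how character and word error rate are commonly computed: the edit distance between the reference text and the ASR transcript of the synthesised audio, divided by the reference length. It is not the project's evaluation code; the function names are hypothetical, no text normalisation is applied, and characters are counted as Unicode code points, which may differ from how the actual pipeline counts Tamil characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(cer("abcd", "abed"))
    print(wer("a b c d", "a b d"))
```

Both rates can exceed 1.0 when the transcript contains many insertions, since the denominator is the reference length only.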