davidguzmanr committed on
Commit 5e52c74 · verified · 1 Parent(s): 89b88e1

Add README for Ewe

Files changed (1)
  1. README.md +159 -0
README.md ADDED
@@ -0,0 +1,159 @@
+ ---
+ language:
+ - ee
+ license: cc-by-sa-4.0
+ library_name: everyvoice
+ tags:
+ - text-to-speech
+ - tts
+ - everyvoice
+ - fastspeech2
+ - open-bible
+ - ewe
+ pipeline_tag: text-to-speech
+ datasets:
+ - davidguzmanr/open-bible-resources
+ inference: false
+ ---
+
+ # EveryVoice Open Bible: Ewe
+
+ A multispeaker text-to-speech model for **Ewe**, trained from scratch on
+ the [Open Bible](https://huggingface.co/datasets/davidguzmanr/open-bible-resources)
+ corpus using the [EveryVoice](https://github.com/EveryVoiceTTS/EveryVoice) TTS toolkit
+ (FastSpeech2 acoustic model + HiFi-GAN vocoder, 22,050 Hz output).
+
+ The model is conditioned on speaker embeddings learned during training. A speaker
+ name from the training set must be supplied at inference time.
+
+ ## Files
+
+ | File | Purpose |
+ |------|---------|
+ | `feature_prediction.ckpt` | Trained FastSpeech2 feature-prediction weights. |
+ | `vocoder.ckpt` | HiFi-GAN vocoder checkpoint (optional; can be replaced with a universal vocoder). |
+ | `config/` | EveryVoice YAML config files (shared data, text, feature-prediction, spec-to-wav). |
+ | `filelist.psv` | Pipe-separated training filelist (`basename|language|speaker|characters|phones`). |
+
+ ## Intended use
+
+ - Multispeaker TTS for Ewe using one of the training-set speaker voices.
+ - Research on multilingual TTS, low-resource TTS evaluation, and listening
+   studies on Open Bible–style read-speech.
+
+ ## How to use
+
+ Install EveryVoice:
+
+ ```bash
+ pip install everyvoice
+ ```
+
+ Download the checkpoint and run inference:
+
+ ```python
+ import torch
+ from pathlib import Path
+ from huggingface_hub import snapshot_download
+
+ from everyvoice.config.type_definitions import DatasetTextRepresentation
+ from everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli.synthesize import (
+     get_global_step,
+     synthesize_helper,
+ )
+ from everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.model import FastSpeech2
+ from everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.type_definitions import (
+     SynthesizeOutputFormats,
+ )
+ from everyvoice.model.vocoder.HiFiGAN_iSTFT_lightning.hfgl.utils import (
+     load_hifigan_from_checkpoint,
+ )
+ from everyvoice.utils.heavy import get_device_from_accelerator
+
+ # Download the full model repository (checkpoints, configs, filelist)
+ repo_id = "multilingual-tts/EveryVoice-OpenBible-Ewe"
+ local = Path(snapshot_download(repo_id))
+
+ ckpt_path = local / "feature_prediction.ckpt"
+ vocoder_path = local / "vocoder.ckpt"
+
+ # Use the GPU when one is available, otherwise fall back to the CPU
+ accelerator = "gpu" if torch.cuda.is_available() else "cpu"
+ device = get_device_from_accelerator(accelerator)
+
+ # Load the FastSpeech2 feature-prediction model
+ model = FastSpeech2.load_from_checkpoint(str(ckpt_path)).to(device)
+ model.eval()
+ global_step = get_global_step(ckpt_path)
+
+ # Load the HiFi-GAN vocoder
+ vocoder_ckpt = torch.load(str(vocoder_path), map_location=device, weights_only=True)
+ vocoder_model, vocoder_config = load_hifigan_from_checkpoint(vocoder_ckpt, device)
+ vocoder_global_step = get_global_step(vocoder_path)
+
+ # Pick any speaker from the model
+ speaker = next(iter(model.speaker2id.keys()))
+ language = next(iter(model.lang2id.keys()))
+ print(f"Available speakers: {list(model.speaker2id.keys())}")
+
+ # One dict per utterance; each entry carries its own language and speaker
+ filelist_data = [
+     {
+         "basename": "sample-0",
+         "characters": "...",  # text to synthesise in Ewe
+         "language": language,
+         "speaker": speaker,
+         "duration_control": 1.0,
+     }
+ ]
+
+ output_dir = Path("everyvoice_output")
+ output_dir.mkdir(exist_ok=True)
+
+ # texts, language, and speaker are passed as None here; the example supplies
+ # them per utterance through filelist_data instead
+ synthesize_helper(
+     model=model,
+     texts=None,
+     style_reference=None,
+     language=None,
+     speaker=None,
+     duration_control=1.0,
+     global_step=global_step,
+     output_type=[SynthesizeOutputFormats.wav],
+     text_representation=DatasetTextRepresentation.characters,
+     accelerator=accelerator,
+     devices="auto",
+     device=device,
+     batch_size=1,
+     num_workers=1,
+     filelist=None,
+     filelist_data=filelist_data,
+     output_dir=output_dir,
+     teacher_forcing_directory=None,
+     vocoder_model=vocoder_model,
+     vocoder_config=vocoder_config,
+     vocoder_global_step=vocoder_global_step,
+ )
+ # Generated WAVs land in output_dir/wav/
+ ```
+
+ ## Training data
+
+ - **Source:** `davidguzmanr/open-bible-resources`, config `Ewe`
+ - **Size:** approximately 22,195 utterances
+ - **Speakers:** multispeaker; speaker identity is fixed to one of the training-set
+   voices and selected by name at inference time
+ - **Sample rate:** 22,050 Hz
+
+ ## Training procedure
+
+ - Acoustic model: FastSpeech2 (non-autoregressive, duration-prediction based).
+ - Vocoder: HiFi-GAN (iSTFT variant).
+ - Character-level tokenizer built from the training transcripts.
+ - Trained with the [EveryVoice](https://github.com/EveryVoiceTTS/EveryVoice) toolkit.
+
+ Audio preprocessing and training are reproducible via the upstream
+ [open-bible-models](https://github.com/davidguzmanr/open-bible-models) repo.
+
+ ## Evaluation
+
+ The model was evaluated alongside other Open-Bible TTS systems on character/word
+ error rate (via Meta's Omnilingual ASR) and UTMOSv2 naturalness scores. See the
+ [open-bible-models](https://github.com/davidguzmanr/open-bible-models) repository
+ for the evaluation pipeline and the
+ [open-bible-surveys](https://github.com/davidguzmanr/open-bible-surveys) repository
+ for the human-listening survey methodology.
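
The character and word error rates named in the Evaluation section are conventionally defined as the edit distance between a reference transcript and an ASR hypothesis, divided by the reference length. The sketch below is not taken from this commit or from the open-bible-models pipeline. It is a minimal illustration that assumes plain Levenshtein distance over characters and whitespace-split words, with no text normalization, and assumes the ASR transcript is already available as a string. The function names (`edit_distance`, `error_rate`, `cer`, `wer`) are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference token
                    curr[j - 1] + 1,     # insertion of a hypothesis token
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: Sequence, hyp_tokens: Sequence) -> float:
    """Edit distance normalized by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over raw characters, spaces included."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


if __name__ == "__main__":
    # Placeholder strings, not real model output
    reference = "the cat sat"
    hypothesis = "the cat sit"
    print(f"CER: {cer(reference, hypothesis):.3f}")
    print(f"WER: {wer(reference, hypothesis):.3f}")
```

Both rates can exceed 1.0 when the hypothesis is much longer than the reference. A real evaluation would typically also normalize case and punctuation before scoring, and whether the upstream pipeline does so is not stated here.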