Add README for Ewe
README.md
@@ -20,7 +20,7 @@ inference: false
A multispeaker text-to-speech model for **Ewe**, trained from scratch on
the [Open Bible](https://huggingface.co/datasets/davidguzmanr/open-bible-resources)
corpus using the [VITS](https://arxiv.org/abs/2106.06103) architecture
(end-to-end TTS with adversarial learning, 22,050 Hz output) via the
[Coqui TTS](https://github.com/coqui-ai/TTS) framework.
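Because the model emits audio at 22,050 Hz, any file you write must declare that rate in its header, or players will play the speech at the wrong speed and pitch. If you save the waveform yourself rather than through the framework's own utilities, a minimal stdlib-only sketch follows. It assumes the synthesized `wav` is a sequence of floats in [-1.0, 1.0]; the helper `save_wav_22k` is hypothetical and not part of Coqui TTS.

```python
import io
import struct
import wave
from typing import BinaryIO, Sequence, Union

SAMPLE_RATE = 22_050  # output rate stated in this model card


def save_wav_22k(samples: Sequence[float], dest: Union[str, BinaryIO]) -> float:
    """Write mono float samples as 16-bit PCM WAV at 22,050 Hz.

    Hypothetical helper. Assumes samples lie in [-1.0, 1.0]; values outside
    that range are clipped. Returns the clip duration in seconds.
    """
    # Clip to [-1, 1], then scale to the signed 16-bit integer range.
    pcm = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    frames = struct.pack(f"<{len(pcm)}h", *pcm)  # little-endian int16
    with wave.open(dest, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wf.setframerate(SAMPLE_RATE)  # must match the model's output rate
        wf.writeframes(frames)
    return len(pcm) / SAMPLE_RATE


# Usage with an in-memory buffer; pass a filename such as "ewe.wav" to write to disk.
buffer = io.BytesIO()
duration = save_wav_22k([0.0, 0.25, -0.25] * 7350, buffer)  # 22,050 samples
```

The returned duration is simply the sample count divided by 22,050, so one second of speech is 22,050 samples.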

Unlike zero-shot TTS models, VITS is conditioned on speaker embeddings learned

@@ -93,14 +93,15 @@ wav = synthesizer.tts(
- **Size:** approximately 22,195 utterances
- **Speakers:** multispeaker; speaker identity is fixed to one of the training-set
voices and selected by name at inference time
- **Sample rate:** 22,050 Hz
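Since speaker identity is restricted to the training-set voices, selecting a speaker by name conceptually amounts to mapping the name to an integer ID and reading that row of a learned embedding table. The stdlib sketch below illustrates only that lookup; the speaker names, the embedding values, and the `speaker_embedding` helper are hypothetical placeholders, not this model's actual speakers or Coqui TTS internals.

```python
from typing import Dict, List, Sequence


def speaker_embedding(
    name: str,
    speaker_ids: Dict[str, int],
    table: Sequence[Sequence[float]],
) -> List[float]:
    """Return the learned embedding row for a training-set speaker.

    Hypothetical helper: `speaker_ids` maps speaker names to row indices of
    `table`. Unknown names raise instead of falling back to a default,
    because the model only knows the voices it was trained on.
    """
    if name not in speaker_ids:
        known = ", ".join(sorted(speaker_ids))
        raise KeyError(f"unknown speaker {name!r}; choose one of: {known}")
    return list(table[speaker_ids[name]])


# Placeholder names and 3-dimensional vectors, for illustration only.
SPEAKER_IDS = {"speaker_a": 0, "speaker_b": 1}
TABLE = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
```

Raising on an unknown name makes the "fixed to one of the training-set voices" constraint explicit at the call site.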

## Training procedure

- Architecture: VITS (conditional variational autoencoder with adversarial training).
- Grapheme-level tokenizer, built from the training transcripts.
- Optimizer: AdamW, learning rate 2e-4.
- Training budget: 500,000 optimizer updates on 2 GPUs with bf16 mixed precision.
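To relate the training budget to the dataset size, the number of passes over the data follows from the update count, the global batch size, and the number of utterances. The per-GPU batch size is not stated in this card, so the value 32 below is a hypothetical placeholder, as is the `epochs_seen` helper; the sketch also assumes plain data-parallel training and treats the ~22,195 utterances as the full training set.

```python
def epochs_seen(updates: int, batch_per_gpu: int, n_gpus: int, n_utterances: int) -> float:
    """Approximate number of passes over the training set.

    Assumes data-parallel training where every optimizer update consumes one
    batch on each GPU (global batch = batch_per_gpu * n_gpus). Ignores
    gradient accumulation and any dropped partial batches.
    """
    samples_processed = updates * batch_per_gpu * n_gpus
    return samples_processed / n_utterances


# 500,000 updates, 2 GPUs and ~22,195 utterances come from this card;
# the per-GPU batch size of 32 is a hypothetical placeholder.
print(f"{epochs_seen(500_000, 32, 2, 22_195):.1f} epochs")
```

Replace the placeholder batch size with the value from the actual training config to get a meaningful figure.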

Audio preprocessing and training are reproducible via the upstream
[open-bible-models](https://github.com/davidguzmanr/open-bible-models) repo.
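The grapheme-level tokenizer listed under Training procedure derives its symbol inventory from the transcripts themselves. A minimal sketch of that idea, not Coqui TTS's actual tokenizer: it applies Unicode NFC normalization (an assumption, so that the same Ewe letter always has one canonical encoding), collects every distinct character, and maps text to integer IDs. The function names and the reserved padding symbol at ID 0 are hypothetical choices.

```python
import unicodedata
from typing import Dict, Iterable, List

PAD = "<pad>"  # hypothetical reserved symbol at ID 0


def build_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Collect every distinct NFC-normalized character into a symbol table."""
    chars = set()
    for line in transcripts:
        chars.update(unicodedata.normalize("NFC", line))
    # Sort by code point for a deterministic mapping; ID 0 is reserved for padding.
    return {PAD: 0, **{c: i + 1 for i, c in enumerate(sorted(chars))}}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map text to symbol IDs, raising on characters unseen in training."""
    ids = []
    for ch in unicodedata.normalize("NFC", text):
        if ch not in vocab:
            raise ValueError(f"character {ch!r} not in training vocabulary")
        ids.append(vocab[ch])
    return ids
```

NFC does not always yield a single code point: letters such as ɛ or ɔ carrying a tone mark have no precomposed form, so if the transcripts contain tone marks, the combining mark remains a separate symbol in this scheme.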