---
language:
  - as
license: cc-by-sa-4.0
library_name: coqui-tts
tags:
  - text-to-speech
  - tts
  - vits
  - open-bible
  - assamese
pipeline_tag: text-to-speech
datasets:
  - davidguzmanr/open-bible-resources
inference: false
---

# VITS Open Bible — Assamese

A multispeaker text-to-speech model for **Assamese**, trained from scratch on
the [Open Bible](https://huggingface.co/datasets/davidguzmanr/open-bible-resources)
corpus using the [VITS](https://arxiv.org/abs/2106.06103) architecture
(end-to-end TTS with adversarial learning, 22,050 Hz output) via the
[Coqui TTS](https://github.com/coqui-ai/TTS) framework.

Unlike zero-shot TTS models, VITS is conditioned on speaker embeddings learned
during training. A speaker name from the training set must be supplied at
inference time.

## Files

| File | Purpose |
|------|---------|
| `model_last.pth` | Trained model weights. |
| `config.json` | Coqui TTS model configuration. |
| `speakers.pth` | Speaker name → speaker ID mapping, loaded by `SpeakerManager`. |

## Intended use

- Multispeaker TTS for Assamese using one of the training-set speaker voices.
- Research on multilingual TTS, low-resource TTS evaluation, and listening
  studies on Open Bible–style read speech.

## How to use

Install Coqui TTS:

```bash
pip install TTS
```

Download the checkpoint and run inference:

```python
import torch
from huggingface_hub import hf_hub_download
from TTS.tts.utils.speakers import SpeakerManager
from TTS.utils.synthesizer import Synthesizer

repo_id  = "multilingual-tts/VITS-OpenBible-Assamese"
ckpt     = hf_hub_download(repo_id, "model_last.pth")
config   = hf_hub_download(repo_id, "config.json")
speakers = hf_hub_download(repo_id, "speakers.pth")

use_cuda = torch.cuda.is_available()
synthesizer = Synthesizer(
    tts_checkpoint=ckpt,
    tts_config_path=config,
    tts_speakers_file=speakers,
    use_cuda=use_cuda,
)

# Coqui's Synthesizer may not inject the speakers file into the model config
# automatically — restore the SpeakerManager manually when needed.
if synthesizer.tts_model.speaker_manager is None:
    synthesizer.tts_model.speaker_manager = SpeakerManager(
        speaker_id_file_path=speakers
    )

# List available speaker names
print(sorted(synthesizer.tts_model.speaker_manager.speaker_names))

wav = synthesizer.tts(
    text="...",          # text to synthesise in Assamese
    speaker_name="...",  # one of the speaker names printed above
    split_sentences=True,
)
```
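
The snippet above leaves the synthesized audio in `wav` without writing it to disk. A minimal sketch of saving it with the Python standard library follows, assuming `wav` is a flat sequence of mono float samples roughly in [-1, 1] and using the 22,050 Hz rate stated in this card. `write_wav_16bit` is a hypothetical helper name, not part of Coqui TTS.

```python
import struct
import wave


def write_wav_16bit(samples, path, sample_rate=22050):
    """Write mono float samples (assumed to lie in [-1, 1]) as 16-bit PCM WAV.

    Hypothetical helper for illustration; returns the number of frames written.
    """
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)  # 22,050 Hz per this model card
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)
```

With the objects from the snippet above, `write_wav_16bit(wav, "output.wav")` would produce a playable file. Clipping guards against occasional samples outside the nominal range.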

## Training data

- **Source:** `davidguzmanr/open-bible-resources`, config `Assamese`
- **Size:** approximately 20,895 utterances
- **Speakers:** multispeaker; speaker identity is fixed to one of the training-set
  voices and selected by name at inference time
- **Sample rate:** 22,050 Hz

## Training procedure

- Architecture: VITS (Conditional Variational Autoencoder + adversarial training).
- Grapheme-level tokenizer, built from the training transcripts.
- Optimizer: AdamW, learning rate 2e-4.
- Training budget: 500,000 optimizer updates on 2 GPUs with mixed precision
  (bf16).

Audio preprocessing and training are reproducible via the upstream
[open-bible-models](https://github.com/davidguzmanr/open-bible-models) repo.
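
To illustrate what a grapheme-level tokenizer built from the training transcripts can look like, here is a minimal sketch. It treats each Unicode code point as one symbol, which is an assumption: the tokenizer actually used in training is defined by `config.json` and the upstream repository, and may handle Assamese combining marks, punctuation, or special tokens differently. The names `build_vocab`, `encode`, `<pad>`, and `<unk>` are hypothetical.

```python
def build_vocab(transcripts, specials=("<pad>", "<unk>")):
    """Map every distinct character in the transcripts to an integer ID.

    Special symbols come first so their IDs stay stable across datasets.
    """
    chars = sorted({ch for text in transcripts for ch in text})
    symbols = list(specials) + chars
    return {sym: idx for idx, sym in enumerate(symbols)}


def encode(text, vocab):
    """Convert text to a list of IDs, using <unk> for unseen characters."""
    unk = vocab["<unk>"]
    return [vocab.get(ch, unk) for ch in text]


vocab = build_vocab(["ab", "bc"])
ids = encode("abz", vocab)  # "z" is not in the vocabulary, so it maps to <unk>
```

One practical consequence of a vocabulary derived from the transcripts is that characters absent from the training text have no learned representation, so input containing them is unlikely to be synthesized well.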

## Evaluation

The model was evaluated alongside other Open Bible TTS systems on character and
word error rate (computed from transcripts produced by Meta's Omnilingual ASR)
and on UTMOSv2 naturalness scores. See the
[open-bible-models](https://github.com/davidguzmanr/open-bible-models) repository
for the evaluation pipeline and the
[open-bible-surveys](https://github.com/davidguzmanr/open-bible-surveys) repository
for the human-listening survey methodology.
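
For readers unfamiliar with the error-rate metrics mentioned above, the following is a minimal sketch of the standard definitions: Levenshtein edit distance divided by the reference length, computed over characters (CER) or whitespace-separated words (WER). It is not the evaluation pipeline's code; any text normalization applied there before scoring is not reproduced here, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

In a TTS setting, the reference would typically be the input text and the hypothesis the ASR transcript of the synthesized audio, so lower values suggest more intelligible speech.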