Vietnamese Parler-TTS Voice Design
Vietnamese instruction-guided text-to-speech built on top of Parler-TTS.
This model generates Vietnamese speech from:
- text: the content to speak
- description: a natural-language voice instruction describing accent, gender, age, pitch, speed, loudness, or speaking style
Demo Idea
Example voice descriptions:
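- Giọng nữ trẻ miền Bắc, nói chậm rãi, giọng cao (a young Northern female voice, speaking slowly, high-pitched)
- Giọng nam trưởng thành miền Nam, nói nhanh và rất to (an adult Southern male voice, speaking fast and very loudly)
- Giọng nữ miền Trung, nói chậm rãi với âm lượng nhỏ, giọng trầm (a Central female voice, speaking slowly at low volume, low-pitched)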
Model Details
- Model name: thangquang09/parler-tts-vietnamese-v1-stage2
- Architecture: Parler-TTS
- Language: Vietnamese
- Use case: instruction TTS / voice design
- Model type: controllable text-to-speech
- Base family: Parler-TTS
- Adaptation: Vietnamese fine-tuning with Vietnamese voice descriptions
How to Use
Python
import torch
import soundfile as sf
from transformers import AutoTokenizer
from parler_tts import ParlerTTSForConditionalGeneration
repo_id = "thangquang09/parler-tts-vietnamese-v1-stage2"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = ParlerTTSForConditionalGeneration.from_pretrained(repo_id).to(device)
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=False)
text = "Xin chào, hôm nay bạn có khoẻ không?"
description = "Giọng nữ trẻ miền Bắc, nói chậm rãi, giọng cao"
desc_tokens = tokenizer(description, return_tensors="pt").to(device)
text_tokens = tokenizer(text, return_tensors="pt").to(device)
with torch.no_grad():
    generation = model.generate(
        input_ids=desc_tokens.input_ids,
        attention_mask=desc_tokens.attention_mask,
        prompt_input_ids=text_tokens.input_ids,
        prompt_attention_mask=text_tokens.attention_mask,
        do_sample=True,
        temperature=1.0,
    )
audio = generation.cpu().float().numpy().squeeze()
sf.write("output.wav", audio, model.config.sampling_rate)
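To compare several voice descriptions on the same sentence, the snippet above can be wrapped in a loop that reuses the loaded model. The following is a minimal sketch, not part of the released code: `plan_outputs`, `render_all`, the `samples` directory, and the `voice_NN.wav` naming scheme are hypothetical names chosen for illustration, and the `generate` call simply repeats the arguments shown above.

```python
from pathlib import Path


def plan_outputs(descriptions, out_dir="samples"):
    """Pair each voice description with a numbered WAV path (hypothetical naming scheme)."""
    return [
        (desc, Path(out_dir) / f"voice_{i:02d}.wav")
        for i, desc in enumerate(descriptions, start=1)
    ]


def render_all(model, tokenizer, text, descriptions, device, out_dir="samples"):
    """Synthesize `text` once per description, reusing the already loaded model."""
    # Heavy dependencies are imported lazily so plan_outputs works without them.
    import torch
    import soundfile as sf

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # The speech text is the same for every voice, so tokenize it once.
    text_tokens = tokenizer(text, return_tensors="pt").to(device)
    written = []
    for desc, path in plan_outputs(descriptions, out_dir):
        desc_tokens = tokenizer(desc, return_tensors="pt").to(device)
        with torch.no_grad():
            generation = model.generate(
                input_ids=desc_tokens.input_ids,
                attention_mask=desc_tokens.attention_mask,
                prompt_input_ids=text_tokens.input_ids,
                prompt_attention_mask=text_tokens.attention_mask,
                do_sample=True,
                temperature=1.0,
            )
        audio = generation.cpu().float().numpy().squeeze()
        sf.write(str(path), audio, model.config.sampling_rate)
        written.append(path)
    return written
```

With `model`, `tokenizer`, `text`, and `device` defined as in the snippet above, `render_all(model, tokenizer, text, [description], device)` would write `samples/voice_01.wav`. Because `do_sample=True`, repeated runs with the same description can produce different audio.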
Recommended Inference Repo
For a cleaner inference workflow with a CLI and a Gradio app, use the GitHub repository:
GitHub: https://github.com/thangquang09/vietnamese-parlertts
That repository provides:
- api.py for CLI and Python inference
- app.py for Gradio web UI
- simpler loading from Hugging Face or local checkpoints
Example CLI from the GitHub repo:
python api.py \
--text "Xin chào, hôm nay bạn có khoẻ không?" \
--description "Giọng nữ trẻ miền Bắc, nói chậm rãi, giọng cao" \
--hf-repo thangquang09/parler-tts-vietnamese-v1-stage2 \
--output output.wav
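The same command can be launched from Python when scripting many runs. This is a sketch under stated assumptions: it uses only the four flags shown in the command above, assumes a local clone of the GitHub repository, and the helper names `build_cli_args` and `run_cli` are hypothetical. It has not been checked against the actual `api.py` argument parser.

```python
import subprocess
import sys


def build_cli_args(text, description, hf_repo, output):
    """Return the argv list for the api.py invocation shown above."""
    return [
        sys.executable, "api.py",  # current interpreter instead of a bare "python"
        "--text", text,
        "--description", description,
        "--hf-repo", hf_repo,
        "--output", output,
    ]


def run_cli(text, description, hf_repo, output, repo_dir="."):
    """Run api.py inside a local clone of the repo; raises CalledProcessError on failure."""
    subprocess.run(
        build_cli_args(text, description, hf_repo, output),
        cwd=repo_dir,
        check=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with Vietnamese text that contains commas, quotes, or question marks.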
Input Format
This model expects two text inputs:
- Speech text: the Vietnamese sentence to synthesize
- Voice description: the natural-language instruction describing how the voice should sound
Good descriptions usually mention some of:
- gender
- age
- accent or region
- speaking rate
- pitch
- loudness
- expressiveness or emotion
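The attributes above can be assembled into a description programmatically. The sketch below is illustrative only: the Vietnamese fragments are copied from the example descriptions in this card, while the English dictionary keys and the function name `build_description` are hypothetical. Whether the model responds equally well to every combination is not established here.

```python
# Phrase fragments copied from the example descriptions in this card.
GENDER = {"female": "nữ", "male": "nam"}
AGE = {"young": "trẻ", "adult": "trưởng thành"}
REGION = {"north": "miền Bắc", "central": "miền Trung", "south": "miền Nam"}
RATE = {"slow": "nói chậm rãi", "fast": "nói nhanh"}
LOUDNESS = {"loud": "và rất to", "quiet": "với âm lượng nhỏ"}
PITCH = {"high": "giọng cao", "low": "giọng trầm"}


def build_description(gender, region, age=None, rate=None, loudness=None, pitch=None):
    """Compose a voice description in the same shape as the card's examples."""
    if loudness and not rate:
        # In the examples, loudness is always attached to the speaking-rate phrase.
        raise ValueError("loudness requires a speaking rate in this sketch")
    voice = ["Giọng", GENDER[gender]]
    if age:
        voice.append(AGE[age])
    voice.append(REGION[region])
    parts = [" ".join(voice)]
    if rate:
        phrase = RATE[rate]
        if loudness:
            phrase = f"{phrase} {LOUDNESS[loudness]}"
        parts.append(phrase)
    if pitch:
        parts.append(PITCH[pitch])
    return ", ".join(parts)
```

For instance, `build_description("female", "north", age="young", rate="slow", pitch="high")` reproduces the first example description in this card.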
Intended Use
This model is intended for:
- Vietnamese TTS research
- controllable speech generation
- voice style prompting
- demo systems and rapid prototyping
Limitations
- Voice control is prompt-based, so how closely the output follows an instruction depends on how the description is worded
- Some accents, ages, or speaking styles may be reproduced more reliably than others
- Output quality may vary with the length of the text and the phrasing of the description
- This is not a voice cloning model
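Since quality may vary with text length, one common workaround for TTS systems in general is to split long input into sentence-sized chunks and synthesize each chunk separately. The sketch below shows only the splitting step. It is an assumption that this helps with this particular model, and the function name `split_sentences` and the `max_chars=200` default are hypothetical values, not recommendations from the model author.

```python
import re


def split_sentences(text, max_chars=200):
    """Split at sentence-ending punctuation, then pack sentences into chunks.

    Chunks stay within max_chars where possible; a single sentence longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?…])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed as `text` with the same description, and the resulting audio arrays concatenated. Because generation is sampled, voice characteristics may drift slightly between chunks.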
Training Notes
This is a Vietnamese adaptation of Parler-TTS released for inference and downstream use.
If you want the cleaned inference-first codebase, scripts, and app, please use the GitHub repo:
https://github.com/thangquang09/vietnamese-parlertts
Citation
If you use this model, please cite the original Parler-TTS work:
@misc{lacombe-etal-2024-parler-tts,
  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
  title = {Parler-TTS},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/parler-tts}}
}

@misc{lyth2024natural,
  title = {Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
  author = {Dan Lyth and Simon King},
  year = {2024},
  eprint = {2402.01912},
  archivePrefix = {arXiv},
  primaryClass = {cs.SD}
}
Acknowledgements
- Parler-TTS
- Hugging Face
- Vietnamese adaptation and inference packaging by thangquang09
Model tree for thangquang09/parler-tts-vietnamese-v1-stage2
Base model
parler-tts/parler-tts-mini-v1