🦜 VieNeu-TTS v2 Turbo (GPU Edition)
The fastest Bilingual (Vietnamese & English) TTS engine with Instant Zero-Shot Voice Cloning.
📖 Model Description
VieNeu-TTS v2 Turbo is the performance-tuned edition of the VieNeu-TTS family. Built on a transformer-based architecture and optimized for minimal latency, it delivers high-fidelity 24 kHz speech synthesis with Instant Voice Cloning capabilities.
This version is designed for GPU-accelerated inference (Standard/Transformers backend), making it ideal for real-time applications, interactive assistants, and creative content generation on platforms like Hugging Face Spaces (ZeroGPU).
✨ Key Features
- 🦜 Instant Voice Cloning: Clone any voice with just 3-5 seconds of reference audio. Truly zero-shot—no reference text required for v2 Turbo!
- 🇻🇳🇺🇸 Bilingual (Code-switching): Seamlessly handles mixed Vietnamese–English sentences in a single utterance.
- 🚀 Extreme Speed: Optimized architecture for ultra-low latency inference on GPUs.
- 🔇 AI Watermarking: Every audio output includes an imperceptible identifier for responsible AI content tracing.
- 🔊 24 kHz High-Fidelity: Studio-quality neural codec output.
🚀 Quickstart
Option 1 — Install via vieneu SDK (Recommended)
# Minimal installation (Turbo/CPU Only)
pip install vieneu
# Optional: Pre-built llama-cpp-python for CPU (if building fails)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/
# Optional: macOS Metal acceleration
pip install vieneu --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal/
from vieneu import Vieneu
# Initialize in Turbo mode (Default - Minimal dependencies)
tts = Vieneu()
# 1. Simple synthesis (uses default Southern Male voice 'Xuân Vĩnh')
text = "Hệ thống điện chủ yếu sử dụng alternating current because it is more efficient."
audio = tts.infer(text=text)
# Save to file
tts.save(audio, "output_Xuân Vĩnh.wav")
print("💾 Saved to output_Xuân Vĩnh.wav")
# 2. Using a specific Preset Voice
voices = tts.list_preset_voices()
for desc, voice_id in voices:
    print(f"Voice: {desc} (ID: {voice_id})")
my_voice_id = voices[1][1] if len(voices) > 1 else voices[0][1]  # Phạm Tuyên voice
voice_data = tts.get_preset_voice(my_voice_id)
audio_custom = tts.infer(text="Tôi đang nói bằng giọng của Bác sĩ Tuyên.", voice=voice_data)
# 3. Save to file
tts.save(audio_custom, "output_Phạm Tuyên.wav")
print("💾 Saved to output_Phạm Tuyên.wav")
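Picking a preset by list position depends on the order in which `list_preset_voices()` returns its entries. A minimal sketch of selecting by description instead, assuming only what the quickstart loop shows: that voices come back as `(description, voice_id)` pairs. `find_voice_id` is a hypothetical helper, not part of the vieneu SDK, and the sample list and IDs below are made-up data for illustration.

```python
import unicodedata


def _normalize(s: str) -> str:
    """NFC-normalize and case-fold so composed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", s).casefold()


def find_voice_id(voices, query, default=None):
    """Return the ID of the first voice whose description contains `query`.

    `voices` is an iterable of (description, voice_id) pairs, the shape the
    quickstart loop unpacks. Hypothetical helper, not an SDK function.
    """
    needle = _normalize(query)
    for desc, voice_id in voices:
        if needle in _normalize(desc):
            return voice_id
    return default


# Made-up sample data; with the SDK you would pass tts.list_preset_voices().
sample_voices = [
    ("Xuân Vĩnh", "voice_a"),
    ("Phạm Tuyên", "voice_b"),
]
print(find_voice_id(sample_voices, "phạm tuyên"))
```

With the SDK, the returned ID would then go to `tts.get_preset_voice(...)` as in the quickstart, with the `default` value covering the case where no description matches.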
🦜 Zero-shot Voice Cloning (SDK)
Clone any voice with only 3-5 seconds of audio using the local Turbo engine:
from vieneu import Vieneu
tts = Vieneu() # Defaults to Turbo mode
# 1. Encode the reference audio (extracts speaker embedding)
# Supported formats: .wav, .mp3, .flac
my_voice = tts.encode_reference("examples/audio_ref/example.wav")
# 2. Synthesize with the cloned voice
# No reference text required for Turbo v2!
audio = tts.infer(
    text="Đây là giọng nói được clone trực tiếp bằng SDK của VieNeu-TTS.",
    voice=my_voice
)
tts.save(audio, "cloned_voice.wav")
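Before calling `encode_reference`, it can help to sanity-check the reference clip. The sketch below is a hypothetical pre-flight helper, not part of the vieneu SDK. It assumes the format list from the comment above (.wav, .mp3, .flac) and treats the 3-second figure as a lower bound taken from this README's guidance; the SDK may or may not enforce either. Duration is checked only for `.wav`, since the Python standard library cannot decode mp3 or flac.

```python
import wave
from pathlib import Path

SUPPORTED_SUFFIXES = {".wav", ".mp3", ".flac"}  # formats listed in this README
MIN_SECONDS = 3.0  # lower end of the "3-5 seconds" guidance (assumption, not an SDK limit)


def wav_duration_seconds(path):
    """Duration of a PCM .wav file, computed from its header."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference(path, min_seconds=MIN_SECONDS):
    """Return a list of warnings for a reference clip; an empty list means no issue found."""
    path = Path(path)
    problems = []
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    elif suffix == ".wav":
        duration = wav_duration_seconds(path)
        if duration < min_seconds:
            problems.append(f"clip is {duration:.1f}s, shorter than {min_seconds:.1f}s")
    return problems


# The suffix check needs no file on disk, so this line runs anywhere.
print(check_reference("clip.ogg"))
```

A caller could skip cloning, or ask for a longer recording, whenever the returned list is non-empty.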
Option 2 — Web UI (Full repo)
git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
uv sync # minimal install (Turbo/CPU)
uv run vieneu-web
# → Open http://127.0.0.1:7860
🔬 Model Architecture
VieNeu-TTS v2 Turbo uses a two-stage pipeline:
- Transformer LLM Backbone: A decoder-only transformer that predicts discrete audio tokens from text and speaker embeddings.
- Neural Codec (VieNeu-Codec): A high-performance VQ-VAE decoder that converts tokens into a 24 kHz waveform with minimal artifacts.
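The data flow of the two stages can be sketched as below. This shows the shape of the pipeline only: the class name, function names, and the samples-per-token value are illustrative assumptions, and the toy stand-ins are not the real backbone or VieNeu-Codec.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

SAMPLE_RATE = 24_000  # output rate stated in this README


@dataclass
class TwoStageTTS:
    """Data-flow sketch: stage 1 yields discrete tokens, stage 2 decodes them."""

    # text + speaker embedding -> discrete audio tokens
    backbone: Callable[[str, Sequence[float]], List[int]]
    # discrete audio tokens -> waveform samples
    codec_decode: Callable[[List[int]], List[float]]

    def synthesize(self, text: str, speaker_embedding: Sequence[float]) -> List[float]:
        tokens = self.backbone(text, speaker_embedding)
        return self.codec_decode(tokens)


# Toy stand-ins so the sketch runs; NOT the real model or codec.
SAMPLES_PER_TOKEN = 480  # arbitrary illustrative value, not the codec's actual rate


def toy_backbone(text, speaker_embedding):
    # One fake token per input character.
    return [ord(ch) % 256 for ch in text]


def toy_codec_decode(tokens):
    # Each token expands to a fixed-length run of silence.
    return [0.0] * (len(tokens) * SAMPLES_PER_TOKEN)


pipeline = TwoStageTTS(toy_backbone, toy_codec_decode)
samples = pipeline.synthesize("xin chao", speaker_embedding=[0.0] * 4)
print(len(samples) / SAMPLE_RATE, "seconds of silent toy audio")
```

Keeping the two stages behind separate callables mirrors the split described above: the backbone never touches waveform samples, and the codec never sees text.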
📊 Training Data
Trained on a massive multi-speaker dataset comprising over 20,000 hours of high-quality speech:
| Dataset | Language | Description |
|---|---|---|
| pnnbao-ump/VieNeu-TTS-1000h | Vietnamese | DeepMind/Vietnamese studio-quality corpus |
| pnnbao-ump/vietnamese-audio-corpus | Vietnamese | Large-scale multi-accent Vietnamese data |
| amphion/Emilia-Dataset | Multilingual | Large-scale multilingual diverse speech |
| facebook/multilingual_librispeech | English | Extensive English read speech |
🗺️ Roadmap
- Turbo GPU (Transformers) Engine
- Bilingual (Vietnamese–English) Support
- Zero-shot Voice Cloning
- Mobile SDK (Android / iOS)
- Streaming API Integration
🤝 Support & Links
| Resource | Link |
|---|---|
| 🐙 GitHub | pnnbao97/VieNeu-TTS |
| 📖 Documentation | docs.vieneu.io |
| 📦 PyPI | pip install vieneu |
| 💬 Discord | Join here |
📄 License
Released under the Apache License 2.0, which permits both personal and commercial use.
Made with ❤️ for the Vietnamese TTS community by @pnnbao97 and contributors.