🦜 VieNeu-TTS v2 Turbo - GGUF

Ultra-fast Vietnamese & English TTS that runs entirely on CPU, no GPU required.



📖 Model Description

VieNeu-TTS v2 Turbo is the lightweight, CPU-optimized edition of the VieNeu-TTS family, a state-of-the-art Vietnamese Text-to-Speech system. Quantized to GGUF format and paired with an ONNX neural codec, the model delivers near-real-time speech synthesis on commodity hardware: laptops, edge devices, and even Raspberry Pi class machines.

This repository hosts the GGUF quantized weights intended for use with llama-cpp-python as the inference backend, alongside the companion ONNX codec for waveform generation.

What makes it special?

  • 🇻🇳🇺🇸 Bilingual (Code-switching): Naturally handles mixed Vietnamese–English sentences, powered by sea-g2p. No need to pre-label language boundaries.
  • ⚡ Extreme Speed: Optimized GGUF quantization achieves real-time or faster inference on a standard CPU.
  • 💻 Zero GPU Dependency: Runs fully offline on any x86_64 / ARM64 machine with sufficient RAM.
  • 🔇 AI Watermarking: Audio output embeds an imperceptible identifier for responsible AI content tracing.
  • 🔊 24 kHz Audio: High-fidelity waveform output suitable for production applications.
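The speed claim above is usually quantified as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1.0 means faster than real time. The following is a minimal sketch, assuming mono output at the 24 kHz rate stated above. The helper names (`audio_duration_seconds`, `real_time_factor`, `timed`) are illustrative and are not part of the vieneu SDK.

```python
import time

SAMPLE_RATE_HZ = 24_000  # output rate stated in this model card


def audio_duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a mono clip with the given number of samples."""
    return num_samples / sample_rate


def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    duration = audio_duration_seconds(num_samples, sample_rate)
    if duration <= 0:
        raise ValueError("clip must contain at least one sample")
    return synthesis_seconds / duration


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical numbers: 1.5 s of compute for 72,000 samples (3 s of audio).
    print(real_time_factor(1.5, 72_000))
```

With the SDK, something like `timed(tts.infer, text=...)` would give the elapsed time. Whether `len(audio)` equals the sample count depends on what `infer` returns, which this card does not specify.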

🗂️ Repository Contents

| File | Description |
| --- | --- |
| vieneu-v2-turbo-*.gguf | GGUF quantized LLM backbone (multiple quant levels) |

🚀 Quickstart

Option 1: Install via vieneu SDK (Recommended)

# Minimal installation (Turbo/CPU Only)
pip install vieneu

# Optional: Pre-built llama-cpp-python for CPU (if building fails)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/

# Optional: macOS Metal acceleration
pip install vieneu --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal/

Basic usage (Python):

from vieneu import Vieneu

# Turbo mode is the default; no GPU needed
tts = Vieneu()

# Vietnamese only ("Hello! This is VieNeu TTS, Turbo edition.")
audio = tts.infer(text="Xin chào! Đây là VieNeu TTS phiên bản Turbo.")
tts.save(audio, "output.wav")

# Bilingual code-switching
audio = tts.infer(
    text="Trước đây, hệ thống điện sử dụng direct current, nhưng Tesla đã chứng minh alternating current is more efficient."
)
tts.save(audio, "output_bilingual.wav")

Option 2: Web UI (Full repo)

git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
uv sync          # minimal install (Turbo/CPU)
uv run vieneu-web
# → Open http://127.0.0.1:7860

🌐 Bilingual Code-Switching

VieNeu-TTS v2 Turbo can handle natural Vietnamese–English mixed text without any special markup. The sea-g2p engine automatically identifies language boundaries and generates accurate phonemes for both languages.

from vieneu import Vieneu

tts = Vieneu()

examples = [
    "Hôm nay tôi sẽ trình bày về machine learning và deep learning.",
    "The new feature lΓ  rαΊ₯t hα»―u Γ­ch cho developers.",
    "VieNeu supports both Vietnamese vΓ  English seamlessly.",
]

for i, text in enumerate(examples):
    audio = tts.infer(text=text)
    tts.save(audio, f"bilingual_{i}.wav")

🎙️ Preset Voices

The model ships with multiple preset voices. List and use them via the SDK:

from vieneu import Vieneu

tts = Vieneu()

# List available preset voices
voices = tts.list_preset_voices()
for description, voice_id in voices:
    print(f"  {description} → ID: {voice_id}")

# Use a specific voice
voice_data = tts.get_preset_voice("xuan_vinh")   # default: Southern Male
audio = tts.infer(
    text="Giọng đọc này được tổng hợp bởi VieNeu Turbo.",
    voice=voice_data
)
tts.save(audio, "preset_voice.wav")
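Picking a voice by its description can be easier than remembering IDs. The following is a small sketch, assuming `list_preset_voices()` yields `(description, voice_id)` pairs as the loop above suggests. `find_voice_id` is a hypothetical helper, and the sample entries other than `xuan_vinh` are made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def find_voice_id(voices: Iterable[Tuple[str, str]], keyword: str) -> Optional[str]:
    """Return the ID of the first voice whose description contains keyword.

    Matching is case-insensitive; returns None when nothing matches.
    """
    needle = keyword.casefold()
    for description, voice_id in voices:
        if needle in description.casefold():
            return voice_id
    return None


if __name__ == "__main__":
    # Stand-in for tts.list_preset_voices(); the second entry is invented.
    sample_voices = [
        ("Southern Male", "xuan_vinh"),
        ("Example Female", "example_voice"),
    ]
    print(find_voice_id(sample_voices, "southern"))
```

The returned ID can then be passed to `tts.get_preset_voice(...)` as in the example above.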

Note: Instant Voice Cloning is not yet available in Turbo mode. It is planned for a future release. For cloning, use the standard GPU-based VieNeu-TTS-v2 model.


🔬 Model Architecture

VieNeu-TTS v2 Turbo is a two-stage TTS system:

  1. LLM Backbone (GGUF): A transformer language model conditioned on text tokens and speaker embeddings. It predicts discrete audio codec tokens autoregressively.
  2. Neural Codec (ONNX): A VQ-VAE-based neural codec (VieNeu-Codec) decodes the predicted token sequence into a 24 kHz waveform.

The bilingual capability is enabled by sea-g2p, which converts mixed-language graphemes to phonemes before the LLM backbone processes them.
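The data flow described above (graphemes to phonemes, phonemes to codec tokens, codec tokens to waveform) can be sketched with stand-in components. Everything here is a toy: `toy_g2p`, `toy_backbone`, and `toy_codec_decode` are hypothetical placeholders, not the sea-g2p, llama-cpp-python, or VieNeu-Codec APIs, and the vocabulary size and samples-per-token values are invented for illustration.

```python
from typing import Callable, List

SAMPLE_RATE_HZ = 24_000  # output rate stated in this model card


def toy_g2p(text: str) -> List[str]:
    """Placeholder for grapheme-to-phoneme conversion: one 'phoneme' per letter."""
    return [ch for ch in text.lower() if ch.isalpha()]


def toy_backbone(phonemes: List[str], speaker_id: int) -> List[int]:
    """Placeholder for the autoregressive backbone.

    Each token depends on the previous one (or on the speaker ID for the
    first step), mimicking step-by-step prediction of codec tokens.
    """
    tokens: List[int] = []
    for ph in phonemes:
        prev = tokens[-1] if tokens else speaker_id
        tokens.append((ord(ph) + prev) % 1024)  # 1024 is an invented vocabulary size
    return tokens


def toy_codec_decode(tokens: List[int], samples_per_token: int = 480) -> List[float]:
    """Placeholder for the neural codec: expands each token into a run of samples."""
    waveform: List[float] = []
    for tok in tokens:
        waveform.extend([tok / 1024.0] * samples_per_token)
    return waveform


def synthesize(text: str,
               g2p: Callable[[str], List[str]],
               backbone: Callable[[List[str], int], List[int]],
               decode: Callable[[List[int]], List[float]],
               speaker_id: int = 0) -> List[float]:
    """Chain the stages: text -> phonemes -> codec tokens -> waveform."""
    return decode(backbone(g2p(text), speaker_id))


if __name__ == "__main__":
    wave = synthesize("Xin chao", toy_g2p, toy_backbone, toy_codec_decode)
    print(len(wave) / SAMPLE_RATE_HZ, "seconds of toy audio")
```

The point of the sketch is the separation of concerns: the backbone only ever sees phonemes and emits discrete tokens, so the codec can be swapped or run in a different runtime (ONNX here) without touching the language model.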


📊 Training Data

The model was trained on over 20,000 hours of combined Vietnamese and English speech data, covering a wide range of speakers, accents, recording conditions, and speaking styles.

| Dataset | Language | Description |
| --- | --- | --- |
| pnnbao-ump/VieNeu-TTS-1000h | Vietnamese | Curated studio-quality Vietnamese speech corpus |
| pnnbao-ump/vietnamese-audio-corpus | Vietnamese | Diverse multi-speaker Vietnamese audio |
| amphion/Emilia-Dataset | Multilingual | Large-scale multilingual speech dataset |
| facebook/multilingual_librispeech | English + others | Multilingual read speech |

🗺️ Roadmap

  • GGUF/ONNX Turbo engine
  • Bilingual (Vietnamese–English) code-switching
  • Turbo Voice Cloning
  • Mobile SDK (Android / iOS)
  • Streaming output API

🀝 Related Resources

Resource Link
πŸ“¦ PyPI Package pip install vieneu
πŸ™ GitHub pnnbao97/VieNeu-TTS
πŸ“– Documentation docs.vieneu.io
πŸ€— Full Model (GPU) pnnbao-ump/VieNeu-TTS
πŸ’¬ Discord Community Join here
β˜• Support the project buymeacoffee.com/pnnbao

📄 License

This model is released under the Apache License 2.0 and is free for personal and commercial use.


Made with ❤️ for the Vietnamese TTS community by @pnnbao97 and contributors.

Format: GGUF
Model size: 0.1B params
Architecture: qwen3
