This LoRA adapter is a Derivative Work of IndexTTS2 (as defined in bilibili's license Section 1.5) and is trained on vaja-thai which includes non-commercial data sources. Before accessing, you must agree to all terms in LICENSE.md.

indextts2-thai-lora

Thai LoRA adapter for IndexTTS2 — a zero-shot voice cloning TTS model.

License

Non-commercial use only. This model is subject to multiple licenses:

Base model

| Component | License | Key restriction |
|---|---|---|
| IndexTTS2 | bilibili Model Use License | Derivatives limited to improving IndexTTS or non-commercial AI; entities with >100M MAU or >1B RMB revenue need a separate license |

Training data (vaja-thai)

| Source | License | Commercial use |
|---|---|---|
| porjai_central | CC-BY-SA-4.0 | Yes |
| CommonVoice | CC-0 | Yes |
| TSync2 | CC-BY-NC-SA-3.0 | No |
| GigaSpeech2 | Non-commercial research only | No |

You must comply with ALL licenses above. The combined effect is: non-commercial use only, attribution required, share-alike for derivatives.
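
The way the combined restriction follows from the individual training-data licenses can be illustrated with a small sketch. The `SOURCES` mapping below simply restates the training-data table; the helper name `commercial_use_allowed` is hypothetical, and the base-model license adds its own restrictions on top of this.

```python
# Training-data terms restated from the table above.
# The boolean is True when the source permits commercial use.
SOURCES = {
    "porjai_central": ("CC-BY-SA-4.0", True),
    "CommonVoice": ("CC-0", True),
    "TSync2": ("CC-BY-NC-SA-3.0", False),
    "GigaSpeech2": ("Non-commercial research only", False),
}


def commercial_use_allowed(sources):
    """Commercial use is possible only if every source permits it."""
    return all(allowed for _license, allowed in sources.values())


# Sources that force the non-commercial restriction.
blocking = [name for name, (_license, allowed) in SOURCES.items() if not allowed]

print(commercial_use_allowed(SOURCES))  # False
print(blocking)  # ['TSync2', 'GigaSpeech2']
```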

Model Details

  • Base model: IndexTTS2 (871M param GPT, 24 layers)
  • Method: LoRA fine-tuning on GPT attention + FFN layers
  • LoRA rank: 32
  • Trainable params: ~15.7M (1.76% of base model)
  • BPE vocab: 32000 (expanded from 12k with PyThaiNLP dictionary)
  • Training step: 299380 (epoch 9)
  • Validation loss: 3.6813
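
As a rough sanity check on the trainable-parameter figure, the sketch below shows how LoRA parameter counts and the trainable share are typically computed. The layer width of 1280 is a hypothetical value chosen only for illustration (the card does not state the hidden size), and which denominator the reported 1.76% uses is an assumption; with the rounded figures above, both conventions land close to it.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical square projection, not taken from the model card.
example_layer = lora_param_count(d_in=1280, d_out=1280, rank=32)

# Rounded figures from the list above.
BASE_PARAMS = 871e6
TRAINABLE_PARAMS = 15.7e6

# Two common conventions for "percent trainable".
share_of_base = TRAINABLE_PARAMS / BASE_PARAMS
share_of_total = TRAINABLE_PARAMS / (BASE_PARAMS + TRAINABLE_PARAMS)

print(example_layer)              # 81920
print(f"{share_of_base:.2%}")     # 1.80%
print(f"{share_of_total:.2%}")    # 1.77%
```

The small gap to the reported 1.76% is consistent with rounding in the "~15.7M" and "871M" figures.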

Training Data

Trained on vaja-thai — a quality-filtered Thai speech dataset (289k samples, 554h) with tier-based oversampling. TSync2 studio data (designed to cover all Thai phonemes) is included in full and oversampled for pronunciation clarity.
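
Tier-based oversampling can be sketched as repeating each sample according to its quality tier. This is a minimal illustration, not the pipeline used for vaja-thai: the tier names, the repeat factors, and the function `oversample_by_tier` are all hypothetical.

```python
from collections import Counter


def oversample_by_tier(samples, repeat_factors):
    """Repeat each sample according to the factor assigned to its tier.

    Tiers missing from repeat_factors are kept once.
    """
    expanded = []
    for sample in samples:
        factor = repeat_factors.get(sample["tier"], 1)
        expanded.extend([sample] * factor)
    return expanded


# Hypothetical tiers and factors: studio data is repeated most often.
samples = [
    {"id": "studio_001", "tier": "studio"},
    {"id": "read_001", "tier": "clean"},
    {"id": "crowd_001", "tier": "noisy"},
]
repeat_factors = {"studio": 3, "clean": 2, "noisy": 1}

expanded = oversample_by_tier(samples, repeat_factors)
tier_counts = Counter(s["tier"] for s in expanded)
print(len(expanded))          # 6
print(tier_counts["studio"])  # 3
```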

What's Included

| File | Description |
|---|---|
| adapter_model.safetensors | LoRA adapter weights |
| adapter_config.json | PEFT/LoRA configuration |
| extra_weights.pt | Fine-tuned text_embedding + text_head |
| bpe_thai.model | Retrained BPE tokenizer (Thai + Chinese + English) |
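
Before loading, it can help to confirm that a local copy of the model directory contains all four files. The helper below is a small sketch using only the standard library; the function name `missing_files` and the directory layout (all files at the top level) are assumptions.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = (
    "adapter_model.safetensors",
    "adapter_config.json",
    "extra_weights.pt",
    "bpe_thai.model",
)


def missing_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("path/to/this/model")` returns an empty list when the download is complete.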

Usage

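# load_thai_model is imported from the src.inference_thai module,
# so the `src` package must be on the import path.
from src.inference_thai import load_thai_model

# Load the model with this LoRA adapter and the Thai BPE tokenizer.
tts = load_thai_model(
    lora_path="path/to/this/model",
    bpe_model_path="path/to/bpe_thai.model",
)

# Clone the voice in speaker.wav and synthesize the Thai text
# ("Hello, the weather is very nice today").
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="สวัสดีครับ วันนี้อากาศดีมาก",
    output_path="output.wav",
)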
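
To synthesize several sentences with the same reference speaker, the `infer` call above can be wrapped in a loop. This is a sketch under assumptions: `synthesize_batch` and the `sample_NN.wav` naming are hypothetical, and `tts` is expected to be the object returned by `load_thai_model`, called with the same keyword arguments as in the usage example.

```python
from pathlib import Path


def synthesize_batch(tts, texts, speaker_wav, out_dir):
    """Synthesize each text with one reference speaker.

    Writes sample_01.wav, sample_02.wav, ... into out_dir and
    returns the list of output paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(texts, start=1):
        out_path = out_dir / f"sample_{index:02d}.wav"
        # Same keyword arguments as the single-sentence example.
        tts.infer(
            spk_audio_prompt=speaker_wav,
            text=text,
            output_path=str(out_path),
        )
        paths.append(out_path)
    return paths
```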

Audio Samples

Thai speech samples using zero-shot voice cloning from a single speaker reference clip. English and Chinese use the original IndexTTS2 model (LoRA disabled) with no quality degradation.

Reference speaker:

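| # | Text | English gloss |
|---|---|---|
| 1 | สวัสดีครับ วันนี้อากาศดีมาก เราไปเดินเล่นที่สวนสาธารณะกันเถอะ | Hello, the weather is very nice today. Let's go for a walk in the park. |
| 2 | ปัญญาประดิษฐ์กำลังเปลี่ยนแปลงวิถีชีวิตและการทำงานของเรา | Artificial intelligence is changing the way we live and work. |
| 3 | ขอบคุณมากครับที่ช่วยเหลือ ผมจะไม่มีวันลืมน้ำใจของคุณ | Thank you very much for your help. I will never forget your kindness. |
| 4 | กรุงเทพมหานคร เป็นเมืองหลวงของประเทศไทย มีชื่อเสียงเรื่องวัดที่สวยงาม | Bangkok is the capital of Thailand and is famous for its beautiful temples. |
| 5 | อาหารไทยมีรสชาติอร่อยมาก โดยเฉพาะต้มยำกุ้งและผัดไทยที่โด่งดังไปทั่วโลก | Thai food is very delicious, especially tom yum kung and pad thai, which are famous worldwide. |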
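
Since English and Chinese are run with the LoRA disabled, a caller needs some way to decide per input whether the adapter should be active. One simple heuristic, shown here as an assumption rather than the project's actual routing, is to check for characters in the Unicode Thai block (U+0E00 to U+0E7F). The function names are hypothetical, and how the adapter is switched on or off is left to the inference code.

```python
def contains_thai(text: str) -> bool:
    """Return True if any character is in the Unicode Thai block (U+0E00 to U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


def should_enable_lora(text: str) -> bool:
    """Heuristic: use the Thai adapter only when the text contains Thai script."""
    return contains_thai(text)


print(should_enable_lora("สวัสดีครับ"))   # True
print(should_enable_lora("Hello, 你好"))  # False
```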

Citation

If you use this model, please cite the base IndexTTS2 model and the vaja-thai dataset.
