This LoRA adapter is a Derivative Work of IndexTTS2 (as defined in Section 1.5 of bilibili's license) and is trained on vaja-thai, which includes non-commercial data sources. Before accessing it, you must agree to all terms in LICENSE.md.
indextts2-thai-lora
Thai LoRA adapter for IndexTTS2 — a zero-shot voice cloning TTS model.
License
Non-commercial use only. This model is subject to multiple licenses:
Base model
| Component | License | Key restriction |
|---|---|---|
| IndexTTS2 | bilibili Model Use License | Derivatives limited to improving IndexTTS or non-commercial AI; entities with >100M MAU or >1B RMB revenue need separate license |
Training data (vaja-thai)
| Source | License | Commercial use |
|---|---|---|
| porjai_central | CC-BY-SA-4.0 | Yes |
| CommonVoice | CC-0 | Yes |
| TSync2 | CC-BY-NC-SA-3.0 | No |
| GigaSpeech2 | Non-commercial research only | No |
You must comply with ALL licenses above. The combined effect is: non-commercial use only, attribution required, share-alike for derivatives.
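The way the training-data licenses combine can be sketched in a few lines of Python. This is only an illustration of the "most restrictive source wins" reasoning, using the commercial-use column of the training-data table; the dictionary and function names are made up for this example, and nothing here is legal advice.

```python
# Commercial-use flags copied from the training-data table above.
# The names COMMERCIAL_USE and commercial_use_allowed are illustrative only.
COMMERCIAL_USE = {
    "porjai_central": True,   # CC-BY-SA-4.0
    "CommonVoice": True,      # CC-0
    "TSync2": False,          # CC-BY-NC-SA-3.0
    "GigaSpeech2": False,     # Non-commercial research only
}


def commercial_use_allowed(flags):
    """Commercial use is possible only if every source permits it."""
    return all(flags.values())


# Sources that rule out commercial use for the combined dataset.
blocking = sorted(name for name, ok in COMMERCIAL_USE.items() if not ok)

print(commercial_use_allowed(COMMERCIAL_USE))
print(blocking)
```

A single non-commercial source is enough to make the whole adapter non-commercial, which is why the two permissive sources do not change the outcome.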
Model Details
- Base model: IndexTTS2 (871M param GPT, 24 layers)
- Method: LoRA fine-tuning on GPT attention + FFN layers
- LoRA rank: 32
- Trainable params: ~15.7M (1.76% of base model)
- BPE vocab: 32000 (expanded from 12k with PyThaiNLP dictionary)
- Training step: 299380 (epoch 9)
- Validation loss: 3.6813
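As a rough guide to why a rank-32 adapter stays small, the sketch below counts the parameters that a single LoRA pair adds to one linear layer. The layer dimensions are hypothetical and are not taken from the IndexTTS2 architecture; only the rank of 32 comes from the list above.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical 1024 -> 4096 projection, adapted at rank 32.
added = lora_param_count(1024, 4096, rank=32)
full = 1024 * 4096  # weights in the frozen base layer

print(f"LoRA adds {added} params to a layer with {full} weights "
      f"({added / full:.2%})")
```

Summing this count over every adapted attention and FFN projection gives the adapter's LoRA parameter total; the fine-tuned `text_embedding` and `text_head` in `extra_weights.pt` are separate from it.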
Training Data
Trained on vaja-thai — a quality-filtered Thai speech dataset (289k samples, 554h) with tier-based oversampling. TSync2 studio data (designed to cover all Thai phonemes) is included in full and oversampled for pronunciation clarity.
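Tier-based oversampling can be illustrated with a minimal sketch. The tier names, repeat factors, and sample IDs below are hypothetical; the actual tiers and weights used for vaja-thai are not stated here. The idea shown is simply that samples from a higher-quality tier are repeated more often within an epoch.

```python
import random


def oversample(samples, tier_repeats):
    """Repeat each sample according to its tier's integer repeat factor."""
    expanded = []
    for sample in samples:
        expanded.extend([sample] * tier_repeats[sample["tier"]])
    return expanded


# Hypothetical samples and tiers, for illustration only.
samples = [
    {"id": "studio_0001", "tier": "studio"},
    {"id": "crowd_0001", "tier": "crowd"},
    {"id": "web_0001", "tier": "web"},
]
tier_repeats = {"studio": 3, "crowd": 1, "web": 1}

epoch = oversample(samples, tier_repeats)
random.Random(0).shuffle(epoch)  # fixed seed so the order is reproducible

print(len(epoch), sum(s["tier"] == "studio" for s in epoch))
```

Integer repeats keep the example simple; a weighted sampler would achieve a similar effect without materialising the expanded list.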
What's Included
| File | Description |
|---|---|
| `adapter_model.safetensors` | LoRA adapter weights |
| `adapter_config.json` | PEFT/LoRA configuration |
| `extra_weights.pt` | Fine-tuned text_embedding + text_head |
| `bpe_thai.model` | Retrained BPE tokenizer (Thai + Chinese + English) |
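Before loading the adapter, it can help to confirm that a local copy of the repository contains all four files listed above. The helper below is a small sketch using only the standard library; `missing_files` and `EXPECTED_FILES` are names invented for this example and are not part of the project's code.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = (
    "adapter_model.safetensors",
    "adapter_config.json",
    "extra_weights.pt",
    "bpe_thai.model",
)


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_files("path/to/this/model")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All adapter files are present.")
```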
Usage
```python
from src.inference_thai import load_thai_model

tts = load_thai_model(
    lora_path="path/to/this/model",
    bpe_model_path="path/to/bpe_thai.model",
)

tts.infer(
    spk_audio_prompt="speaker.wav",
    text="สวัสดีครับ วันนี้อากาศดีมาก",
    output_path="output.wav",
)
```
Audio Samples
The Thai samples below were generated with zero-shot voice cloning from a single reference clip of the speaker. English and Chinese text is synthesized with the original IndexTTS2 model (LoRA disabled), so quality in those languages does not degrade.
Reference speaker:
| # | Text | Audio |
|---|---|---|
| 1 | สวัสดีครับ วันนี้อากาศดีมาก เราไปเดินเล่นที่สวนสาธารณะกันเถอะ | |
| 2 | ปัญญาประดิษฐ์กำลังเปลี่ยนแปลงวิถีชีวิตและการทำงานของเรา | |
| 3 | ขอบคุณมากครับที่ช่วยเหลือ ผมจะไม่มีวันลืมน้ำใจของคุณ | |
| 4 | กรุงเทพมหานคร เป็นเมืองหลวงของประเทศไทย มีชื่อเสียงเรื่องวัดที่สวยงาม | |
| 5 | อาหารไทยมีรสชาติอร่อยมาก โดยเฉพาะต้มยำกุ้งและผัดไทยที่โด่งดังไปทั่วโลก | |
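This section notes that the LoRA is disabled for English and Chinese. One way a caller might decide when to enable it is to check whether the input contains Thai script. The sketch below is an assumption about how such routing could work, not the project's actual logic; `contains_thai` and `use_lora` are hypothetical helpers, and the check relies only on the Thai Unicode block (U+0E00 to U+0E7F).

```python
def contains_thai(text):
    """True if any character falls in the Thai Unicode block (U+0E00-U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


def use_lora(text):
    """Hypothetical routing rule: enable the Thai LoRA only for Thai input."""
    return contains_thai(text)


for text in ("สวัสดีครับ วันนี้อากาศดีมาก", "Hello, world", "你好，世界"):
    print(repr(text), "-> LoRA enabled" if use_lora(text) else "-> base model")
```

Mixed-language input is treated as Thai under this rule, which may or may not match how the inference code handles it.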
Citation
If you use this model, please cite the base IndexTTS2 model and the vaja-thai dataset.
Model tree for dubbing-ai/indextts2-thai-lora
Base model
IndexTeam/IndexTTS-2