This LoRA adapter is a Derivative Work of IndexTTS2 (as defined in bilibili's license Section 1.5) and is trained on vaja-thai which includes non-commercial data sources. Before accessing, you must agree to all terms in LICENSE.md.

indextts2-thai-lora

Thai LoRA adapter for IndexTTS2, a zero-shot voice-cloning TTS model.

License

Non-commercial use only. This model is subject to multiple licenses:

Base model

Component	License	Key restriction
IndexTTS2	bilibili Model Use License	Derivatives limited to improving IndexTTS or non-commercial AI; entities with >100M MAU or >1B RMB revenue need separate license

Training data (vaja-thai)

Source	License	Commercial use
porjai_central	CC-BY-SA-4.0	Yes
CommonVoice	CC-0	Yes
TSync2	CC-BY-NC-SA-3.0	No
GigaSpeech2	Non-commercial research only	No

You must comply with ALL licenses above. The combined effect is: non-commercial use only, attribution required, share-alike for derivatives.

Model Details

Base model: IndexTTS2 (871M param GPT, 24 layers)
Method: LoRA fine-tuning on GPT attention + FFN layers
LoRA rank: 32
Trainable params: ~15.7M (1.76% of base model)
BPE vocab: 32000 (expanded from 12k with PyThaiNLP dictionary)
Training step: 299380 (epoch 9)
Validation loss: 3.6813
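As a rough sanity check on the trainable-parameter figure, the sketch below counts LoRA parameters for a GPT-2 style stack. The rank and layer count come from the list above; the hidden size of 1280, the fused QKV projection, and the 4x FFN width are assumptions that this card does not state, so treat the result as illustrative rather than as the actual adapter layout.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# ASSUMPTION: hidden size and block layout are not given in this card.
HIDDEN = 1280
LAYERS = 24  # from the card
RANK = 32    # from the card

# One (d_in, d_out) pair per adapted linear layer in a transformer block.
per_layer_shapes = [
    (HIDDEN, 3 * HIDDEN),  # fused QKV projection (assumed)
    (HIDDEN, HIDDEN),      # attention output projection (assumed)
    (HIDDEN, 4 * HIDDEN),  # FFN up projection (assumed)
    (4 * HIDDEN, HIDDEN),  # FFN down projection (assumed)
]

per_layer = sum(lora_param_count(i, o, RANK) for i, o in per_layer_shapes)
total = per_layer * LAYERS
print(f"{total:,} trainable LoRA parameters")
```

Under these assumed dimensions the count is 15,728,640, which is in line with the ~15.7M quoted above.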

Training Data

Trained on vaja-thai, a quality-filtered Thai speech dataset (289k samples, 554 hours) with tier-based oversampling. The TSync2 studio data, which is designed to cover all Thai phonemes, is included in full and oversampled to improve pronunciation clarity.
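The card does not say how the tiers are defined or which factors are used, so the following is only a minimal sketch of what tier-based oversampling can look like. The tier names, repeat factors, and sample IDs are all hypothetical.

```python
from collections import Counter

# ASSUMPTION: tier names and integer repeat factors are hypothetical.
REPEAT_BY_TIER = {"studio": 3, "clean": 2, "web": 1}


def oversample(samples, repeat_by_tier):
    """Repeat each sample according to the factor assigned to its tier."""
    expanded = []
    for sample in samples:
        expanded.extend([sample] * repeat_by_tier[sample["tier"]])
    return expanded


# Hypothetical sample records, one per tier.
samples = [
    {"id": "studio_0001", "tier": "studio"},
    {"id": "clean_0001", "tier": "clean"},
    {"id": "web_0001", "tier": "web"},
]

epoch = oversample(samples, REPEAT_BY_TIER)
counts = Counter(sample["tier"] for sample in epoch)
print(dict(counts))
```

Repeating samples by an integer factor keeps the example deterministic; a weighted random sampler would be an equally plausible way to get the same effect.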

What's Included

File	Description
`adapter_model.safetensors`	LoRA adapter weights
`adapter_config.json`	PEFT/LoRA configuration
`extra_weights.pt`	Fine-tuned text_embedding + text_head
`bpe_thai.model`	Retrained BPE tokenizer (Thai + Chinese + English)
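Before loading, it can help to confirm that a local copy contains every file listed above. The helper below is a small sketch using only the standard library; the function name `missing_files` and the example directory are illustrative and not part of this project.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = (
    "adapter_model.safetensors",
    "adapter_config.json",
    "extra_weights.pt",
    "bpe_thai.model",
)


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "path/to/this/model" is a placeholder for the local download directory.
    print(missing_files("path/to/this/model"))
```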

Usage

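# load_thai_model is provided by this project's src/inference_thai module
from src.inference_thai import load_thai_model

# Load the model with the Thai LoRA adapter and the retrained tokenizer
tts = load_thai_model(
    lora_path="path/to/this/model",           # directory holding the adapter files
    bpe_model_path="path/to/bpe_thai.model",  # retrained BPE tokenizer
)

# Synthesize Thai speech in the voice of the reference clip
tts.infer(
    spk_audio_prompt="speaker.wav",  # speaker reference audio
    text="สวัสดีครับ วันนี้อากาศดีมาก",
    output_path="output.wav",
)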
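To synthesize several sentences with the same reference speaker, the call above can be wrapped in a loop. This is a sketch only: `synthesize_all` and the `sample_NN.wav` naming are hypothetical, and it assumes nothing about `tts` beyond the `infer(spk_audio_prompt=..., text=..., output_path=...)` call shown above.

```python
from pathlib import Path


def synthesize_all(tts, texts, spk_audio_prompt, out_dir):
    """Call tts.infer once per text and return the output paths it was given."""
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, text in enumerate(texts, start=1):
        # Hypothetical naming scheme: sample_01.wav, sample_02.wav, ...
        output_path = out_root / f"sample_{index:02d}.wav"
        tts.infer(
            spk_audio_prompt=spk_audio_prompt,
            text=text,
            output_path=str(output_path),
        )
        outputs.append(output_path)
    return outputs
```

With the `tts` object from the usage example, `synthesize_all(tts, ["สวัสดีครับ", "ขอบคุณมากครับ"], "speaker.wav", "outputs")` would request two files under `outputs/`.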

Audio Samples

Thai speech samples generated with zero-shot voice cloning from a single reference clip of one speaker. English and Chinese text is synthesized by the original IndexTTS2 model with the LoRA disabled, so quality in those languages is not degraded.

Reference speaker:

#	Text	Audio
1	สวัสดีครับ วันนี้อากาศดีมาก เราไปเดินเล่นที่สวนสาธารณะกันเถอะ
2	ปัญญาประดิษฐ์กำลังเปลี่ยนแปลงวิถีชีวิตและการทำงานของเรา
3	ขอบคุณมากครับที่ช่วยเหลือ ผมจะไม่มีวันลืมน้ำใจของคุณ
4	กรุงเทพมหานคร เป็นเมืองหลวงของประเทศไทย มีชื่อเสียงเรื่องวัดที่สวยงาม
5	อาหารไทยมีรสชาติอร่อยมาก โดยเฉพาะต้มยำกุ้งและผัดไทยที่โด่งดังไปทั่วโลก
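The card states that English and Chinese run with the LoRA disabled, but not how the switch is made. One simple way to decide per input is to check for characters in the Thai Unicode block (U+0E00 to U+0E7F). The sketch below shows that check; the routing rule and both function names are assumptions, not this project's logic.

```python
def contains_thai(text: str) -> bool:
    """True if any character falls in the Thai Unicode block (U+0E00 to U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


def should_enable_lora(text: str) -> bool:
    """Hypothetical routing rule: use the Thai adapter only for text containing Thai."""
    return contains_thai(text)


for sentence in ["สวัสดีครับ", "Hello world", "你好"]:
    print(sentence, should_enable_lora(sentence))
```

Mixed-language input would need a finer rule than this, for example routing per sentence rather than per request.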

Citation

If you use this model, please cite the base IndexTTS2 model and the vaja-thai dataset.

