# Qwen3-TTS Hindi LoRA — Pipeline 1 (Language Adaptation)

LoRA adapter for Qwen3-TTS-0.6B, finetuned on Hindi speech data on Apple Silicon (MLX). It adapts the base model to speak natural Hindi without requiring reference audio at inference.
## Training Details

| Setting | Value |
|---|---|
| Base model | mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit |
| Pipeline | Language Adaptation (Pipeline 1) |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Language conditioning | `lang_code: auto` (nothink prefix — no dedicated Hindi token) |
| Dataset | ~1900 Hindi speech samples (IndicVoices-R, CC-BY-4.0) |
| Epochs | 10 |
| Effective batch size | 32 (batch=2, grad_accum=16) |
| Learning rate | 2e-5 (cosine schedule) |
| Hardware | Apple Silicon (MLX) |
| Final val loss | 7.55 |
| Adapter size | ~24 MB |
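The learning-rate row above refers to a cosine decay schedule. A minimal sketch of how such a schedule behaves — the peak LR matches the table, but the step counts are illustrative, not the exact values from this run:

```python
import math

def cosine_lr(step, total_steps, peak_lr=2e-5, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 down to min_lr at total_steps."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Starts at the peak and decays smoothly to (near) zero
print(cosine_lr(0, 1000))     # 2e-05
print(cosine_lr(1000, 1000))  # ~0.0
```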
## Usage

Requires `mlx-audio` and the training repo.

```python
from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio
from train.lora import apply_lora, load_adapters, LoRAConfig

# Load the quantized base model
model = load_model("mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit")

# Apply the LoRA structure, then load the trained adapter weights
apply_lora(model, LoRAConfig(model_type="qwen3_tts", rank=8))
load_adapters(model, "adapters.safetensors")  # download from this repo

# Generate Hindi speech (lang_code="auto": this adapter was trained
# without a dedicated Hindi token)
generate_audio(
    text="नमस्ते! आज का दिन बहुत अच्छा है।",
    model=model,
    output_path="./output",
    lang_code="auto",
)
```
Or use the demo UI from the training repo:

```shell
git clone https://github.com/akashicMarga/mlx-audio-train
cd mlx-audio-train
python scripts/demo.py
# Select this adapter from the dropdown, set language to "hi"
```
## What This Adapter Does

The base Qwen3-TTS model was pretrained on 10 languages (English, Chinese, etc.) but not Hindi. This LoRA adapter teaches it Hindi pronunciation and prosody by finetuning the attention and MLP layers on Hindi speech via LoRA — only ~0.5% of parameters are updated.

It was trained with `lang_code: auto` (nothink prefix), so no dedicated Hindi language token was used: the model learns Hindi from the audio data alone via the LoRA weight updates. A future multilingual adapter (coming soon) will instead assign Hindi a dedicated language token ID (hi → 2051) in the codec embedding table for stronger conditioning.
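The low-rank update behind the "only ~0.5% of parameters" figure can be sketched in a few lines. This is a toy numpy illustration of the LoRA mechanism, not the `train.lora` implementation (dimensions are made up; rank and alpha match the table above):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small init
B = np.zeros((d_out, r))                   # trainable, zero init

def lora_forward(x):
    # Base path plus scaled low-rank update; with B initialized to zero,
    # the adapter is an exact no-op before training.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identity at init

# Trainable fraction: r*(d_in + d_out) params vs. d_in*d_out frozen
print((A.size + B.size) / W.size)  # 0.25 in this toy setting
```

Only `A` and `B` are saved to `adapters.safetensors`, which is why the adapter is ~24 MB while the base model is far larger.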
## Dataset

- IndicVoices-R (`ai4bharat/indicvoices_r`) — CC-BY-4.0
- ~1900 training samples, ~100 validation samples
- Audio resampled to 24 kHz, loudness-normalized to -23 dBFS, silence-trimmed
- Duration range: 1–14 seconds, SNR ≥ 20 dB
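The normalization and trimming steps above can be sketched as follows — a hedged numpy illustration using a simple RMS-based dBFS target and an amplitude-threshold trim; the actual pipeline may use a different loudness measure (e.g. LUFS) and a smarter VAD:

```python
import numpy as np

def normalize_dbfs(audio, target_dbfs=-23.0):
    """Scale audio so its RMS level sits at target_dbfs (0 dBFS = RMS 1.0)."""
    rms = np.sqrt(np.mean(audio ** 2))
    target_rms = 10 ** (target_dbfs / 20)
    return audio * (target_rms / rms)

def trim_silence(audio, threshold=1e-3):
    """Drop leading/trailing samples whose amplitude stays below threshold."""
    voiced = np.flatnonzero(np.abs(audio) > threshold)
    return audio[voiced[0] : voiced[-1] + 1] if voiced.size else audio
```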
## Training Code

The full training pipeline (data download, preprocessing, LoRA finetuning) is available at: https://github.com/akashicMarga/mlx-audio-train

It supports any language — just provide a JSONL dataset and set `lang_code` in the config.
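A JSONL dataset is one JSON object per line. The field names below are hypothetical — the exact keys the training repo expects are not documented here, so check its config/README before relying on them:

```python
import json

# Hypothetical row schema: text transcript, path to a clip, language code.
row = {"text": "नमस्ते!", "audio": "clips/sample_0001.wav", "lang_code": "auto"}

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(row, ensure_ascii=False) + "\n")
```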
## Limitations

- Trained on ~1900 samples — good for language adaptation, not production-grade quality
- Voice identity is random at inference (use `ref_audio` for voice cloning)
- May hallucinate on very long or unusual text
- Trained and tested on Apple Silicon only
## License

Apache 2.0. Dataset: IndicVoices-R (CC-BY-4.0) — please credit ai4bharat if redistributing.