# Qwen3-TTS Hindi LoRA — Pipeline 1 (Language Adaptation)

LoRA adapter for Qwen3-TTS-0.6B, finetuned on Hindi speech data on Apple Silicon (MLX). It adapts the base model to speak natural Hindi without requiring reference audio at inference.
## Training Details

| Setting | Value |
|---|---|
| Base model | mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit |
| Pipeline | Language Adaptation (Pipeline 1) |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Language conditioning | `lang_code: auto` (nothink prefix — no dedicated Hindi token) |
| Dataset | ~1900 Hindi speech samples (IndicVoices-R, CC-BY-4.0) |
| Epochs | 10 |
| Effective batch size | 32 (batch=2, grad_accum=16) |
| Learning rate | 2e-5 (cosine schedule) |
| Hardware | Apple Silicon (MLX) |
| Final val loss | 7.55 |
| Adapter size | ~24 MB |
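The learning-rate row above refers to a cosine decay schedule. A minimal sketch of how such a schedule behaves — the peak LR matches the table, but the step counts are illustrative, not the exact values from this run:

```python
import math

def cosine_lr(step, total_steps, peak_lr=2e-5, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 down to min_lr at total_steps."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Starts at the peak and decays smoothly to (near) zero
print(cosine_lr(0, 1000))     # 2e-05
print(cosine_lr(1000, 1000))  # ~0.0
```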
## Usage

Requires `mlx-audio` and the training repo.

```python
from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio
from train.lora import apply_lora, load_adapters, LoRAConfig

# Load the quantized base model
model = load_model("mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit")

# Apply the LoRA structure, then load the trained adapter weights
apply_lora(model, LoRAConfig(model_type="qwen3_tts", rank=8))
load_adapters(model, "adapters.safetensors")  # download from this repo

# Generate Hindi speech (lang_code="auto": this adapter was trained
# without a dedicated Hindi token)
generate_audio(
    text="नमस्ते! आज का दिन बहुत अच्छा है।",
    model=model,
    output_path="./output",
    lang_code="auto",
)
```
Or use the demo UI from the training repo:

```shell
git clone https://github.com/akashicMarga/mlx-audio-train
cd mlx-audio-train
python scripts/demo.py
# Select this adapter from the dropdown, set language to "hi"
```
## What This Adapter Does

The base Qwen3-TTS model was pretrained on 10 languages (English, Chinese, etc.) but not Hindi. This LoRA adapter teaches it Hindi pronunciation and prosody by finetuning the attention and MLP layers on Hindi speech via LoRA — only ~0.5% of parameters are updated.

It was trained with `lang_code: auto` (nothink prefix), so no dedicated Hindi language token was used: the model learns Hindi from the audio data alone via the LoRA weight updates. A future multilingual adapter (coming soon) will instead assign Hindi a dedicated language token ID (hi → 2051) in the codec embedding table for stronger conditioning.
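The low-rank update behind the "only ~0.5% of parameters" figure can be sketched in a few lines. This is a toy numpy illustration of the LoRA mechanism, not the `train.lora` implementation (dimensions are made up; rank and alpha match the table above):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small init
B = np.zeros((d_out, r))                   # trainable, zero init

def lora_forward(x):
    # Base path plus scaled low-rank update; with B initialized to zero,
    # the adapter is an exact no-op before training.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identity at init

# Trainable fraction: r*(d_in + d_out) params vs. d_in*d_out frozen
print((A.size + B.size) / W.size)  # 0.25 in this toy setting
```

Only `A` and `B` are saved to `adapters.safetensors`, which is why the adapter is ~24 MB while the base model is far larger.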
## Dataset

- IndicVoices-R (`ai4bharat/indicvoices_r`) — CC-BY-4.0
- ~1900 training samples, ~100 validation samples
- Audio resampled to 24 kHz, loudness-normalized to -23 dBFS, silence-trimmed
- Duration range: 1–14 seconds, SNR ≥ 20 dB
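The normalization and trimming steps above can be sketched as follows — a hedged numpy illustration using a simple RMS-based dBFS target and an amplitude-threshold trim; the actual pipeline may use a different loudness measure (e.g. LUFS) and a smarter VAD:

```python
import numpy as np

def normalize_dbfs(audio, target_dbfs=-23.0):
    """Scale audio so its RMS level sits at target_dbfs (0 dBFS = RMS 1.0)."""
    rms = np.sqrt(np.mean(audio ** 2))
    target_rms = 10 ** (target_dbfs / 20)
    return audio * (target_rms / rms)

def trim_silence(audio, threshold=1e-3):
    """Drop leading/trailing samples whose amplitude stays below threshold."""
    voiced = np.flatnonzero(np.abs(audio) > threshold)
    return audio[voiced[0] : voiced[-1] + 1] if voiced.size else audio
```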
## Training Code

The full training pipeline (data download, preprocessing, LoRA finetuning) is available at: https://github.com/akashicMarga/mlx-audio-train

It supports any language — just provide a JSONL dataset and set `lang_code` in the config.
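A JSONL dataset is one JSON object per line. The field names below are hypothetical — the exact keys the training repo expects are not documented here, so check its config/README before relying on them:

```python
import json

# Hypothetical row schema: text transcript, path to a clip, language code.
row = {"text": "नमस्ते!", "audio": "clips/sample_0001.wav", "lang_code": "auto"}

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(row, ensure_ascii=False) + "\n")
```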
## Limitations

- Trained on ~1900 samples — good for language adaptation, not production-grade quality
- Voice identity is random at inference (use `ref_audio` for voice cloning)
- May hallucinate on very long or unusual text
- Trained and tested on Apple Silicon only
## License

Apache 2.0. Dataset: IndicVoices-R (CC-BY-4.0) — please credit ai4bharat if redistributing.