Qwen3-TTS Hindi LoRA — Pipeline 1 (Language Adaptation)

LoRA adapter for Qwen3-TTS-0.6B, finetuned on Hindi speech data on Apple Silicon (MLX). It adapts the base model to speak natural Hindi without requiring reference audio at inference.

Training Details

Base model: mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit
Pipeline: Language Adaptation (Pipeline 1)
LoRA rank: 8
LoRA alpha: 16
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Language conditioning: lang_code "auto" (nothink prefix, no dedicated Hindi token)
Dataset: ~1900 Hindi speech samples (IndicVoices-R, CC-BY-4.0)
Epochs: 10
Effective batch size: 32 (batch=2, grad_accum=16)
Learning rate: 2e-5 (cosine schedule)
Hardware: Apple Silicon (MLX)
Final val loss: 7.55
Adapter size: ~24 MB
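The configuration above implies a LoRA scaling factor of alpha/rank = 16/8 = 2. As a refresher on what those two numbers mean, here is a minimal NumPy sketch of the generic LoRA update (textbook LoRA math, not the training repo's actual implementation; shapes are illustrative):

```python
import numpy as np

# Generic LoRA forward pass: y = W x + (alpha / rank) * B A x
# rank=8, alpha=16 match this adapter's config (scale = 2.0).
d_in, d_out, rank, alpha = 512, 512, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in)) * 0.02   # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.02    # trainable down-projection
B = np.zeros((d_out, rank))                     # trainable up-projection (zero-init)

def lora_forward(x):
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapter starts as a no-op on the base model:
assert np.allclose(lora_forward(x), W @ x)

# Trainable fraction for one adapted matrix: rank*(d_in + d_out) vs d_in*d_out
frac = rank * (d_in + d_out) / (d_in * d_out)
print(f"trainable fraction per adapted matrix: {frac:.1%}")
```

Per adapted 512x512 matrix this is about 3.1% of the weights; across the full model, with only the listed projections adapted, the overall trainable fraction drops to the ~0.5% quoted below.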

Usage

Requires mlx-audio and the training repo.

from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio
from train.lora import apply_lora, load_adapters, LoRAConfig

# Load base model
model = load_model("mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit")

# Apply and load LoRA adapter
apply_lora(model, LoRAConfig(model_type="qwen3_tts", rank=8))
load_adapters(model, "adapters.safetensors")  # download from this repo

# Generate Hindi speech (lang_code="auto" — this adapter was trained without a dedicated Hindi token)
generate_audio(
    text="नमस्ते! आज का दिन बहुत अच्छा है।",
    model=model,
    output_path="./output",
    lang_code="auto",
)

Or use the demo UI from the training repo:

git clone https://github.com/akashicMarga/mlx-audio-train
cd mlx-audio-train
python scripts/demo.py
# Select this adapter from the dropdown, set language to "hi"

What This Adapter Does

The base Qwen3-TTS model was pretrained on 10 languages (English, Chinese, etc.) but not Hindi. This adapter teaches the model Hindi pronunciation and prosody by finetuning the attention and MLP projection layers on Hindi speech via LoRA; only about 0.5% of parameters are updated.

This adapter was trained with lang_code "auto" (nothink prefix), so no dedicated Hindi language token was used: the model learns Hindi from the audio data alone through the LoRA weight updates. A future multilingual adapter (coming soon) will assign Hindi a dedicated language token ID (hi → 2051) in the codec embedding table for stronger conditioning.

Dataset

  • IndicVoices-R (ai4bharat/indicvoices_r) — CC-BY-4.0
  • ~1900 training samples, ~100 validation samples
  • Audio resampled to 24kHz, loudness-normalized to -23 dBFS, silence-trimmed
  • Duration range: 1–14 seconds, SNR ≥ 20 dB
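The loudness normalization and silence trimming listed above can be sketched in plain NumPy. The function names are illustrative, not the training repo's actual helpers, and a real pipeline would also resample to 24 kHz (e.g. with librosa or soxr) and filter by SNR:

```python
import numpy as np

def normalize_dbfs(audio, target_dbfs=-23.0):
    """Scale a float waveform so its RMS level sits at target_dbfs (dB re full scale)."""
    rms = np.sqrt(np.mean(audio ** 2))
    if rms == 0:
        return audio
    current_dbfs = 20 * np.log10(rms)
    gain = 10 ** ((target_dbfs - current_dbfs) / 20)
    return audio * gain

def trim_silence(audio, threshold=1e-3):
    """Drop leading/trailing samples whose amplitude stays below the threshold."""
    idx = np.where(np.abs(audio) > threshold)[0]
    if len(idx) == 0:
        return audio[:0]
    return audio[idx[0]:idx[-1] + 1]

# Example: a quiet 220 Hz tone with half a second of silence on each side
sr = 24000
t = np.linspace(0, 1.0, sr, endpoint=False)
tone = 0.01 * np.sin(2 * np.pi * 220 * t)
padded = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])

leveled = normalize_dbfs(trim_silence(padded))
rms_dbfs = 20 * np.log10(np.sqrt(np.mean(leveled ** 2)))
print(round(rms_dbfs, 1))  # -23.0
```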

Training Code

Full training pipeline (data download, preprocessing, LoRA finetuning) available at: https://github.com/akashicMarga/mlx-audio-train

Supports any language — just provide a JSONL dataset and set lang_code in the config.
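A plausible JSONL layout for such a dataset is one JSON object per line, pairing an audio path with its transcript. The field names below are illustrative assumptions; check the repo's config for the exact schema it expects:

```python
import json
import os
import tempfile

# Hypothetical training records (field names assumed, not taken from the repo)
samples = [
    {"audio": "clips/hi_0001.wav", "text": "नमस्ते! आज का दिन बहुत अच्छा है।", "lang_code": "auto"},
    {"audio": "clips/hi_0002.wav", "text": "यह एक उदाहरण वाक्य है।", "lang_code": "auto"},
]

path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for s in samples:
        # ensure_ascii=False keeps the Devanagari text readable in the file
        f.write(json.dumps(s, ensure_ascii=False) + "\n")

# Round-trip check: each line parses back to the original record
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded))  # 2
```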

Limitations

  • Trained on ~1900 samples — good for adaptation, not production-grade
  • Voice identity is random at inference (use ref_audio for voice cloning)
  • May hallucinate on very long or unusual text
  • Trained and tested on Apple Silicon only

License

Apache 2.0. Dataset: IndicVoices-R (CC-BY-4.0) — please credit ai4bharat if redistributing.
