Fine-tuning VyvoTTS-Qwen3-TR for domain-specific Turkish TTS

by tamerkanak - opened Mar 9

Hi,

I’ve been experimenting with VyvoTTS-Qwen3-TR and the demo outputs sound very promising for Turkish speech synthesis.

I’m interested in fine-tuning the model for a domain-specific Turkish TTS application (custom vocabulary and phrasing), but I couldn’t find clear training or fine-tuning instructions specifically for this model.

I have a few questions:

1. Is VyvoTTS-Qwen3-TR intended to be fine-tuned by users, or is it mainly provided as an inference model?
2. If fine-tuning is supported, what is the recommended workflow?
   - Should we use the training pipeline from the VyvoTTS GitHub repository?
   - Are there specific configs for Qwen3-based TTS models?
3. Is LoRA or PEFT-style fine-tuning supported for this model?
4. What dataset format is expected for Turkish fine-tuning?
   - text + audio pairs
   - SNAC tokens
   - or another format?
5. Would you recommend fine-tuning:
   - directly on VyvoTTS-Qwen3-TR, or
   - starting from VyvoTTS-v0-Qwen3-0.6B and then adapting it to Turkish?

Any guidance or recommended training setup would be greatly appreciated.

Thanks for releasing the model!
