Fine-tuning VyvoTTS-Qwen3-TR for domain-specific Turkish TTS
Hi,
I’ve been experimenting with VyvoTTS-Qwen3-TR and the demo outputs sound very promising for Turkish speech synthesis.
I’m interested in fine-tuning the model for a domain-specific Turkish TTS application (custom vocabulary and phrasing), but I couldn’t find clear training or fine-tuning instructions specifically for this model.
I have a few questions:
Is VyvoTTS-Qwen3-TR intended to be fine-tuned by users, or is it mainly provided as an inference model?
If fine-tuning is supported, what is the recommended workflow?
- Should we use the training pipeline from the VyvoTTS GitHub repository?
- Are there specific configs for Qwen3-based TTS models?
Is LoRA or PEFT-style fine-tuning supported for this model?
What dataset format is expected for Turkish fine-tuning?
- text + audio pairs
- SNAC tokens
- or another format?
Would you recommend fine-tuning:
- directly on VyvoTTS-Qwen3-TR, or
- starting from VyvoTTS-v0-Qwen3-0.6B and then adapting it to Turkish?
Any guidance or recommended training setup would be greatly appreciated.
Thanks for releasing the model!