Fine-tuning VyvoTTS-Qwen3-TR for domain-specific Turkish TTS

#1
by tamerkanak - opened

Hi,

I’ve been experimenting with VyvoTTS-Qwen3-TR and the demo outputs sound very promising for Turkish speech synthesis.

I’m interested in fine-tuning the model for a domain-specific Turkish TTS application (custom vocabulary and phrasing), but I couldn’t find clear training or fine-tuning instructions specifically for this model.

I have a few questions:

  1. Is VyvoTTS-Qwen3-TR intended to be fine-tuned by users, or is it mainly provided as an inference model?

  2. If fine-tuning is supported, what is the recommended workflow?

    • Should we use the training pipeline from the VyvoTTS GitHub repository?
    • Are there specific configs for Qwen3-based TTS models?

  3. Is LoRA or PEFT-style fine-tuning supported for this model?

  4. What dataset format is expected for Turkish fine-tuning?

    • text + audio pairs
    • SNAC tokens
    • or another format?

  5. Would you recommend fine-tuning:

    • directly on VyvoTTS-Qwen3-TR, or
    • starting from VyvoTTS-v0-Qwen3-0.6B and then adapting it to Turkish?

Any guidance or recommended training setup would be greatly appreciated.

Thanks for releasing the model!
