Questions about finetuning in another language
Amazing work on this finetune.
I am a student trying to finetune the model for my own regional language. Could you share how much data you used (total duration of the audio files) and in which language the transcriptions were?
Also, did you use the voice of a single speaker or multiple speakers?
Any insight on this process would help me a lot.
Thank you.
Hello, thanks for reaching out.
We used a single-speaker voice and a mix of synthetic & public datasets in two stages:
(1) Synthetic data to teach the model to map text to Uzbek speech (1 epoch of 50K samples of 10-30 second synthetic speech)
(2) Public data to teach the model to speak like a human (4 epochs, 50K samples each, of 10-30 second natural speech)
You will see signs of success or failure within the first ~10% of the data I used (more or less, depending on how close your target language is to English/Chinese).
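If it helps to picture the schedule, here is a minimal sketch of the two stages using the Hugging Face Trainer. The model and dataset names are placeholders, and it assumes your audio has already been preprocessed into model-ready token sequences ("input_ids"/"labels"); the exact preprocessing depends on which base model you are finetuning, so treat this as a rough outline rather than the actual training script.

```python
# Rough sketch of the two-stage finetuning schedule described above.
# Placeholders/assumptions: a causal-LM-style TTS base model, and datasets
# already tokenized into "input_ids"/"labels" columns the model can consume.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from datasets import load_dataset

BASE_MODEL = "your-base-tts-model"          # placeholder
SYNTHETIC_DS = "your-org/synthetic-speech"  # placeholder, ~50K 10-30s clips
NATURAL_DS = "your-org/natural-speech"      # placeholder, ~50K 10-30s clips

def run_stage(model, dataset, epochs, output_dir):
    """Train one stage on the given dataset and return the updated model."""
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
        logging_steps=50,
        save_strategy="epoch",
    )
    trainer = Trainer(model=model, args=args, train_dataset=dataset)
    trainer.train()
    return trainer.model

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Stage 1: synthetic data, 1 epoch -> teach the text-to-speech mapping
synthetic = load_dataset(SYNTHETIC_DS, split="train")
model = run_stage(model, synthetic, epochs=1, output_dir="stage1-synthetic")

# Stage 2: public/natural data, 4 epochs -> teach natural, human-like speech
natural = load_dataset(NATURAL_DS, split="train")
model = run_stage(model, natural, epochs=4, output_dir="stage2-natural")

model.save_pretrained("finetuned-tts")
```

The key point the sketch tries to capture is just the ordering: a short synthetic pass first to establish the text-to-speech mapping for the new language, then several epochs on natural speech so the model picks up human prosody.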