
Languages other than English/French in premade voices

#36
by markwitt1 - opened

I can only find English or French prebuilt voices.
If I use these voices with German text, the output has a heavy accent.
Sure, I can clone my own voice in German and it works, but the quality will never be the same as a first-party prebuilt voice.
Any recommendations here? Any chance we will get prebuilt voices in other languages?

Mistral AI org
edited 13 days ago

We're working on a new set of voices that will include German and other European languages. We are also updating our documentation on how to select a voice prompt for emulation. Here are a few guidelines to help you in the meantime:

  • Voice prompt needs to be chosen carefully:
    • Ideally longer than 3 seconds; up to 30 seconds should be enough.
    • Must contain only one speaker.
    • No background noise: use a clean recording for best results.
    • Neutral prosody: no excessive pausing or disfluencies such as “umm”/“ahh”.
    • Should have expressive pitch: flat voice samples lead to boring generations.
    • Should ideally be in the same language as the text prompt, but the model should also work with cross-lingual prompts. For example, you could try a voice prompt in language A (French) with translated text in language B (English); the generation would then sound like French-accented English.
  • Text prompt:
    • Ideally, convert the text to a verbalizable form for controllability (e.g. instead of “1234”, use “one thousand two hundred thirty-four” or “twelve hundred thirty-four” to disambiguate).
    • Avoid rich formatting such as markdown, emojis, etc.
    • For abbreviations, the model might work out of the box, but better results are likely if you change “FBI” → “F-B-I” or “F.B.I.”.
    • Prompts should be under 300 words.
  • One text prompt per request. Multiple lines are not expected to work well.
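
Some of the voice-prompt guidelines can be checked automatically before you use a clip. Below is a minimal sketch using only Python's standard library, assuming the voice prompt is a 16-bit PCM WAV file. `check_voice_prompt` is a hypothetical helper, not part of vLLM or mistral-common, and the silence level (`silence_rms=300`), the 20 ms window, and the 40% near-silence ratio are illustrative guesses rather than recommended values. It only checks duration and uses the share of near-silent windows as a rough proxy for excessive pausing; the single-speaker and background-noise requirements still need a human listen.

```python
import array
import math
import sys
import wave


def check_voice_prompt(path: str,
                       min_seconds: float = 3.0,
                       max_seconds: float = 30.0,
                       silence_rms: float = 300.0,      # assumption: "near-silent" level for 16-bit audio
                       max_silence_ratio: float = 0.4,  # assumption: tolerated share of near-silent windows
                       ) -> list[str]:
    """Heuristic checks on a 16-bit PCM WAV voice prompt; returns a list of warnings."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    warnings = []
    duration = len(samples) / (rate * channels)
    if duration < min_seconds:
        warnings.append(f"too short: {duration:.1f}s (aim for more than {min_seconds:g}s)")
    elif duration > max_seconds:
        warnings.append(f"longer than needed: {duration:.1f}s (up to {max_seconds:g}s is enough)")

    # RMS loudness per ~20 ms window; interleaved channels are pooled together.
    window = max(1, int(rate * 0.02)) * channels
    loudness = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        loudness.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))

    # Rough proxy for "excessive pausing": share of windows below the silence level.
    if loudness:
        silent_share = sum(r < silence_rms for r in loudness) / len(loudness)
        if silent_share > max_silence_ratio:
            warnings.append(f"{silent_share:.0%} of the clip is near-silent; trim long pauses")
    return warnings
```

Usage would look like `for w in check_voice_prompt("my_voice.wav"): print(w)`, where the file name is only a placeholder. An empty list means none of these heuristics fired, not that the clip is guaranteed to work well.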

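The text-prompt guidelines also lend themselves to a small preprocessing step before each request. The Python below is a minimal sketch, not an official tool: `prepare_text_prompt` and `number_to_words` are hypothetical helpers that are not part of vLLM or mistral-common. It assumes English text: numbers are spelled out with English words (a German prompt would need German number words), every all-caps token of 2 to 5 letters is treated as an initialism and spelled out (so “NASA” would also become “N-A-S-A”), only plain digit runs are verbalized (no decimals or thousands separators), and the emoji filter covers only a few common Unicode blocks. The text is collapsed to a single line, following the one-prompt-per-request advice.

```python
import re

# Hypothetical, English-only preprocessing helpers (not part of vLLM or mistral-common).
_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]
# A few common emoji/symbol blocks; this is not an exhaustive emoji list.
_EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF]")


def _below_thousand(n: int) -> str:
    """Spell out 0 <= n < 1000, e.g. 234 -> 'two hundred thirty-four'."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words.append(_ONES[hundreds] + " hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(_TENS[tens] + ("-" + _ONES[ones] if ones else ""))
    elif rest or not words:
        words.append(_ONES[rest])
    return " ".join(words)


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1_000_000_000 in English words."""
    if not 0 <= n < 1_000_000_000:
        raise ValueError("only 0 <= n < 1e9 is supported")
    parts = []
    for scale, name in ((1_000_000, "million"), (1_000, "thousand")):
        count, n = divmod(n, scale)
        if count:
            parts.append(f"{_below_thousand(count)} {name}")
    if n or not parts:
        parts.append(_below_thousand(n))
    return " ".join(parts)


def _verbalize(match: re.Match) -> str:
    """Replace a digit run with words; runs longer than 9 digits are left as-is."""
    digits = match.group()
    return number_to_words(int(digits)) if len(digits) <= 9 else digits


def prepare_text_prompt(text: str, max_words: int = 300) -> str:
    """Turn raw text into a single-line, verbalized TTS text prompt."""
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"[*_`#>]+", "", text)                    # drop markdown markers
    text = _EMOJI_RE.sub("", text)                          # drop emojis
    # Spell out all-caps initialisms: "FBI" -> "F-B-I"
    text = re.sub(r"\b[A-Z]{2,5}\b", lambda m: "-".join(m.group()), text)
    text = re.sub(r"\d+", _verbalize, text)                 # "1234" -> words
    text = " ".join(text.split())                           # collapse to one line
    n_words = len(text.split())
    if n_words > max_words:
        raise ValueError(f"prompt has {n_words} words; keep it to {max_words} or fewer")
    return text
```

For example, `prepare_text_prompt("**Call the FBI** at 1234 😀\nnow")` returns `"Call the F-B-I at one thousand two hundred thirty-four now"`. Rules like these are deliberately blunt; for production text you would likely want a proper text-normalization library for the target language.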