Model Speaks Too Fast and Lacks Pause Control

by grimavatar - opened 26 days ago

The audio cloning quality of Voxtral-4B-TTS-2603 is very strong, but there are key issues with speech timing and pacing.

The generated speech does not follow the tempo of the reference voice and tends to speak too quickly. Additionally, it does not preserve or reflect natural pauses from the reference audio, resulting in output that feels rushed and less natural.

There is currently no way to control or insert pauses in the generated speech, which makes it difficult to achieve more realistic pacing.

At a minimum, it would be helpful to support manual pause insertion within the input text to improve timing and overall clarity.
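
Until something like that exists, a possible stopgap is to split the text at pause markers, synthesize each chunk separately, and join the audio with silence in between. The sketch below is only an illustration under stated assumptions: the `[pause:0.5]` marker syntax is made up, `synthesize` is a hypothetical callable you would supply yourself (it is not part of any Voxtral API), and the 24000 Hz sample rate is a placeholder that may not match the model's real output.

```python
import re

# Hypothetical marker syntax (not supported by the model itself),
# e.g. "Hello.[pause:0.5] How are you?"
PAUSE_RE = re.compile(r"\[pause:(\d+(?:\.\d+)?)\]")


def split_on_pauses(text):
    """Split text into ("text", str) and ("pause", seconds) items."""
    items = []
    pos = 0
    for match in PAUSE_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            items.append(("text", chunk))
        items.append(("pause", float(match.group(1))))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        items.append(("text", tail))
    return items


def render(text, synthesize, sample_rate=24000):
    """Join synthesized chunks with silence.

    `synthesize` is a user-supplied callable (an assumption here) that
    takes a string and returns a list of float samples. `sample_rate`
    is a placeholder and should match whatever the TTS call produces.
    """
    samples = []
    for kind, value in split_on_pauses(text):
        if kind == "text":
            samples.extend(synthesize(value))
        else:
            # Insert the requested duration of silence as zero samples.
            samples.extend([0.0] * int(round(value * sample_rate)))
    return samples
```

Synthesizing chunks separately may change the prosody at chunk boundaries, so this is a workaround rather than a substitute for native pause control.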

patrickvonplaten

Mistral AI_ org 25 days ago

Do you have an example prompt + voice?

Hnagy78

25 days ago

@wonderboy Please, I need to know: does this model support voice cloning locally? And is there any way to get it working on Windows?
