Audio reference doesn't follow text and meta-data
#3
by yayite - opened
For breathing, dynamic, and NSFW lines, when reference audio is provided, the model outputs long "aaaaaa" or "oooooo" sounds instead of (breath) {text}.
Input:
15-second Japanese audio file (.wav)
Outputs:
- Speaks very slowly
- "aaaaaaaaaaa" or "oooooooo" for 10 seconds
- Meta-data such as emotion and style doesn't affect the output
I noticed this too. There is a lot of improvement needed here.