Output with audio reference doesn't follow text and metadata

#3
by yayite - opened

For breathing, dynamic, and NSFW lines, when the reference audio is provided, the model outputs long "aaaaaa" or "oooooo" sounds instead of (breath) {text}.

Input:

  • 15-second Japanese audio .wav file

Outputs:

  • Speaks very slowly
  • "aaaaaaaaaaa" or "oooooooo" for 10 seconds
  • Metadata, such as emotion and style, doesn't affect the output

I noticed this too. There is a lot of improvement needed here.
