Inconsistent speaker voice in Sesame CSM 1B

#48
by Jernik - opened

Hi guys, I tried generating output with speaker ID 0 every time, but I'm not getting a consistent voice: it seems to change from one generation to the next. Did you run into the same issue? Were you able to get the same voice every time, or did it vary for you too?

Same. 🙏

You set the voice by creating two files that are read in every time you generate tokens, for example voice.wav and voice.txt. Keep the recording under 10 seconds; the model will then base the voice on that sample. The text in voice.txt must match exactly what is spoken in voice.wav.
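A minimal sketch of that setup, using only the Python standard library. The helper name `load_voice_prompt` is made up for illustration, and the 10-second limit is just the rule of thumb from this thread. It reads the two files and checks the clip length and transcript before you hand them to the model.

```python
import wave
from pathlib import Path

MAX_PROMPT_SECONDS = 10.0  # rule of thumb from this thread, not an official limit


def load_voice_prompt(wav_path, txt_path, max_seconds=MAX_PROMPT_SECONDS):
    """Read the reference clip and its transcript and run basic sanity checks."""
    transcript = Path(txt_path).read_text(encoding="utf-8").strip()
    if not transcript:
        raise ValueError("transcript is empty; it must match what is spoken in the wav")

    # The wave module reads the header of uncompressed PCM .wav files.
    with wave.open(str(wav_path), "rb") as wav:
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()

    duration = n_frames / sample_rate
    if duration >= max_seconds:
        raise ValueError(f"clip is {duration:.1f}s; keep it under {max_seconds:.0f}s")

    return {
        "wav_path": str(wav_path),
        "transcript": transcript,
        "sample_rate": sample_rate,
        "duration": duration,
    }


# Usage (assumes voice.wav and voice.txt sit next to the script):
#   prompt = load_voice_prompt("voice.wav", "voice.txt")
#
# Assumption: with the `generator` module from the CSM repository, the prompt
# is then passed as context on every call, roughly like this:
#   segment = Segment(text=prompt["transcript"], speaker=0, audio=audio_tensor)
#   audio = generator.generate(text="...", speaker=0, context=[segment])
# where audio_tensor is the clip loaded and resampled to the model's sample rate.
```

The important part is reusing the same prompt, with the same speaker ID, on every generation. As far as I can tell, the voice drifts between runs when no context is passed.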

Thanks! Long past csm. It was too wonky for our needs. We ended up implementing our own TTS engine. 🙏
