Like the idea, doesn't seem to work well?

#7
by Gershwin69 - opened

I've written in -action- tags and -sound- tags, but the action tags are temperamental and often don't do much, and the sound tags appear to be totally ignored, never producing anything in the output. It also inserts random sounds, like audience laughing or character giggles where none were intended.

And setting a female voice is giving me a male voice... so overall, not super great?

Scenema AI org

This model is not a typical TTS model, so there is some learning curve to prompting it well. People have been complaining that it doesn't really work, but the secret to getting it to work is tuning all the parameters so they work in concert with the prompt. We'll try to post a prompting guide on our website in the coming days. We'll be releasing a ComfyUI version next week with a demo. If you want to get updates, please join our Discord here: https://discord.com/invite/xC5TSxTNPu

You really cannot get a male or female voice by simply putting "male" or something similar in the prompt. Prompting is everything.

I think it would help somewhat if the examples included in Gradio worked more consistently. (Or if they didn't include content that doesn't work: the pauses in the "Male, mid 60s. Deep baritone with gravel" example haven't worked for me across several generation attempts.)

This is a fascinating model, and I've gotten some very interesting results with it, and I think it's worth using and worth continuing to develop!
