speech to speech possible?

#1
by krigeta - opened

Hello Team, is it possible to do speech to speech using these models?

No, it's not possible. You have to build your own ASR -> LLM -> Qwen3-TTS pipeline. You can see my reference implementation here: https://github.com/acatovic/ova. You just need to change the TTS from Kokoro to Qwen3-TTS. I've been trying this model from NVIDIA: https://huggingface.co/nvidia/personaplex-7b-v1 (it's speech-to-speech), but I haven't managed to get it running in a full application (I get "choppy" sound).
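For anyone new to the cascaded approach described above, here is a minimal sketch of how the three stages chain together. It is not taken from the linked repository, and every name in it (`CascadedPipeline`, `fake_asr`, `fake_llm`, `fake_tts`) is a hypothetical placeholder rather than the API of any real ASR, LLM, or TTS library. The stubs only pass bytes and strings around so the flow can be run without downloading a model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for illustration only).
Transcribe = Callable[[bytes], str]   # audio in -> transcript
Generate = Callable[[str], str]       # transcript -> reply text
Synthesize = Callable[[str], bytes]   # reply text -> audio out


@dataclass
class CascadedPipeline:
    """Chains ASR -> LLM -> TTS; each stage is a swappable callable."""
    asr: Transcribe
    llm: Generate
    tts: Synthesize

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.asr(audio_in)   # speech to text
        reply = self.llm(transcript)      # text to text
        return self.tts(reply)            # text to speech


# Stub stages standing in for real models (hypothetical, no ML involved).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedPipeline(asr=fake_asr, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.respond(b"hello")
```

Because each stage is just a callable, swapping the TTS (for example from Kokoro to Qwen3-TTS) would mean replacing only the `tts` argument with a wrapper around the real model, assuming that wrapper takes text and returns audio bytes.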

Hey @acatovic, this is great! My goal is not to use a specific model but to use whichever model gives me promising results. So I guess RVC models are still the best for speech-to-speech? Please let me know your view on this. I'd also like to know whether it is worth trying personaplex-7b.

@krigeta yeah, the duplex models are getting better but are still not quite there. If you have the right hardware, you can try PersonaPlex, but also keep an eye on this issue: https://github.com/NVIDIA/personaplex/issues/3, since they are working on improving that on Blackwell and 50xx cards.

@acatovic I guess this model is a conversation model and not a speech-to-speech (voice conversion) model? For example, I can't use it to convert someone else's speech into my voice, can I?

@krigeta it's a speech-to-speech model and fully duplex, meaning that you can technically interrupt it (it should feel "real"). But again, I haven't been able to get it to work due to choppiness.
