Speech to speech possible?

by krigeta - opened Jan 23

Discussion

krigeta

Jan 23

Hello Team, is it possible to do speech to speech using these models?

acatovic

Jan 23

No, it's not possible. You have to build your own ASR -> LLM -> Qwen3-TTS pipeline. You can see my reference implementation here: https://github.com/acatovic/ova. You just need to change the TTS from Kokoro to Qwen3-TTS. I've been trying this model from NVIDIA: https://huggingface.co/nvidia/personaplex-7b-v1 (it's speech-to-speech) but haven't managed to get it running in a full application (I get "choppy" sound).
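For anyone unfamiliar with the cascaded approach, here is a rough sketch of the shape of an ASR -> LLM -> TTS pipeline. It is not taken from the ova repository and does not use any real model API: `VoicePipeline`, `fake_asr`, `fake_llm`, and `fake_tts` are hypothetical names, and the three stand-in stages only pass bytes and strings around so the example runs on its own. In practice each callable would wrap a real ASR model, an LLM, and Qwen3-TTS.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for illustration only).
Transcribe = Callable[[bytes], str]   # input audio -> transcript
Generate = Callable[[str], str]       # transcript -> reply text
Synthesize = Callable[[str], bytes]   # reply text -> output audio


@dataclass
class VoicePipeline:
    """Chains three independent stages: ASR, then LLM, then TTS."""
    asr: Transcribe
    llm: Generate
    tts: Synthesize

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.asr(audio_in)   # speech -> text
        reply = self.llm(transcript)      # text -> text
        return self.tts(reply)            # text -> speech


# Stand-in stages so the sketch is runnable without any models.
# They treat the "audio" bytes as UTF-8 text, which real audio is not.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(asr=fake_asr, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.respond(b"hello")
```

Because each stage is just a callable, swapping the TTS (for example from Kokoro to Qwen3-TTS) should only mean passing a different `tts` function, leaving the ASR and LLM stages untouched.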

krigeta

Jan 24

Hey @acatovic, this is great! My goal is not to use a specific model but to use one that can give me promising results. So I guess RVC models are still the best for speech to speech? Please let me know your view on this. I'd also like to know whether it is worth trying personaplex-7b.

acatovic

Jan 24

@krigeta yeah, the duplex models are getting better but are still not quite there. If you have the right hardware, you can try PersonaPlex, but also keep an eye on this issue: https://github.com/NVIDIA/personaplex/issues/3 since they are working on improving that on the Blackwell and 50xx cards.

krigeta

Jan 24

@acatovic I guess this model is a conversational model and not a speech-to-speech model in the sense I mean. I can't use it to convert someone's speech into my voice, can I?

acatovic

Jan 24

edited Jan 24

@krigeta it's a speech-to-speech model, and fully duplex, meaning that you can technically interrupt it (it should feel "real"). But again, I haven't been able to get it to work due to choppiness.
