Speech-to-speech possible?
Hello Team, is it possible to do speech-to-speech using these models?
No, it's not possible. You have to build your own ASR -> LLM -> Qwen3-TTS pipeline. You can see my reference implementation here: https://github.com/acatovic/ova. You just need to change the TTS from Kokoro to Qwen3-TTS. I've been trying this model from NVIDIA: https://huggingface.co/nvidia/personaplex-7b-v1 (it's speech-to-speech), but I haven't managed to get it running in a full application (the audio comes out "choppy").
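For anyone who wants the shape of that cascaded pipeline before digging into the repo, here is a minimal sketch. It is not taken from ova and does not call any real ASR, LLM, or Qwen3-TTS API: the `CascadedVoicePipeline` class, the three stage signatures, and the `fake_*` stand-ins are all hypothetical names made up for illustration. The idea is only that each stage is a plain callable, so swapping Kokoro for Qwen3-TTS should mean replacing one function.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions, not real library APIs).
Transcriber = Callable[[bytes], str]   # ASR: audio bytes -> user text
Responder = Callable[[str], str]       # LLM: user text -> reply text
Synthesizer = Callable[[str], bytes]   # TTS: reply text -> audio bytes


@dataclass
class CascadedVoicePipeline:
    """Chains ASR -> LLM -> TTS for one conversational turn."""

    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer

    def run_turn(self, audio_in: bytes) -> bytes:
        text_in = self.transcribe(audio_in)
        if not text_in.strip():
            # Nothing intelligible was heard, so skip the LLM and TTS calls.
            return b""
        reply_text = self.respond(text_in)
        return self.synthesize(reply_text)


# Stand-in stages so the sketch runs without any models installed.
# In a real application each one would wrap an actual model call.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedVoicePipeline(fake_asr, fake_llm, fake_tts)
audio_out = pipeline.run_turn(b"hello")
```

Because the stages run one after another, the reply only starts once transcription and generation have finished, which is the main thing a true speech-to-speech model avoids.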
@krigeta yeah, the duplex models are getting better but are still not quite there. If you have the right hardware, you can try PersonaPlex, but also keep an eye on this issue: https://github.com/NVIDIA/personaplex/issues/3, since they are working on improving that on the Blackwell and 50xx cards.
@krigeta it's a speech-to-speech model, and fully duplex, meaning that you can technically interrupt it (it should feel "real"). But again, I haven't been able to get it to work due to choppiness.