Inference latency
#24
by popkek00 - opened
Good morning, these models are fantastic; I'm using the Base one to clone voices. I really like the model's quality, but I'm having problems with inference latency: it seems too slow compared to other models I've used. I'm running on an NVIDIA A4000 GPU with 16 GB of VRAM. My goal is to use the model with the Pipecat framework, and it actually works, but it's too slow. Can you give me any advice on how to set up the model? I'm serving it with vLLM-Omni, so I suspect I'm doing something wrong at startup. I'd really appreciate your help with this.
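For reference, the startup options I've been experimenting with map onto vLLM's standard Python API roughly like this (just a sketch with a placeholder model name, not my exact setup, and I'm not sure which of these vLLM-Omni actually honors):

```python
# Sketch of the startup options I'm tuning, via vLLM's standard Python API.
# "my-tts-model" is a placeholder, not the actual checkpoint.
from vllm import LLM

llm = LLM(
    model="my-tts-model",         # placeholder for the Base voice-cloning checkpoint
    dtype="float16",              # fp16 fits a 16 GB card better than fp32
    gpu_memory_utilization=0.90,  # leave some headroom on the A4000
    max_model_len=4096,           # smaller KV cache, less memory pressure
    enforce_eager=False,          # keep CUDA graphs enabled for lower latency
)
```

Should I be tuning any of these differently for a 16 GB card, or is there a recommended startup for low-latency streaming?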
Thank you to all of you.