High-Density Serving on PersonaPlex-7B

#15

by ErenalpCet - opened Jan 23

Jan 23

While the full-duplex capabilities of PersonaPlex are impressive, the monolithic design seems to make standard GPU optimizations nearly impossible. In a production environment, you would typically aim for high-density serving (e.g., 50–100 users per card), but this architecture appears to limit us to a 1:1 user-to-GPU ratio. It feels like the model is optimized more for driving high-end hardware sales than for scalable deployment efficiency.

royrajarshi

NVIDIA org Jan 23

This model release allows users to experience the naturalness possible with full-duplex speech architectures and offers a local and reproducible setup. We are exploring other production-focused architectures optimized for high-density deployment. Thanks for sharing the feedback, and stay tuned! Scaling speech-to-speech agents is indeed a critical task.

royrajarshi changed discussion status to closed Jan 23

ErenalpCet

Jan 24

edited Jan 24

I appreciate the answer, though it confirms my suspicion that this model isn't intended for real-world scaling. Thanks for being clear about that.
