High-Density Serving on PersonaPlex-7B
While the full-duplex capabilities of PersonaPlex are impressive, the monolithic design seems to make standard GPU optimizations nearly impossible. In a production environment, you would typically aim for high-density serving (e.g., 50–100 users per card), but this architecture appears to limit us to a 1:1 user-to-GPU ratio. It feels like the model is optimized more for driving high-end hardware sales than for scalable deployment efficiency.
This model release allows users to experience the naturalness possible with full-duplex speech architectures and offers a local, reproducible setup. We are exploring other production-focused architectures optimized for high-density deployment. Thanks for sharing the feedback, and stay tuned! Scaling speech-to-speech agents is indeed a critical task.
I appreciate the answer, though it confirms my suspicion that this model isn't intended for real-world scaling. Thanks for being clear about that.