Inference Speed Benchmark
by Anilosan15
Hi everyone, do you have any speed or streaming benchmarks for the newly released models? I'm mainly looking for: time to first token (TTFT), streaming speed in tokens per second, real-time factor (RTF) for audio/TTS if applicable, and how many parallel requests the server can handle reliably. If you can share your setup details (GPU type, quantization, batch size, vLLM configuration), that would be super helpful. Thanks a lot!
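
To make the numbers comparable, here is roughly how I'd measure TTFT and streaming speed myself. This is a minimal sketch, assuming a vLLM server exposing the OpenAI-compatible API at `http://localhost:8000/v1`; the base URL, model name, and prompt are placeholders:

```python
# Minimal TTFT / tokens-per-second sketch against an OpenAI-compatible
# vLLM endpoint. Assumes: server at http://localhost:8000/v1, and a
# placeholder model name -- adjust both for your setup.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


def benchmark(prompt: str, model: str = "my-model") -> None:
    start = time.perf_counter()
    first_token_at = None
    chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=256,
    )
    for chunk in stream:
        # Some chunks (e.g. the final usage chunk) carry no delta content.
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            chunks += 1
    end = time.perf_counter()
    if first_token_at is not None:
        print(f"TTFT: {first_token_at - start:.3f} s")
        # Each streamed chunk is roughly one token, so this only
        # approximates decode tok/s for a single request.
        print(f"Streaming speed: {chunks / (end - first_token_at):.1f} tok/s")


if __name__ == "__main__":
    benchmark("Write a short story about a robot.")
```

For concurrency, running several of these in parallel (threads or asyncio) and watching how TTFT and tok/s degrade would answer the parallel-requests question; if anyone has done that already, numbers would be great.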