Inference Speed Benchmark
by Anilosan15
Hi everyone, do you have any speed or streaming benchmarks for the newly released models? I'm mainly looking for: time to first token (TTFT), streaming speed in tokens per second, real-time factor (RTF) for audio/TTS if applicable, and how many parallel requests the server can handle reliably. If you can share your setup details (GPU type, quantization, batch size, vLLM configuration), that would be super helpful. Thanks a lot!
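
To make the numbers comparable, here is roughly how I'd measure TTFT and streaming speed myself. This is a minimal sketch, assuming a vLLM server exposing the OpenAI-compatible API at `http://localhost:8000/v1`; the base URL, model name, and prompt are placeholders:

```python
# Minimal TTFT / tokens-per-second sketch against an OpenAI-compatible
# vLLM endpoint. Assumes: server at http://localhost:8000/v1, and a
# placeholder model name -- adjust both for your setup.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


def benchmark(prompt: str, model: str = "my-model") -> None:
    start = time.perf_counter()
    first_token_at = None
    chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=256,
    )
    for chunk in stream:
        # Some chunks (e.g. the final usage chunk) carry no delta content.
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            chunks += 1
    end = time.perf_counter()
    if first_token_at is not None:
        print(f"TTFT: {first_token_at - start:.3f} s")
        # Each streamed chunk is roughly one token, so this only
        # approximates decode tok/s for a single request.
        print(f"Streaming speed: {chunks / (end - first_token_at):.1f} tok/s")


if __name__ == "__main__":
    benchmark("Write a short story about a robot.")
```

For concurrency, running several of these in parallel (threads or asyncio) and watching how TTFT and tok/s degrade would answer the parallel-requests question; if anyone has done that already, numbers would be great.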