SGLang very slow (~6 tok/s) at concurrency 1 on H100 SXM
#3
by RonanMcGovern - opened
I'm using the latest SGLang Docker image (latest tag).
Same issue with Qwen 32B dense.
The issue was that I was not counting reasoning tokens, as they are returned in a separate field.
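For anyone hitting the same thing, here is a minimal sketch of the measurement fix: count the tokens in the reasoning field as well as the final answer before dividing by elapsed time. The field name `reasoning_content`, the helper names, and the whitespace-based token counter are all assumptions for illustration; check your server's actual response format and use the model's real tokenizer for accurate numbers.

```python
from typing import Callable


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def tokens_per_second(
    message: dict,
    elapsed_s: float,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> float:
    """Throughput over ALL generated text: reasoning plus final answer.

    `message` is assumed to look like one chat-completion message, with the
    answer under "content" and the reasoning under "reasoning_content"
    (an assumed field name, not confirmed by this thread).
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    answer = message.get("content") or ""
    reasoning = message.get("reasoning_content") or ""
    # Count each field separately so words at the boundary are not merged.
    total = count_tokens(answer) + count_tokens(reasoning)
    return total / elapsed_s


# Example: a short answer with a much longer reasoning trace.
msg = {"content": "a b", "reasoning_content": "c d e f"}
answer_only = whitespace_token_count(msg["content"]) / 2.0
with_reasoning = tokens_per_second(msg, elapsed_s=2.0)
```

If the response includes a usage block with a completion token count, comparing it against a count of the answer text alone is a quick way to spot this kind of undercounting.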
RonanMcGovern changed discussion status to closed