--max-model-len 32768 seems a bit too small for agent use cases?
#3
by edwarddukewu - opened
For agent workloads, a max model length of 32768 seems a bit on the low side.
It doesn't have to be 32k....
The model supports up to 260k by default.
For example, I run it at 131072, but only because vLLM otherwise runs out of memory on my hardware. Experiment to see how far you can stretch it, but 32k is indeed not enough for real work.
This depends on how much VRAM your hardware has available to allocate for the KV cache.
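To get a rough feel for that trade-off, here is a minimal sketch that estimates KV cache size for a single sequence. It assumes a standard attention layout where every layer stores one key and one value vector per KV head per token; the layer count, KV head count, and head dimension below are hypothetical placeholders, not this model's actual config, so substitute the values from the model's config.json. Models with hybrid or compressed attention will need less than this estimate, and vLLM also needs VRAM for weights and activations.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache needed per token (standard attention, assumed layout)."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_cache_gib(context_len, bytes_per_token):
    """KV cache size in GiB for one sequence of context_len tokens."""
    return context_len * bytes_per_token / 1024**3


def max_tokens_for_budget(kv_budget_gib, bytes_per_token):
    """Largest context length that fits in a given KV cache budget (GiB)."""
    return int(kv_budget_gib * 1024**3) // bytes_per_token


# Hypothetical model shape (NOT taken from this thread): 48 layers,
# 8 KV heads, head dimension 128, 16-bit cache entries.
per_token = kv_cache_bytes_per_token(num_layers=48, num_kv_heads=8, head_dim=128)

for ctx in (32768, 131072):
    print(f"{ctx} tokens -> {kv_cache_gib(ctx, per_token):.1f} GiB of KV cache")

print("tokens that fit in 24 GiB:", max_tokens_for_budget(24, per_token))
```

With these placeholder numbers the cache grows linearly with context length, which is why the usable --max-model-len is bounded by whatever VRAM is left after the weights are loaded.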