--max-model-len 32768 seems a bit too small for agent use cases?

#3
by edwarddukewu - opened

For agent workloads, a max model length of 32768 seems a bit on the low side.

QuantTrio org

It doesn't have to be 32k.
The model supports up to 260k by default.

For example, I run it at 131072, and only stop there because vLLM otherwise runs out of memory on my hardware. Experiment to see how far you can stretch it, but 32k is indeed not enough for real work.

QuantTrio org

This depends on how much VRAM your hardware has available to allocate for the KV cache.
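
As a rough illustration of that dependency, here is a minimal Python sketch that estimates how many tokens of KV cache fit in a given VRAM budget. It assumes plain full attention with a 16-bit cache and a single sequence. The layer count, KV head count, and head dimension below are hypothetical placeholders, not this model's real configuration, so read the actual values from the model's `config.json`. Models with hybrid or compressed attention will not follow this formula exactly.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache needed per token, assuming standard attention."""
    # Factor of 2: each layer stores one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_context_tokens(free_vram_gib, num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Upper bound on tokens that fit in the VRAM left over for the KV cache."""
    budget_bytes = free_vram_gib * 1024**3
    per_token = kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes)
    return int(budget_bytes // per_token)


# Hypothetical architecture numbers, for illustration only.
NUM_LAYERS = 48
NUM_KV_HEADS = 8
HEAD_DIM = 128

# Hypothetical VRAM (in GiB) remaining after the weights are loaded.
FREE_VRAM_GIB = 24

print(kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM))
print(max_context_tokens(FREE_VRAM_GIB, NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM))
```

The result is only an upper bound: activations, framework overhead, and concurrent requests all reduce what is left for the cache, so the largest `--max-model-len` that actually works will be somewhat lower.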
