--max-model-len 32768 seems a bit too small for agent use cases?

by edwarddukewu - opened Mar 12

Discussion

edwarddukewu

Mar 12

For agent workloads, a max model length of 32768 seems a bit on the low side.

tclf90

QuantTrio org Mar 12

It doesn't have to be 32k.
The model supports up to 260k by default.

bastoker

Mar 12


edited Mar 12

For example, I run it at 131072, but only because vLLM otherwise runs out of memory on my hardware. Experiment to see how far you can stretch it, but 32k is indeed not enough for real workloads.

JunHowie

QuantTrio org Mar 13

This depends on how much VRAM your hardware has available to allocate for the KV cache.
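To make the KV-cache point concrete, here is a rough back-of-the-envelope sketch. It assumes every layer is a plain (grouped-query) attention layer. The model shape used (48 layers, 4 KV heads, head_dim 128, bf16 cache) is hypothetical, not this model's actual config. Hybrid or linear-attention layers, as well as vLLM's own overheads, will change the real numbers. The class and function names are illustrative only.

```python
"""Back-of-the-envelope KV-cache sizing for picking vLLM's --max-model-len.

Assumes every layer is a standard (grouped-query) attention layer; the model
shape used in the example is hypothetical.
"""
from dataclasses import dataclass

GIB = 1024**3


@dataclass(frozen=True)
class AttentionShape:
    # Hypothetical values; take the real ones from the model's config.json
    # (fields such as num_hidden_layers, num_key_value_heads, head_dim).
    num_layers: int
    num_kv_heads: int
    head_dim: int
    kv_dtype_bytes: int = 2  # 2 for a bf16/fp16 KV cache, 1 for fp8


def kv_bytes_per_token(shape: AttentionShape) -> int:
    """KV-cache bytes one token occupies across all layers (factor 2 = K and V)."""
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * shape.kv_dtype_bytes


def kv_gib_for_context(shape: AttentionShape, context_len: int) -> float:
    """GiB of KV cache needed to hold one sequence of context_len tokens."""
    return context_len * kv_bytes_per_token(shape) / GIB


def max_tokens_in_cache(shape: AttentionShape, kv_budget_gib: float) -> int:
    """How many tokens fit in the VRAM left over for the KV cache."""
    return int(kv_budget_gib * GIB) // kv_bytes_per_token(shape)


def fits(shape: AttentionShape, kv_budget_gib: float, max_model_len: int,
         concurrent_seqs: int = 1) -> bool:
    """True if concurrent_seqs full-length sequences fit in the budget."""
    return max_tokens_in_cache(shape, kv_budget_gib) >= max_model_len * concurrent_seqs


if __name__ == "__main__":
    shape = AttentionShape(num_layers=48, num_kv_heads=4, head_dim=128)
    print(kv_bytes_per_token(shape))            # 98304 bytes (96 KiB) per token
    for n in (32768, 131072):
        print(n, kv_gib_for_context(shape, n))  # 3.0 GiB, then 12.0 GiB
    print(max_tokens_in_cache(shape, 10.0))     # 109226 tokens in a 10 GiB budget
    print(fits(shape, 10.0, 131072))            # False
```

With this hypothetical shape, going from 32k to 128k context quadruples the per-sequence KV cache (3 GiB to 12 GiB). That is why a length that works on one GPU can run out of memory on another. The budget you plug in is roughly what remains of `--gpu-memory-utilization` × VRAM after the weights and runtime overhead, so treat the result as an upper bound rather than an exact figure.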
