Looping forever

#31

by kil3r - opened Mar 11

Mar 11

I'm running it with vLLM as instructed, using the latest vLLM nightly and loading this exact model (not some random quant). Unfortunately, when running extensive benchmarks, more often than not the generation loops forever and keeps going until it fills the full context size.

Has anyone experienced similar problems? I'm running it comfortably on an A100 with 80 GB of VRAM.

Avesed

Mar 12

Have you tried passing --tokenizer Qwen/Qwen3.5-27B?
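
For reference, a minimal sketch of how that flag might be added to a `vllm serve` launch. The model repository is not named in this thread, so `MODEL_REPO` below is a placeholder, and the helper `build_vllm_serve_command` is a hypothetical convenience for illustration, not part of vLLM.

```python
import shlex
from typing import List


def build_vllm_serve_command(model: str, tokenizer: str) -> List[str]:
    """Return the argv for `vllm serve` with an explicit tokenizer override."""
    # `--tokenizer` tells vLLM to load the tokenizer from a different repo
    # than the model weights.
    return ["vllm", "serve", model, "--tokenizer", tokenizer]


# Placeholder: substitute the repository of the model discussed here.
MODEL_REPO = "<model-repo>"

cmd = build_vllm_serve_command(MODEL_REPO, "Qwen/Qwen3.5-27B")

# Print a shell-ready command line that can be copied into a terminal.
print(shlex.join(cmd))
```

Whether this resolves the looping is untested here; it only shows where the tokenizer override would go in the launch command.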
