Looping forever

#31
by kil3r - opened

I'm running it with vLLM as instructed, using the latest nightly build and loading this exact model (not some random quant). Unfortunately, when running extensive benchmarks, more often than not the generation loops forever and keeps going until it reaches the full context size.

Has anyone experienced similar problems? I'm running it comfortably on an A100 with 80 GB of VRAM.

Have you tried --tokenizer Qwen/Qwen3.5-27B?
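For reference, here is a sketch of where that flag would go if you launch the OpenAI-compatible server with `vllm serve`. MODEL is a placeholder because the thread doesn't name the model repo. Substitute the repo id you are actually loading:

```shell
# MODEL is a placeholder: replace it with the repo id of the model from this thread
MODEL="<model-repo-id>"

# Load the model weights from MODEL, but take the tokenizer from Qwen/Qwen3.5-27B
vllm serve "$MODEL" --tokenizer Qwen/Qwen3.5-27B
```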
