The model interrupts its response

#2
by ArtemSultanov - opened

Hi! Has anyone encountered a problem when the model issues a block and stops responding further? I have this problem all the time in different agent systems. What could be the reason for this? How to fix it?

Here are my start options. Maybe something needs to be changed?
--gpus '"device=4,5,6,7"' \
--runtime=nvidia \
-v /data:/data \
-p 8002:8000 \
--ipc=host \
vllm/vllm-openai:v0.17.1 \
--model=/data/models/Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated \
--tensor-parallel-size 4 \
--max-model-len 100000 \
--gpu-memory-utilization 0.85 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_xml \
--dtype bfloat16 \
--kv-cache-dtype fp8 \
--max-num-seqs 10 \
--host 0.0.0.0 \
--port 8000"

Sign up or log in to comment