The model interrupts its response

by ArtemSultanov - opened 29 days ago

Hi! Has anyone encountered a problem when the model issues a block and stops responding further? I have this problem all the time in different agent systems. What could be the reason for this? How to fix it?

ArtemSultanov

29 days ago

Here are my start options. Maybe something needs to be changed?
--gpus '"device=4,5,6,7"' \
--runtime=nvidia \
-v /data:/data \
-p 8002:8000 \
--ipc=host \
vllm/vllm-openai:v0.17.1 \
--model=/data/models/Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated \
--tensor-parallel-size 4 \
--max-model-len 100000 \
--gpu-memory-utilization 0.85 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_xml \
--dtype bfloat16 \
--kv-cache-dtype fp8 \
--max-num-seqs 10 \
--host 0.0.0.0 \
--port 8000"

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment