Integration with Open WebUI ... thinking formatting

by flat-line - opened Mar 16

Hi,

Congrats on the excellent tuning of the model. I'm checking the integration with Open WebUI, and the formatting of the thinking section seems to be off.

This is the command I use to start the server:

${LLAMA}/build/bin/llama-server \
  -m ${MODELS}/Qwen3.5-Opus-4B-Q8_0.gguf \
  --jinja \
  --mmproj ${MODELS}/mmproj/mmproj-Qwen3.5-Opus-4B-BF16.gguf \
  -t 4 \
  -b 256 \
  -ngl 999 \
  -n 4096 \
  -np 1 \
  --host 0.0.0.0 \
  --port ${PORT} \
  --ctx-size 32000 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.00 \
  --alias "unsloth/Qwen3.5-4B-GGUF" \
  --timeout 120 \
  --poll 0 \
  --chat-template-kwargs "{\"enable_thinking\":${ENABLE_THINKING}}"
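
The value given to --chat-template-kwargs has to reach llama-server as a JSON object, so the inner double quotes need to be escaped. If they are not, the shell closes the outer string early and drops them. Below is a small sketch that prints what the shell produces in each case. It assumes ENABLE_THINKING holds a bare JSON boolean (true or false); the variable names UNESCAPED and ESCAPED are only illustrative.

```shell
#!/usr/bin/env bash
# Assumption: ENABLE_THINKING holds a bare JSON boolean, true or false.
ENABLE_THINKING=true

# Unescaped inner quotes terminate the outer string, so the quotes around
# the key are consumed by the shell and never reach the program.
UNESCAPED="{"enable_thinking":${ENABLE_THINKING}}"

# Escaped inner quotes survive, and the result is a valid JSON object.
ESCAPED="{\"enable_thinking\":${ENABLE_THINKING}}"

printf 'unescaped: %s\n' "$UNESCAPED"   # unescaped: {enable_thinking:true}
printf 'escaped:   %s\n' "$ESCAPED"     # escaped:   {"enable_thinking":true}
```

Wrapping the JSON in single quotes would also keep the inner quotes, but then ${ENABLE_THINKING} would not be expanded, which is why the escaped double-quoted form is used here.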