Hello, the model's output is garbled.

#1
by zhao198300 - opened

Screenshot_20260524_033122


Did I do something wrong?

vLLM run command:
VLLM_ATTENTION_BACKEND=FLASH_ATTN \
vllm serve \
--model /models/qwopus3.6-27b-v2-fp8 \
--host 0.0.0.0 \
--port 8080 \
--kv_cache_dtype fp8 \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tensor-parallel-size 2 \
--max-model-len 262144 \
--gpu-memory-utilization 0.9046 \
--trust-remote-code \
--compilation-config '{"cudagraph_mode": "PIECEWISE"}' \
--default-chat-template-kwargs '{"enable_thinking": true, "preserve_thinking":true}' \
--max-num-seqs 2 \
--compilation_config.mode VLLM_COMPILE \
--enable-prefix-caching \
--enable-chunked-prefill \
--served-model-name qwen3.6-27b \
--speculative-config '{"method":"mtp","num_speculative_tokens":3}' \
--max-num-batched-tokens 16384 \
--attention-backend FLASHINFER
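One detail in the command above: the attention backend is chosen twice, once as `VLLM_ATTENTION_BACKEND=FLASH_ATTN` in the environment and once as `--attention-backend FLASHINFER` on the command line. It is not obvious which one takes effect, so it may be worth checking. Below is a minimal sketch that tokenizes a shell command with Python's `shlex` and lists every place an attention backend is set. The helper `attention_backend_settings` is hypothetical and not part of vLLM, and it only looks at those two spellings.

```python
import shlex
from typing import Dict


def attention_backend_settings(command: str) -> Dict[str, str]:
    """Hypothetical helper: report where a shell command picks an attention backend.

    Only the VLLM_ATTENTION_BACKEND environment prefix and the
    --attention-backend flag (space or '=' form) are recognized.
    """
    # Join backslash line continuations first; shlex would otherwise keep
    # the escaped newline as part of the next token.
    tokens = shlex.split(command.replace("\\\n", " "))
    found: Dict[str, str] = {}
    for i, tok in enumerate(tokens):
        if tok.startswith("VLLM_ATTENTION_BACKEND="):
            found["env"] = tok.split("=", 1)[1]
        elif tok == "--attention-backend" and i + 1 < len(tokens):
            found["flag"] = tokens[i + 1]
        elif tok.startswith("--attention-backend="):
            found["flag"] = tok.split("=", 1)[1]
    return found


# Abbreviated copy of the command from the post (other flags omitted).
cmd = """VLLM_ATTENTION_BACKEND=FLASH_ATTN \\
vllm serve \\
--model /models/qwopus3.6-27b-v2-fp8 \\
--compilation-config '{"cudagraph_mode": "PIECEWISE"}' \\
--attention-backend FLASHINFER"""

settings = attention_backend_settings(cmd)
if len(set(settings.values())) > 1:
    print("Conflicting attention backends:", settings)
```

Removing one of the two settings, so that only a single backend is requested, makes the run easier to reason about when comparing results.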

For me it isn't working either. It keeps thinking forever :)
Screenshot 2026-05-23 at 23.20.01
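To tell "thinking forever" apart from a long but finished answer, one option is to inspect the raw completion text and check whether the reasoning block was ever closed. The sketch below assumes Qwen3-style `<think>...</think>` tags (the opening tag may already be part of the prompt, so it is treated as optional); `split_reasoning` is a hypothetical helper, not vLLM's reasoning parser.

```python
from typing import Optional, Tuple

THINK_START = "<think>"
THINK_END = "</think>"


def split_reasoning(raw: str) -> Tuple[str, Optional[str]]:
    """Split raw model text into (reasoning, answer).

    Assumes Qwen3-style tags. Returns answer=None when the closing
    </think> tag never appears, i.e. generation stopped while the
    model was still in its reasoning block.
    """
    text = raw.lstrip()
    if text.startswith(THINK_START):
        text = text[len(THINK_START):]
    reasoning, sep, answer = text.partition(THINK_END)
    if not sep:
        return reasoning.strip(), None
    return reasoning.strip(), answer.strip()


finished = split_reasoning("<think>2+2 is 4.</think>\n\nThe answer is 4.")
stuck = split_reasoning("Let me think... let me think again...")
print(finished)
print(stuck)
```

If `answer` comes back as `None` even with a generous token limit, the model never emitted `</think>`, which matches the "thinking forever" symptom rather than a truncated answer.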

Hi there, I am terribly sorry for the inconvenience!

I will look into this model issue immediately. For context, all of my testing was done with the transformers library, where the model produced normal output. I haven't used vLLM myself, so there may be compatibility or configuration issues.

Please bear with me for a moment while I run some tests and checks. I will update you here as soon as I find anything!


Hi everyone!

Thank you so much for your feedback!

I have now fixed this issue and updated the model weights, model card, and recommended test configurations.

I’m terribly sorry for the inconvenience caused earlier. Please pull the latest model and give it another try!

Many thanks for the very fast reaction. There's no need at all for you to apologize. We should be thankful for everything you are doing :)
