no valid JSON data found in stream
#3
by InformaticsSolutions - opened
Using AesSedai/Qwen3.5-122B-A10B-GGUF/Q4_K_M/ with llama-server. Getting nothing back, with and without --mmproj, with and without --jinja. Even the warm-up comes back empty:
init: chat template, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
<think>
'
srv init: init: chat template, thinking = 1
main: model loaded
Further requests result in this message in the logs:
request /v1/chat/completions - start: 1m25.326650411s, total: 1m25.614670347s
[WARN] error processing streaming response: no valid JSON data found in stream, path=/v1/chat/completions, recording minimal metrics
llama-server --version
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes
Device 1: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes
version: 8233 (c5a778891)
built with GNU 14.2.0 for Linux x86_64
What could be wrong? Other quants (UD-Q4 from Unsloth, Qwen3.5-122B-A10B-heretic.mxfp4) work normally. Thank you.
Edit: same error with Q5_K_M.
I believe the problem was of my own making: once I removed the -n 1 and --parallel 1 flags from the llama-server start-up command, the problem went away. Closing.
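For anyone who lands here with the same symptom: in llama.cpp, -n (--n-predict) sets the number of tokens to predict, so -n 1 likely capped every reply at a single token, leaving almost nothing to stream back. Below is a minimal sketch for checking what a captured stream actually contains. It assumes the OpenAI-style streaming format (lines of `data: {...}` ending with `data: [DONE]`); the helper name `summarize_stream` is hypothetical and not part of llama.cpp.

```python
import json


def summarize_stream(lines):
    """Summarize OpenAI-style SSE lines from /v1/chat/completions.

    Assumption: each event is a line starting with "data:" that carries
    either a JSON chunk or the literal "[DONE]" terminator.
    """
    chunks = 0
    text = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks instead of failing
        chunks += 1
        for choice in event.get("choices", []):
            delta = choice.get("delta") or {}
            text.append(delta.get("content") or "")
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return {
        "chunks": chunks,
        "text": "".join(text),
        "finish_reason": finish_reason,
    }
```

Feed it the lines of a saved streaming response. If the text is empty and the finish reason is "length", the generation was probably cut off by a token limit rather than broken by the quant itself.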
InformaticsSolutions changed discussion status to closed