no valid JSON data found in stream

#3
by InformaticsSolutions - opened

I'm using AesSedai/Qwen3.5-122B-A10B-GGUF/Q4_K_M/ with llama-server and getting nothing back, with or without --mmproj and with or without --jinja. Even the warm-up comes back empty:

minit: chat template, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
<think>
'
srv          init: init: chat template, thinking = 1
main: model loaded

Further requests result in this message in the logs:

request /v1/chat/completions - start: 1m25.326650411s, total: 1m25.614670347s
[WARN] error processing streaming response: no valid JSON data found in stream, path=/v1/chat/completions, recording minimal metrics

llama-server --version
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes
  Device 1: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes
version: 8233 (c5a778891)
built with GNU 14.2.0 for Linux x86_64

What could be wrong? Other quants (UD-Q4 from Unsloth, Qwen3.5-122B-A10B-heretic.mxfp4) work normally. Thank you.

Edit: same error with Q5_K_M.

I believe the problem was of my own making: once I removed the -n 1 and --parallel 1 flags from the llama-server start-up command, the problem went away. Closing.
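
For anyone landing here with the same symptom, one plausible explanation (an assumption, I did not test the two flags separately) is that -n 1 caps generation at a single token, and with thinking enabled that one token goes to the reasoning part, so the visible answer stays empty. Below is a minimal sketch in Python for checking what a streamed reply actually contains. The sample chunks are hypothetical, written in the OpenAI-style "data: {...}" streaming shape rather than captured from llama-server, and the reasoning_content field name is an assumption that depends on how the server is configured to report reasoning.

```python
import json


def collect_stream(lines):
    """Join the answer text and reasoning text from OpenAI-style SSE lines.

    Returns a (content, reasoning) tuple of strings.
    """
    content, reasoning = [], []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON chunk
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            content.append(delta.get("content") or "")
            # Assumed field name for streamed reasoning text.
            reasoning.append(delta.get("reasoning_content") or "")
    return "".join(content), "".join(reasoning)


# Hypothetical stream for a one-token budget: the only token is reasoning.
capped = [
    'data: {"choices":[{"delta":{"reasoning_content":"Okay"},"finish_reason":null}]}',
    'data: {"choices":[{"delta":{},"finish_reason":"length"}]}',
    "data: [DONE]",
]

# Hypothetical stream without the cap: reasoning first, then the answer.
uncapped = [
    'data: {"choices":[{"delta":{"reasoning_content":"Okay"},"finish_reason":null}]}',
    'data: {"choices":[{"delta":{"content":"Hi"},"finish_reason":null}]}',
    'data: {"choices":[{"delta":{"content":" there"},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

print(collect_stream(capped))
print(collect_stream(uncapped))
```

If the first element of the result is empty while the second is not, the model did generate something, but the whole token budget was spent before any answer text was produced.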

InformaticsSolutions changed discussion status to closed
