Fix streaming output when enable_thinking is disabled
## Problem

The current `nano_v3_reasoning_parser.py` correctly handles the `enable_thinking: false` flag for non-streaming requests, but streaming requests still route content to the wrong field.
When using vLLM with streaming enabled and thinking disabled:
```python
from openai import OpenAI

# Assumes a local vLLM server; "EMPTY" is the conventional placeholder
# API key for vLLM's OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
```
**Current behavior:** content appears in `delta.reasoning_content` instead of `delta.content`.
**Expected behavior:** content appears in `delta.content` (since thinking is disabled).
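With the fix in place, a client accumulates the answer from `delta.content` across chunks. A minimal sketch of that accumulation, using stand-in chunk objects in place of a live server response (the chunk shape mirrors the OpenAI-compatible streaming format; `collect_answer` and the fake chunks are illustrative, not part of the patch):

```python
from types import SimpleNamespace

def collect_answer(chunks) -> str:
    """Concatenate delta.content across streaming chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # With enable_thinking=False, streamed text should land here,
        # not in delta.reasoning_content.
        if delta.content:
            parts.append(delta.content)
    return "".join(parts)

# Fake chunks standing in for the server's streamed response.
fake_chunks = [
    SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=t, reasoning_content=None))]
    )
    for t in ["Hel", "lo"]
]
print(collect_answer(fake_chunks))  # -> Hello
```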
## Root Cause

The existing `extract_reasoning` method handles the field swap for non-streaming responses, but the streaming path uses `extract_reasoning_streaming` from the parent `DeepSeekR1ReasoningParser`, which doesn't know about the `enable_thinking` flag.
## Solution

Override `extract_reasoning_streaming` to swap the fields when thinking is disabled, matching the behavior of the non-streaming path.
## Changes

- Add `__init__` to capture `enable_thinking` state at parser initialization
- Add `extract_reasoning_streaming` override to swap fields in streaming mode
- Add a docstring explaining the parser's purpose
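The shape of the change can be sketched as below. The class and method names follow this PR's description, but the signatures are simplified stand-ins for vLLM's actual reasoning-parser interface (the real override delegates to the parent class and handles partial `<think>` tags), so treat this as an illustration of the field swap, not the patch itself:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeltaMessage:
    """Simplified stand-in for vLLM's streaming delta message."""
    content: Optional[str] = None
    reasoning_content: Optional[str] = None

class NanoV3ReasoningParser:
    """Routes streamed text to reasoning_content or content depending on
    whether thinking is enabled for the request."""

    def __init__(self, enable_thinking: bool = True):
        # Capture the flag once at parser initialization.
        self.enable_thinking = enable_thinking

    def _parent_streaming(self, delta_text: str) -> DeltaMessage:
        # Stand-in for the parent DeepSeekR1-style parser, which treats
        # streamed text before </think> as reasoning.
        return DeltaMessage(reasoning_content=delta_text)

    def extract_reasoning_streaming(self, delta_text: str) -> DeltaMessage:
        delta = self._parent_streaming(delta_text)
        if not self.enable_thinking and delta.reasoning_content is not None:
            # Thinking is disabled: the text is the answer, not reasoning.
            delta = DeltaMessage(content=delta.reasoning_content)
        return delta

parser = NanoV3ReasoningParser(enable_thinking=False)
print(parser.extract_reasoning_streaming("Hello").content)  # -> Hello
```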
## Testing

Tested with vLLM v0.1.dev on NVIDIA DGX Spark (GB10) with both streaming and non-streaming requests:
```bash
# Streaming with thinking disabled - now works correctly
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/path/to/model",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true,
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```
Content now correctly appears in `delta.content` for all streaming chunks.
---

@kwondla This indeed fixes the non-reasoning/streaming issue, but it breaks tool calling: I can't get any IDE to use tools after adding this parser. Any ideas?