MiMo-V2.5's reasoning process is not separated from content in Claude Code

#5
by Kinfai - opened

When MiMo-V2.5 is deployed via vLLM, its reasoning process (chain-of-thought) is emitted inline in the final output, rather than being separated out and hidden as is typically the case.
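One plausible cause is that vLLM was launched without a reasoning parser, so the chain-of-thought is never split into the separate `reasoning_content` field that clients like Claude Code collapse. A hedged sketch of a launch command (the model path is a placeholder, and whether an existing vLLM parser covers MiMo-V2.5's thinking format is an assumption):

```shell
# Illustrative only: replace the model path with your actual checkpoint.
# --reasoning-parser tells vLLM to extract the chain-of-thought into a
# separate reasoning_content field instead of leaving it inline in content.
# The parser name below is an assumption; pick whichever parser matches
# the model's thinking-tag format.
vllm serve /path/to/MiMo-V2.5 \
  --reasoning-parser deepseek_r1
```

If the parser is already enabled and the leakage persists, the model itself may be emitting malformed thinking delimiters that the parser cannot match.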

In contrast, deploying Qwen3.5-122B-A10B-FP8 using the same Docker image works correctly, and Claude Code is able to properly collapse these "thinking" code blocks.


The model may have broken out of its CoT and then begun thinking again.
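As a stopgap while the server-side parsing is sorted out, the thinking spans can be stripped client-side. A minimal sketch, assuming the model wraps its reasoning in `<think>…</think>` tags (the tag name is an assumption, not confirmed in this thread); it also handles the "broke out of its CoT" case where only an orphan closing tag survives:

```python
import re

def strip_think_blocks(text: str) -> str:
    """Remove <think>...</think> spans from model output.

    If the model broke out of its CoT and only a stray closing tag
    remains, drop everything up to that orphan tag as well.
    """
    # Remove well-formed thinking blocks (non-greedy, across newlines).
    cleaned = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Handle an orphan closing tag whose opening tag was lost.
    if "</think>" in cleaned:
        cleaned = cleaned.split("</think>", 1)[1]
    return cleaned.strip()
```

For example, `strip_think_blocks("<think>plan steps</think>Answer")` returns `"Answer"`, and an input that starts mid-thought such as `"leftover reasoning</think>Final"` returns `"Final"`.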
