MiMo-V2.5 reasoning process is not separated from content in Claude Code
#5
by Kinfai - opened
When deploying MiMo-V2.5 via vLLM, its reasoning process is emitted inline in the final output instead of being separated into a dedicated reasoning field and hidden, as is typically the case.
In contrast, deploying Qwen3.5-122B-A10B-FP8 with the same Docker image works correctly, and Claude Code properly collapses its "thinking" blocks.
The model may have broken out of its CoT and then begun to think again.
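For context, vLLM normally relies on a reasoning parser (selected via `--reasoning-parser`) to strip the thinking span out of the visible content; if no parser matches the model's delimiters, the reasoning leaks into the output as described above. Below is a minimal client-side sketch of that separation, assuming the model wraps its reasoning in `<think>...</think>` tags — the actual delimiter tokens MiMo-V2.5 emits are an assumption here and should be checked against its chat template:

```python
import re

# Assumed delimiters; adjust to whatever the model's chat template defines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, content).

    Handles multiple <think> blocks, which can occur when the model
    breaks out of its CoT and then starts thinking again.
    """
    blocks = THINK_RE.findall(raw)
    reasoning = "\n".join(b.strip() for b in blocks)
    content = THINK_RE.sub("", raw).strip()
    return reasoning, content
```

A server-side parser doing the equivalent is what lets Claude Code receive the reasoning in a separate field and collapse it, rather than seeing it mixed into the answer.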
