Thought Loop

#6
by jukingjack1 - opened

https://github.com/XiaomiMiMo/MiMo-V2-Flash/issues/17

I think we are experiencing the same issues seen in the previous model where it just keeps thinking over and over.

Has there been any progress made towards this?

If you tell the model to not over think, then it should get a result much quicker

If you tell the model to not over think, then it should get a result much quicker

image

Even with thinking disabled it enters a loop and exhausts the context. I told it not to overthink and it still did this.

If you tell the model to not over think, then it should get a result much quicker

image

Even with thinking disabled it enters a loop and exhausts the context. I told it not to overthink and it still did this.

Increase the repetition penalty to 1.2 will mitigate this. here is my vLLM command:

  --data-parallel-size 2 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.96 \
  --max-model-len auto \
  --reasoning-parser mimo \
  --tool-call-parser mimo \
  --served-model-name MiMo-V2.5 \
  --generation-config "model_hub/MiMo-V2.5" \
  --override-generation-config '{"repetition_penalty":1.2, "top_p":0.95, "temperature":0.6}' \
  --disable-hybrid-kv-cache-manager \
  --max-num-seqs 8 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'

This works in my 8*A800 80GB gpus. But this model is still hard to use, the overthink problem is blocking the model functionality. Seems the XiaoMi did not try to deploy the model with vLLM/SGLANG at all.

Besides, I have used the xiaomi's mimo API, the MiMo-V2.5/V2.5 Pro behaves normally, and the think process can be parsed by claude and opencode. So, my felling is that Xiaomi's official internal version differs from the open-source version—at least in terms of the generation_config or system_prompt.

https://aistudio.xiaomimimo.com/#/share/d9add9d5c1f37461347ab73c52f1c0da

I have just tested the model on the web-ui and this is the chat, the thought looping seems to be an issue with the actual model?

Is this a known issue internally?

https://aistudio.xiaomimimo.com/#/share/d9add9d5c1f37461347ab73c52f1c0da

I have just tested the model on the web-ui and this is the chat, the thought looping seems to be an issue with the actual model?

Is this a known issue internally?

Yes, I hightly suspect this is an internal issue, even with the official API, the model sometimes outputs repeated chain of thought. I have sent an Email to the XiaoMi and I have not got an response, maybe they are fixing this issue. Both the MiMo-V2.5 and its pro version have the same problem. But I mitigate this by setting repetition penalty to 1.2 in local deployment.
1

2

Sign up or log in to comment