Thought Loop

by jukingjack1 - opened 8 days ago

Discussion

jukingjack1

8 days ago

https://github.com/XiaomiMiMo/MiMo-V2-Flash/issues/17

I think we are experiencing the same issues seen in the previous model where it just keeps thinking over and over.

Has there been any progress made towards this?

CompactAI

6 days ago

If you tell the model to not over think, then it should get a result much quicker

jukingjack1

4 days ago

If you tell the model to not over think, then it should get a result much quicker

Even with thinking disabled it enters a loop and exhausts the context. I told it not to overthink and it still did this.

S1quence

4 days ago

•

edited 4 days ago

If you tell the model to not over think, then it should get a result much quicker

Even with thinking disabled it enters a loop and exhausts the context. I told it not to overthink and it still did this.

Increase the repetition penalty to 1.2 will mitigate this. here is my vLLM command:

  --data-parallel-size 2 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.96 \
  --max-model-len auto \
  --reasoning-parser mimo \
  --tool-call-parser mimo \
  --served-model-name MiMo-V2.5 \
  --generation-config "model_hub/MiMo-V2.5" \
  --override-generation-config '{"repetition_penalty":1.2, "top_p":0.95, "temperature":0.6}' \
  --disable-hybrid-kv-cache-manager \
  --max-num-seqs 8 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'

This works in my 8*A800 80GB gpus. But this model is still hard to use, the overthink problem is blocking the model functionality. Seems the XiaoMi did not try to deploy the model with vLLM/SGLANG at all.

Besides, I have used the xiaomi's mimo API, the MiMo-V2.5/V2.5 Pro behaves normally, and the think process can be parsed by claude and opencode. So, my felling is that Xiaomi's official internal version differs from the open-source version—at least in terms of the generation_config or system_prompt.

jukingjack1

about 12 hours ago

https://aistudio.xiaomimimo.com/#/share/d9add9d5c1f37461347ab73c52f1c0da

I have just tested the model on the web-ui and this is the chat, the thought looping seems to be an issue with the actual model?

Is this a known issue internally?

S1quence

about 11 hours ago

https://aistudio.xiaomimimo.com/#/share/d9add9d5c1f37461347ab73c52f1c0da

I have just tested the model on the web-ui and this is the chat, the thought looping seems to be an issue with the actual model?

Is this a known issue internally?

Yes, I hightly suspect this is an internal issue, even with the official API, the model sometimes outputs repeated chain of thought. I have sent an Email to the XiaoMi and I have not got an response, maybe they are fixing this issue. Both the MiMo-V2.5 and its pro version have the same problem. But I mitigate this by setting repetition penalty to 1.2 in local deployment.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment