When using in vibe or opencode, often responds with just "Task completed" instead of actually doing the task

#16
by sdroege - opened

Hi,

In comparison to devstral, this model is generally a lot more concise and to the point, and the code results are usually noticeably better. Apart from the problem described below, it seems to be a clear improvement.

When using this model in Mistral vibe or opencode, it often just responds with "Task completed" instead of actually doing the task. In vibe, it also often fails to correctly ask the user to switch out of plan mode and just responds "Task completed". Asking it to actually do the task gets it unstuck, but that doesn't seem ideal. I would've assumed that at least some vibe conversations were part of the training, so this result is a bit surprising.

The configuration for using it in vibe is simply the following (or the alternative with a local model):

[[models]]
name = "mistral-small-2603"
provider = "mistral"
alias = "mistral-small"
temperature = 0.2
input_price = 0.15
output_price = 0.6
thinking = "off"
auto_compact_threshold = 200000

I assume enabling thinking mode would likely improve this, but switching it on in the configuration doesn't seem to have any effect.

The same "Task completed" effect can be observed when running the model locally (with Q4_K_M quantization) as well as remotely on Mistral's infrastructure, so it's not a quantization artifact.
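One way to check whether this is harness-specific is to probe the raw chat-completions endpoint directly and see if the model emits a tool call or only bare text. A minimal sketch — the endpoint URL, model name, and the write_file tool schema here are made up for illustration; both llama-server and Mistral's API speak the OpenAI chat format:

```python
"""Probe whether the model replies with a bare "Task completed" instead of
an actual tool call. URL, model name, and tool schema are placeholders."""
import json
from urllib import request

def build_probe_payload(model: str = "mistral-small-2603") -> dict:
    """Build an OpenAI-style chat request exposing one dummy tool, so we
    can see whether the model invokes it or just answers with text."""
    return {
        "model": model,
        "temperature": 0.2,
        "messages": [
            {"role": "system",
             "content": "You are a coding agent. Use the provided tools."},
            {"role": "user",
             "content": "Create a file hello.txt containing 'hello'."},
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "write_file",  # hypothetical tool for illustration
                "description": "Write content to a file",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "content": {"type": "string"},
                    },
                    "required": ["path", "content"],
                },
            },
        }],
    }

def classify_reply(message: dict) -> str:
    """"tool_call" if the model actually invoked a tool, "bare_text" if it
    only produced text (e.g. the spurious "Task completed")."""
    return "tool_call" if message.get("tool_calls") else "bare_text"

# To run against a live server, uncomment (placeholder URL):
# req = request.Request(
#     "http://localhost:5005/v1/chat/completions",
#     data=json.dumps(build_probe_payload()).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with request.urlopen(req) as resp:
#     msg = json.load(resp)["choices"][0]["message"]
#     print(classify_reply(msg), repr(msg.get("content")))
```

If the same request produces "bare_text" against both the local server and the hosted API, that points at the model rather than the agent harness.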

Hi, I have managed to get it to work (at least predictably with Opencode) using a launch command similar to @martossien's, with ik_llama and the great quant from Aessedai (Q5_K_M), on 8x 3090.

ik_llama.cpp/build/bin/llama-server \
  --model ~/models/gguf/aessedai/Mistral-Small-4-119B-2603-Q5_K_M/Mistral-Small-4-119B-2603-Q5_K_M-00001-of-00003.gguf \
  --alias "aessedai/Mistral-Small-4-119B-2603-Q5_K_M" \
  --ctx-size 262144 \
  --no-mmap \
  --threads 12 --threads-batch 12 \
  --batch-size 2048 --ubatch-size 2048 \
  --parallel 1 --flash-attn on --n-gpu-layers 999 \
  --tensor-split 0.8,1,0.8,1,0.8,1,0.8,1 \
  --merge-qkv --cache-type-k q8_0 --cache-type-v q8_0 \
  --chat-template-kwargs '{"reasoning_effort":"high"}' \
  --split-mode graph \
  -ot 'blk.(18|19).ffn_down_exps.weight=CUDA0' \
  -ot 'blk.(20|21).ffn_down_exps.weight=CUDA6' \
  -ot 'blk.(31|32).ffn_down_exps.weight=CUDA4' \
  -ot 'blk.4.ffn_down_exps.weight=CUDA2' \
  -ot 'blk.13.ffn_down_exps.weight=CUDA4' \
  -ot 'blk.22.ffn_down_exps.weight=CUDA2' \
  -ot 'blk.0.ffn_down_exps.weight=CUDA4' \
  -ot 'blk.27.ffn_down_exps.weight=CUDA4' \
  --host 0.0.0.0 --port 5005 --jinja

It's not that it doesn't work at all, and like I said, the symptoms are the same when using it via Mistral's OpenAI API. The problem is that it regularly declares "Task completed" (in exactly these words) without actually having done anything, and asking it to actually do the task afterwards gets it unstuck.

Surprisingly, I've got it working stably with Opencode and a local LLM backend, using the quant from Aessedai (Q5_K_M). But when used with Mistral Vibe (same setup)... a disaster. 😀
