System prompt claims vision capabilities, but no vision encoder weights are included

#22

by T0mSIlver - opened Feb 12

Feb 12

The provided CHAT_SYSTEM_PROMPT.txt contains a "MULTI-MODAL INSTRUCTIONS" section stating:

You have the ability to read images, but you cannot generate images.

However, the repository doesn't ship any vision-related files — no mmproj, no image_processor_config.json, no preprocessor_config.json, no vision encoder weights. The model is loaded as MistralForCausalLM (text-only).
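A minimal sketch of how this check could be scripted, assuming you already have the repository's file listing and its parsed config.json. The helper names `find_vision_files` and `looks_text_only` are hypothetical, the filename hints are just the ones named above, and treating a `vision_config` key as a sign of multimodal support is an assumption based on common config conventions, not something confirmed for this repository.

```python
from typing import Iterable, List, Mapping

# Filename hints taken from the post above; illustrative, not exhaustive.
VISION_FILE_HINTS = (
    "mmproj",
    "image_processor_config.json",
    "preprocessor_config.json",
)


def find_vision_files(filenames: Iterable[str]) -> List[str]:
    """Return the filenames that look like vision-related artifacts."""
    return sorted(
        name
        for name in filenames
        if any(hint in name.lower() for hint in VISION_FILE_HINTS)
    )


def looks_text_only(filenames: Iterable[str], config: Mapping[str, object]) -> bool:
    """Heuristic: no vision-like files and no `vision_config` key in the config.

    The `vision_config` check is an assumption about how multimodal
    configs are usually laid out, not a guarantee.
    """
    return not find_vision_files(filenames) and "vision_config" not in config


if __name__ == "__main__":
    # Hypothetical listing for illustration; not the real repository contents.
    files = ["config.json", "CHAT_SYSTEM_PROMPT.txt", "model.safetensors"]
    config = {"architectures": ["MistralForCausalLM"]}
    print("vision-like files:", find_vision_files(files))
    print("looks text-only:", looks_text_only(files, config))
```

In practice the file listing could come from `huggingface_hub.list_repo_files(repo_id)` and the config from the downloaded config.json; both are left out here so the sketch runs without network access.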

The smaller 24B variant (Devstral-Small-2-24B-Instruct-2512) does have vision support via a shared architecture with Ministral 3, but this doesn't seem to be the case for the 123B.

Could you clarify:

Is vision support planned for this checkpoint, with encoder weights to be uploaded later?
Or should the system prompt be updated to remove the multi-modal section to avoid confusion for downstream users who might expect image inputs to work?

As it stands, users relying on the provided system prompt as-is would be advertising a capability the model can't actually deliver.

juliendenize

Mistral AI_ org Feb 12

Thanks for pointing this out, I corrected the system prompt. This size does not have vision support, but Small does.

Apixhed

Feb 12

I thought the 2512 update gave it vision 😭🙏

juliendenize

Mistral AI_ org Feb 12

@Apixhed Small does!
https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512
