System prompt claims vision capabilities, but no vision encoder weights are included

#22
by T0mSIlver - opened

The provided CHAT_SYSTEM_PROMPT.txt contains a "MULTI-MODAL INSTRUCTIONS" section stating:

You have the ability to read images, but you cannot generate images.

However, the repository doesn't ship any vision-related files — no mmproj, no image_processor_config.json, no preprocessor_config.json, no vision encoder weights. The model is loaded as MistralForCausalLM (text-only).
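For anyone who wants to repeat this check on a local download, here is a minimal sketch. It assumes you already have the repository's file listing and the contents of its config.json. The helper names, the marker list (taken from the files named above), and the `vision_config` key check are my own illustrative assumptions, not anything documented for this repository.

```python
import json

# Assumption: these markers mirror the file names mentioned in this post.
# "mmproj" is matched as a substring because such files usually carry extra suffixes.
VISION_FILE_MARKERS = (
    "mmproj",
    "image_processor_config.json",
    "preprocessor_config.json",
)


def find_vision_files(filenames):
    """Return the file names that look vision-related (hypothetical helper)."""
    return [
        name
        for name in filenames
        if any(marker in name.lower() for marker in VISION_FILE_MARKERS)
    ]


def looks_text_only(config_text):
    """Heuristic: True if config.json lists only *ForCausalLM architectures
    and has no "vision_config" entry (assumed key for multimodal configs)."""
    config = json.loads(config_text)
    architectures = config.get("architectures", [])
    if not architectures or "vision_config" in config:
        return False
    return all(arch.endswith("ForCausalLM") for arch in architectures)


# Hypothetical listing and config, for illustration only.
example_files = ["config.json", "CHAT_SYSTEM_PROMPT.txt", "model-00001.safetensors"]
example_config = '{"architectures": ["MistralForCausalLM"]}'

print(find_vision_files(example_files))
print(looks_text_only(example_config))
```

An empty list from the first call together with `True` from the second matches what I described above: no vision files and a text-only architecture.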

The smaller 24B variant (Devstral-Small-2-24B-Instruct-2512) does have vision support via a shared architecture with Ministral 3, but this doesn't seem to be the case for the 123B.

Could you clarify:

  1. Is vision support planned for this checkpoint, with encoder weights to be uploaded later?
  2. Or should the system prompt be updated to remove the multi-modal section to avoid confusion for downstream users who might expect image inputs to work?

As it stands, users relying on the provided system prompt as-is would be advertising a capability the model can't actually deliver.

Mistral AI_ org

Thanks for pointing this out. I corrected the system prompt. This size does not have vision support, but the Small variant does.

I thought the 2512 update gave it vision 😭🙏
