System prompt claims vision capabilities, but no vision encoder weights are included
The provided CHAT_SYSTEM_PROMPT.txt contains a "MULTI-MODAL INSTRUCTIONS" section stating:
You have the ability to read images, but you cannot generate images.
However, the repository doesn't ship any vision-related files — no mmproj, no image_processor_config.json, no preprocessor_config.json, no vision encoder weights. The model is loaded as MistralForCausalLM (text-only).
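For anyone who wants to repeat this check on another checkpoint, here is a minimal sketch. It assumes the file names listed above are reasonable hints of vision support, and that a text-only model declares only `...ForCausalLM` classes in the `architectures` field of `config.json`. Both are heuristics, not an official test. The helper names and the example file listing are hypothetical.

```python
import json

# File-name hints taken from the files named in this post.
# Treating them as indicators of vision support is an assumption.
VISION_FILE_HINTS = ("mmproj", "image_processor_config.json", "preprocessor_config.json")


def find_vision_files(filenames):
    """Return the repository files whose name suggests vision support."""
    return [
        name for name in filenames
        if any(hint in name.lower() for hint in VISION_FILE_HINTS)
    ]


def is_text_only(config):
    """Heuristic: True if every declared architecture ends in 'ForCausalLM'."""
    architectures = config.get("architectures", [])
    return bool(architectures) and all(a.endswith("ForCausalLM") for a in architectures)


# Hypothetical listing; in practice it could come from
# huggingface_hub.list_repo_files(repo_id).
example_files = ["config.json", "model-00001-of-00002.safetensors", "CHAT_SYSTEM_PROMPT.txt"]
example_config = json.loads('{"architectures": ["MistralForCausalLM"]}')

print(find_vision_files(example_files))  # []
print(is_text_only(example_config))      # True
```

An empty list together with `True` matches the situation described here: no vision files and a text-only architecture.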
The smaller 24B variant (Devstral-Small-2-24B-Instruct-2512) does have vision support via a shared architecture with Ministral 3, but this doesn't seem to be the case for the 123B.
Could you clarify:
- Is vision support planned for this checkpoint, with encoder weights to be uploaded later?
- Or should the system prompt be updated to remove the multi-modal section to avoid confusion for downstream users who might expect image inputs to work?
As it stands, users relying on the provided system prompt as-is would be advertising a capability the model can't actually deliver.
Thanks for pointing this out. I've corrected the system prompt. This size does not have vision support, but the Small variant does.
I thought the 2512 update gave it vision 😭🙏