Ministral-3-8B-Instruct (Vision-Language & vLLM Compatible)
Ministral-3-8B-Instruct is a vision-aware, instruction-tuned multimodal language model developed by Mistral AI. It combines textual and visual understanding with strong reasoning capabilities and reliable instruction adherence.
This repository provides Q4_K_M and Q5_K_M quantized variants of the model, optimized for efficient local inference. These quantized formats reduce memory usage and improve inference performance while retaining support for vision-language interaction with both text and image inputs.
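To fetch one of the quantized files locally, the standard Hugging Face CLI downloader can be used. This is a sketch, assuming the repo id shown on this page (`SandLogicTechnologies/Ministral-3-8B-Instruct-2512-GGUF`) and that only the Q4_K_M file is wanted:

```shell
# Download only the Q4_K_M GGUF file into ./models
# (repo id taken from this page; adjust the --include pattern for Q5_K_M).
huggingface-cli download SandLogicTechnologies/Ministral-3-8B-Instruct-2512-GGUF \
    --include "*Q4_K_M*.gguf" \
    --local-dir ./models
```

The `--include` glob avoids pulling both quantizations when only one is needed.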
Model Overview
- Model Name: Ministral-3-8B-Instruct
- Base Model: mistralai/Ministral-3-8B-Instruct-2512
- Architecture: Transformer-based multimodal model
- Parameter Count: 8 Billion
- Supported Inputs: Text & Images
- Developer: Mistral AI
- License: Apache 2.0
Quantization Formats
Q4_K_M
- Approx. 71% size reduction (4.84 GB)
- Substantial reduction in model size
- Designed for low-memory environments
- Faster inference on CPU-based systems
- Suitable for lightweight and edge use cases
Q5_K_M
- Approx. 66% size reduction (5.64 GB)
- Higher precision than the Q4 variant
- Improved response consistency and reasoning depth
- Recommended for balanced performance and quality
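As a quick consistency check on the figures above (the percentages are approximate), each quantized size together with its stated reduction implies roughly the same full-precision size, which is what we would expect if both variants come from the same base model:

```python
# Quoted figures: Q4_K_M is 4.84 GB at ~71% reduction, Q5_K_M is 5.64 GB
# at ~66% reduction. Recover the implied original size from each variant.
q4_gb, q4_reduction = 4.84, 0.71
q5_gb, q5_reduction = 5.64, 0.66

orig_from_q4 = q4_gb / (1 - q4_reduction)  # implied full-precision size, ≈ 16.7 GB
orig_from_q5 = q5_gb / (1 - q5_reduction)  # implied full-precision size, ≈ 16.6 GB

print(f"Implied original size: {orig_from_q4:.1f} GB (from Q4), "
      f"{orig_from_q5:.1f} GB (from Q5)")
```

Both back-calculations land near 16–17 GB, consistent with an 8B-parameter model stored at 16-bit precision.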
Vision-Language Capabilities
Ministral-3-8B-Instruct supports multimodal inputs, allowing users to provide both text and images within the same prompt. This enables applications such as:
- Image captioning and explanation
- Visual question answering
- Instruction following grounded in vision
- Contextual multimodal analysis
The model processes textual and visual information jointly, producing coherent responses that factor in both modalities.
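When the model is served behind an OpenAI-compatible endpoint (as vLLM and llama.cpp's server both provide), a mixed text-and-image prompt is typically expressed as a chat message whose content is a list of text and `image_url` parts, with the image inlined as a base64 data URL. A minimal sketch of building such a payload (the helper name is hypothetical; only the message shape follows the OpenAI chat format):

```python
import base64
import json


def build_multimodal_message(question: str, image_bytes: bytes) -> dict:
    """Build one OpenAI-style user message mixing text and an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
        ],
    }


# Placeholder bytes stand in for a real PNG file read from disk.
msg = build_multimodal_message(
    "Explain what is happening in this image.", b"\x89PNG\r\n\x1a\n"
)
print(json.dumps(msg)[:60])
```

In a real request this message list would be posted to the server's `/v1/chat/completions` endpoint.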
Training Background
The base model was pretrained on a large mixture of text and visual data, followed by instruction tuning that emphasizes reliable multimodal reasoning and instruction compliance.
Pretraining
- Large-scale multimodal pretraining
- Joint text-image representation learning
- Optimization for robust, coherent generation
Instruction Tuning
- Fine-tuned with multimodal instruction datasets
- Trained for clarity, task adherence, and visual reasoning
- Enhanced for conversational quality across modalities
Key Capabilities
Multimodal Input Understanding
Incorporates image content and text together to produce aligned responses.

Instruction Compliance
Follows detailed user directives, including ones involving visual context.

Reasoning & Analysis
Supports step-by-step explanation and problem solving, integrating visual evidence.

Conversational Dialogue
Maintains fluid dialogue across mixed text-image interactions.

Efficient vLLM Serving
Works well with vLLM inference for scalable deployment.
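A minimal sketch of serving the model with vLLM's OpenAI-compatible server and querying it (the model id and flags are assumptions; exact options depend on your vLLM version and hardware):

```shell
# Launch an OpenAI-compatible server for the base model.
vllm serve mistralai/Ministral-3-8B-Instruct-2512 \
    --max-model-len 8192

# In another terminal, query the chat completions endpoint.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Ministral-3-8B-Instruct-2512",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

The same endpoint accepts multimodal messages (text plus `image_url` parts) for vision-language requests.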
Usage Examples
llama.cpp Usage

```shell
./llama-cli \
  -m SandLogicTechnologies/ministral-3-8b-instruct_Q4_K_M.gguf \
  --image ./example.png \
  -p "Explain what is happening in this image."
```
Recommended Applications
- Multimodal Assistants: Build systems that understand and respond to both images and text.
- Visual QA Tools: Create applications that answer questions grounded in image context.
- Content Understanding: Summarize or reason over documents with associated images.
- Conversational AI: Serve rich, multimodal dialogues in high-throughput environments.
Acknowledgments
This repository is based on the Ministral-3-8B-Instruct model, developed by Mistral AI.
Thanks to:
- The Mistral AI team for releasing multimodal capabilities
- The llama.cpp community for enabling efficient GGUF inference
Contact
For questions, feedback, or support, please reach out at support@sandlogic.com or visit https://www.sandlogic.com/