Ministral-3-8B-Instruct (Vision-Language & vLLM Compatible)

Ministral-3-8B-Instruct is a vision-aware, instruction-tuned multimodal language model developed by Mistral AI. It combines textual and visual understanding with strong reasoning capabilities and reliable instruction adherence.

This repository provides Q4_K_M and Q5_K_M quantized variants of the model, optimized for efficient local inference. These quantized formats reduce memory usage and improve inference performance while retaining support for vision-language interaction with both text and image inputs.


Model Overview

  • Model Name: Ministral-3-8B-Instruct
  • Base Model: mistralai/Ministral-3-8B-Instruct-2512
  • Architecture: Transformer-based multimodal model
  • Parameter Count: 8 Billion
  • Contexts Supported: Text & Images
  • Developer: Mistral AI
  • License: Apache 2.0

Quantization Formats

Q4_K_M

  • Approx. 71% size reduction (4.84 GB)
  • Substantial reduction in model size
  • Designed for low-memory environments
  • Faster inference on CPU-based systems
  • Suitable for lightweight and edge use cases

Q5_K_M

  • Approx. 66% size reduction (5.64 GB)
  • Higher precision than Q4 variants
  • Improved response consistency and reasoning depth
  • Recommended for balanced performance and quality

Vision-Language Capabilities

Ministral-3-8B-Instruct supports multimodal inputs, allowing users to provide both text and images within the same prompt. This enables applications such as:

  • Image captioning and explanation
  • Visual question answering
  • Instruction following grounded in vision
  • Contextual multimodal analysis

The model processes textual and visual information jointly, producing coherent responses that factor in both modalities.


Training Background

The base model was pretrained on a large mixture of text and visual data, followed by instruction tuning that emphasizes reliable multimodal reasoning and instruction compliance.

Pretraining

  • Large-scale multimodal pretraining
  • Joint text-image representation learning
  • Optimization for robust, coherent generation

Instruction Tuning

  • Fine-tuned with multimodal instruction datasets
  • Trained for clarity, task adherence, and visual reasoning
  • Enhanced for conversational quality across modalities

Key Capabilities

  • Multimodal Input Understanding
    Incorporates image content and text together to produce aligned responses.

  • Instruction Compliance
    Follows detailed user directives, including ones involving visual context.

  • Reasoning & Analysis
    Supports step-by-step explanation and problem solving, integrating visual evidence.

  • Conversational Dialogue
    Maintains fluid dialogue across mixed text-image interaction.

  • Efficient vLLM Serving
    Works well with vLLM inference for scalable deployment.


Usage Examples

LLama.cpp Usage

/llama-cli \
  -m SandlogicTechnologies\ministral-3-8b-instruct_Q4_K_M.gguf \
  --image ./example.png \
  -p "Explain what is happening in this image."

Recommended Applications

  • Multimodal Assistants Build systems that understand and respond to both images and text.

  • Visual QA Tools Create applications that answer questions grounded in image context.

  • Content Understanding Use for summarizing or reasoning over documents with associated images.

  • Conversational AI Serve rich, multimodal dialogues in high-throughput environments.


Acknowledgments

This repository is based on the Ministral-3-8B-Instruct model, developed by Mistral AI.

Thanks to:

  • The Mistral AI team for releasing multimodal capabilities
  • The llama.cpp community for enabling efficient GGUF inference

Contact

For questions, feedback, or support, please reach out at support@sandlogic.com or visit https://www.sandlogic.com/

Downloads last month
32
GGUF
Model size
8B params
Architecture
mistral3
Hardware compatibility
Log In to add your hardware

4-bit

5-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for SandLogicTechnologies/Ministral-3-8B-Instruct-2512-GGUF