Ministral-3 3B Q4_K_M GGUF

This is a Q4_K_M quantized GGUF conversion of mistralai/Ministral-3B-Instruct-2412, optimized for on-device inference with llama.cpp.
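
To download just the quantized file, the huggingface-cli tool works well (a sketch; the repository id below is a hypothetical placeholder — substitute this repo's actual id):

# Fetch only the Q4_K_M file into the current directory
# (your-username/Ministral-3-3B-GGUF is a placeholder for this repository's id)
huggingface-cli download your-username/Ministral-3-3B-GGUF Ministral-3-3B-Q4_K_M.gguf --local-dir .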

Model Details

Property          Value
----------------  ------------------------------
Original Model    Ministral-3B-Instruct-2412
Parameters        3 billion
Quantization      Q4_K_M (4-bit, medium quality)
File Size         ~2.0 GB
Context Window    256,000 tokens
Architecture      Mistral Transformer

Intended Use

This model is optimized for:

  • Mobile/Edge Deployment: Runs efficiently on iOS devices with 6GB+ RAM
  • llama.cpp Integration: Compatible with llama.cpp and its bindings
  • On-Device AI: Private, offline inference without cloud dependencies (see the server sketch after this list)
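
For the private, offline scenario above, one approach (a sketch, assuming a local llama.cpp build that includes llama-server) is to serve the model over llama.cpp's OpenAI-compatible HTTP API on localhost:

# Start a local server; nothing leaves the machine
./llama-server -m Ministral-3-3B-Q4_K_M.gguf -c 8192 --port 8080

# Query it via the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize GGUF in one sentence."}]}'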

Capabilities

  • Fast Inference: Edge-optimized for quick responses
  • Massive Context: 256K token context window (see the long-prompt example after this list)
  • Multilingual: Supports 12+ languages
  • Code Generation: Strong coding capabilities
  • General Assistant: Versatile for various tasks
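
As an example of the long context window noted above (a sketch; the 32K value and file name are illustrative — raise -c toward the model's limit as memory allows), a long document can be fed from a file:

# Read the prompt from a file with an enlarged context window
./llama-cli -m Ministral-3-3B-Q4_K_M.gguf -c 32768 -f long-document.txt -n 256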

Usage with llama.cpp

# One-shot generation: -m model file, -p prompt, -n max tokens to generate
./llama-cli -m Ministral-3-3B-Q4_K_M.gguf -p "Your prompt here" -n 512
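
For multi-turn chat, llama-cli's conversation mode applies the chat template stored in the GGUF metadata (the sampling value below is illustrative):

# Interactive chat using the model's built-in chat template
./llama-cli -m Ministral-3-3B-Q4_K_M.gguf -cnv --temp 0.7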

License

This model is released under the Apache 2.0 License.

Attribution

Original model: mistralai/Ministral-3B-Instruct-2412 by Mistral AI. This repository provides only the quantized GGUF conversion for use with llama.cpp.
