# Mistral-7B-Instruct-v0.2 (GGUF Q8_0)

High-quality 8-bit (Q8_0) GGUF quantization of Mistral-7B-Instruct-v0.2 for efficient local inference on CPUs and edge devices.

## Model Details

| Property | Value |
|---|---|
| Base Model | Mistral-7B-Instruct-v0.2 |
| Quantization | Q8_0 (8-bit) |
| Format | GGUF |
| File Size | ~7.7 GB |
| Context Length | 32,768 tokens |
| License | Apache 2.0 (inherited from the base model) |

## Quick Start

### llama.cpp

```bash
# Download
huggingface-cli download Makatia/mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf \
    mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf --local-dir .

# Run inference (-ngl 0 keeps all layers on the CPU; raise it to offload layers to a GPU)
./llama-cli -m mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf \
    -p "[INST] Explain digital pre-distortion in 5G systems. [/INST]" \
    -n 512 -ngl 0
```
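For interactive use, llama.cpp also ships an HTTP server with an OpenAI-compatible endpoint. A minimal sketch (port and context size are arbitrary choices, not requirements of this model):

```shell
# Serve the model locally (llama-server is built alongside llama-cli)
./llama-server -m mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf -c 4096 --port 8080

# In another terminal, query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "What is SerDes equalization?"}]}'
```

The server applies the model's chat template itself, so plain `role`/`content` messages are enough; no manual `[INST]` wrapping is needed.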

### Python (llama-cpp-python)

```python
from llama_cpp import Llama

model = Llama(
    model_path="mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf",
    n_ctx=4096,    # can be raised toward the model's 32k limit if RAM allows
    n_threads=4,   # adjust for your CPU
)

output = model(
    "[INST] What is SerDes equalization? [/INST]",
    max_tokens=512,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```
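For multi-turn conversations, the raw prompt must follow Mistral's instruct template. A minimal sketch of a formatting helper (`mistral_prompt` is a hypothetical name, not part of llama-cpp-python; the leading `<s>` BOS token is omitted here because the loader prepends it automatically by default):

```python
def mistral_prompt(turns):
    """Format (user, assistant) turn pairs into Mistral's [INST] template.

    `turns` is a list of (user_message, assistant_reply) pairs; pass None as
    the last assistant reply to leave the prompt open for generation.
    """
    parts = []
    for user, assistant in turns:
        parts.append(f"[INST] {user} [/INST]")
        if assistant is not None:
            # Completed assistant turns are closed with the EOS token.
            parts.append(f" {assistant}</s>")
    return "".join(parts)


prompt = mistral_prompt([
    ("What is SerDes equalization?", "It compensates for channel loss..."),
    ("How does CTLE differ from DFE?", None),
])
print(prompt)
```

Alternatively, `model.create_chat_completion(messages=[...])` lets llama-cpp-python apply the template for you from plain role/content messages.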

### LM Studio / Ollama

Download the GGUF file and load it directly in LM Studio, or import it via Ollama.
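Importing into Ollama takes a one-line Modelfile pointing at the downloaded GGUF. A sketch (the model name `mistral-7b-q8` is an arbitrary local label; production Modelfiles often also set `TEMPLATE` and `PARAMETER` lines, omitted here):

```shell
# Create a Modelfile referencing the local GGUF file
cat > Modelfile <<'EOF'
FROM ./mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf
EOF

# Register it with Ollama, then chat
ollama create mistral-7b-q8 -f Modelfile
ollama run mistral-7b-q8 "Explain digital pre-distortion in 5G systems."
```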

## Quantization Details

Q8_0 retains near-original model quality while roughly halving the FP16 memory footprint. Recommended when inference speed matters but quality degradation must be minimal.

| Quantization | Size | Quality (vs FP16) | Speed |
|---|---|---|---|
| FP16 | ~14 GB | Baseline | Slow |
| Q8_0 | ~7.7 GB | ~99.5% | Fast |
| Q4_K_M | ~4.4 GB | ~97% | Fastest |
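The Q8_0 file size can be sanity-checked from the format itself. A back-of-envelope sketch, assuming ~7.24B weights and Q8_0's block layout (32 int8 weights plus one fp16 scale per block, i.e. 34 bytes per 32 weights, or 8.5 bits/weight), ignoring metadata overhead:

```python
PARAMS = 7.24e9                    # approximate weight count of Mistral-7B
FP16_BYTES_PER_WEIGHT = 2.0
Q8_0_BYTES_PER_WEIGHT = 34 / 32    # 8.5 bits per weight


def gguf_size_gb(n_params, bytes_per_weight):
    """Approximate model file size in decimal GB, ignoring metadata."""
    return n_params * bytes_per_weight / 1e9


print(round(gguf_size_gb(PARAMS, FP16_BYTES_PER_WEIGHT), 1))  # ~14.5
print(round(gguf_size_gb(PARAMS, Q8_0_BYTES_PER_WEIGHT), 1))  # ~7.7
```

Both estimates land close to the table above, which is why a Q8_0 file is slightly larger than "1 byte per parameter" would suggest.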

## Hardware Requirements

| Device | RAM Required | Performance |
|---|---|---|
| Desktop CPU (x86) | 8 GB+ | Good |
| Apple Silicon (M1 or later) | 8 GB+ | Excellent |
| Raspberry Pi 5 (8 GB) | 8 GB | Functional (slow) |

## Credits

Maintainer: Makatia
