Edge LLM Deployments (collection): LLMs quantized for Raspberry Pi and ARM edge devices. GGUF + ONNX. No cloud required.
High-quality 8-bit quantized GGUF of Mistral-7B-Instruct-v0.2 for efficient local inference on CPUs and edge devices.
| Property | Value |
|---|---|
| Base Model | Mistral-7B-Instruct-v0.2 |
| Quantization | Q8_0 (8-bit) |
| Format | GGUF |
| File Size | ~7.7 GB |
| Context Length | 32,768 tokens |
| License | MIT |
```bash
# Download
huggingface-cli download Makatia/mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf \
  mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf --local-dir .

# Run inference (llama.cpp; -ngl 0 keeps all layers on the CPU)
./llama-cli -m mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf \
  -p "[INST] Explain digital pre-distortion in 5G systems. [/INST]" \
  -n 512 -ngl 0
```
```python
from llama_cpp import Llama

model = Llama(
    model_path="mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf",
    n_ctx=4096,    # context window; the model supports up to 32,768
    n_threads=4,   # adjust for your CPU
)
output = model(
    "[INST] What is SerDes equalization? [/INST]",
    max_tokens=512,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```
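For multi-turn conversations, the `[INST]` tags extend naturally. A minimal sketch of building a Mistral-style instruct prompt by hand (the helper name and example strings are illustrative, and the tokenizer normally prepends the `<s>` BOS token for you):

```python
def build_mistral_prompt(turns):
    """Format a list of (user, assistant) turns, ending with a user turn
    whose assistant slot is None, into the Mistral instruct template:
    [INST] q1 [/INST] a1</s>[INST] q2 [/INST]"""
    prompt = ""
    for user_msg, assistant_msg in turns[:-1]:
        prompt += f"[INST] {user_msg} [/INST] {assistant_msg}</s>"
    prompt += f"[INST] {turns[-1][0]} [/INST]"
    return prompt

history = [
    ("What is SerDes equalization?", "It compensates for channel loss..."),
    ("How does CTLE differ from DFE?", None),  # awaiting the model's answer
]
print(build_mistral_prompt(history))
```

Alternatively, `llama-cpp-python` can apply the chat template stored in the GGUF metadata itself via `Llama.create_chat_completion`, which takes a list of role/content messages instead of a raw prompt string.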
Download the GGUF file and load it directly in LM Studio or import via Ollama.
Q8_0 retains near-original model quality while roughly halving the FP16 memory footprint. It is the recommended choice when quality degradation must be minimal but full-precision inference is too slow or too large for the target device.
| Quantization | Size | Quality | Speed |
|---|---|---|---|
| FP16 | ~14 GB | Baseline | Slow |
| Q8_0 | ~7.7 GB | ~99.5% | Fast |
| Q4_K_M | ~4.4 GB | ~97% | Fastest |
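The file sizes in the table follow from per-weight storage cost: Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale (34 bytes per block, i.e. 8.5 bits per weight). A rough back-of-envelope check, treating the ~7.24 B parameter count and the Q4_K_M effective bit-width as approximations:

```python
PARAMS = 7.24e9  # approximate parameter count of Mistral-7B

def model_size_gb(bits_per_weight):
    """Estimated on-disk size in decimal GB (ignores small metadata)."""
    return PARAMS * bits_per_weight / 8 / 1e9

print(f"FP16:   {model_size_gb(16):.1f} GB")    # ~14.5 GB
print(f"Q8_0:   {model_size_gb(8.5):.1f} GB")   # 34 bytes / 32 weights -> ~7.7 GB
print(f"Q4_K_M: {model_size_gb(4.85):.1f} GB")  # mixed 4/6-bit blocks -> ~4.4 GB
```

The estimates line up with the table above to within rounding.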
| Device | RAM Required | Performance |
|---|---|---|
| Desktop CPU (x86) | 8 GB+ | Good |
| Apple Silicon (M1+) | 8 GB+ | Excellent |
| Raspberry Pi 5 (8GB) | 8 GB | Functional (slow) |
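The performance column reflects that CPU token generation is typically memory-bandwidth-bound: every generated token reads all weights once, so tokens/s is bounded by bandwidth divided by model size. A rough sketch (the bandwidth figures are ballpark assumptions, not measurements):

```python
MODEL_GB = 7.7  # Q8_0 file size; roughly the bytes read per generated token

# Assumed sustained memory bandwidth in GB/s; varies widely by system
devices = {
    "Raspberry Pi 5":      9.0,
    "Desktop CPU (DDR4)": 40.0,
    "Apple M1":           68.0,
}

for name, bw in devices.items():
    # Upper bound: generation streams all weights once per token
    print(f"{name}: <= {bw / MODEL_GB:.1f} tok/s")
```

Under these assumptions the Pi 5 lands at only a token or two per second, which matches the "functional (slow)" rating, while Apple Silicon's much higher unified-memory bandwidth explains its "excellent" rating.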
Maintainer: Makatia