Part of the **Edge LLM Deployments** collection: LLMs quantized for Raspberry Pi and ARM edge devices (GGUF + ONNX, no cloud required).
Quantized ONNX export of Microsoft Phi-3-mini-4k-instruct, packaged for deployment on Raspberry Pi and ARM-based edge devices.
| Property | Value |
|---|---|
| Base Model | Phi-3-mini-4k-instruct (3.8B params) |
| Format | ONNX (quantized) |
| Archive | microsoft_Phi-3-mini-4k-instruct_onnx_rpi.tar.gz |
| Context Length | 4,096 tokens |
| Target Hardware | Raspberry Pi 5, ARM64 SBCs |
| License | MIT |
```shell
# Download the archive from the Hub
huggingface-cli download Makatia/microsoft_Phi-3-mini-4k-instruct_onnx_rpi \
    microsoft_Phi-3-mini-4k-instruct_onnx_rpi.tar.gz --local-dir .

# Extract
tar -xzf microsoft_Phi-3-mini-4k-instruct_onnx_rpi.tar.gz
```
```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Tokenizer config comes from the upstream Phi-3 repo; the ONNX graph is
# the file extracted from the archive above.
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
session = ort.InferenceSession(
    "phi-3-mini-4k-instruct.onnx",
    providers=["CPUExecutionProvider"],  # CPU-only execution on ARM64
)

prompt = "Explain what a power amplifier does in a 5G base station."
inputs = tokenizer(prompt, return_tensors="np")

# Single forward pass; the first output holds the next-token logits.
outputs = session.run(None, dict(inputs))
```
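A single `session.run` call only produces logits for the prompt; generating text requires a decoding loop. A minimal greedy-decoding sketch, assuming the exported graph accepts `input_ids`/`attention_mask` and returns logits of shape `(batch, seq, vocab)` as its first output (exact input names vary between ONNX exports, so check `session.get_inputs()` first):

```python
import numpy as np

def greedy_generate(session, tokenizer, prompt, max_new_tokens=64):
    """Greedy decoding via repeated full forward passes (no KV cache)."""
    input_ids = tokenizer(prompt, return_tensors="np")["input_ids"]
    for _ in range(max_new_tokens):
        attention_mask = np.ones_like(input_ids)
        # First output assumed to be logits of shape (batch, seq, vocab).
        logits = session.run(
            None, {"input_ids": input_ids, "attention_mask": attention_mask}
        )[0]
        next_id = int(np.argmax(logits[0, -1]))
        if next_id == tokenizer.eos_token_id:
            break
        input_ids = np.concatenate([input_ids, [[next_id]]], axis=1)
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
```

Re-running the whole sequence each step is quadratic in sequence length; exports that expose `past_key_values` inputs avoid this and are noticeably faster on a Pi.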
```shell
# Install dependencies on the Raspberry Pi 5 (ARM64 wheels are available on PyPI)
pip install onnxruntime numpy transformers
```
Phi-3-mini is one of the most capable small language models available. At 3.8B parameters with ONNX quantization, it fits within the memory constraints of an 8GB Raspberry Pi 5 while providing strong reasoning and instruction-following capabilities.
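A back-of-envelope check of that memory claim, assuming roughly 4-bit (0.5 bytes/parameter) quantized weights; activation and runtime overhead come on top of this:

```python
# Rough weight-memory estimate for a 3.8B-parameter model at ~4-bit precision.
params = 3.8e9
bytes_per_param = 0.5          # assumption: ~INT4 quantization
weights_gib = params * bytes_per_param / 1024**3
print(f"{weights_gib:.2f} GiB")  # weights alone, before activations/KV cache
```

Roughly 1.8 GiB of weights leaves comfortable headroom on an 8 GB Pi 5, which is why the 4 GB model needs swap.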
| Benchmark | Score |
|---|---|
| MMLU (5-shot) | 68.8% |
| HellaSwag | 76.7% |
| TruthfulQA | 53.4% |
| Device | RAM | Status |
|---|---|---|
| Raspberry Pi 5 (8GB) | 8 GB | Supported |
| Raspberry Pi 5 (4GB) | 4 GB | Requires swap |
| Jetson Nano | 4 GB | Supported with swap |
| Desktop x86 | 8 GB+ | Supported |
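For the swap-dependent rows above, a sketch of growing swap on stock Raspberry Pi OS, which manages swap via `dphys-swapfile` (other distros differ; `CONF_MAXSWAP` in the same file may also need raising past its 2048 MB default):

```shell
# Grow swap to 4 GB before loading the model (Raspberry Pi OS only)
sudo sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=4096/' /etc/dphys-swapfile
sudo systemctl restart dphys-swapfile
free -h   # verify the new swap size
```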
Maintainer: Makatia