# Phi-3 Mini 4K Instruct (ONNX, Raspberry Pi Optimized)

Quantized ONNX export of Microsoft Phi-3-mini-4k-instruct, packaged for deployment on Raspberry Pi and ARM-based edge devices.

## Model Details

| Property | Value |
| --- | --- |
| Base Model | Phi-3-mini-4k-instruct (3.8B params) |
| Format | ONNX (quantized) |
| Archive | `microsoft_Phi-3-mini-4k-instruct_onnx_rpi.tar.gz` |
| Context Length | 4,096 tokens |
| Target Hardware | Raspberry Pi 5, ARM64 SBCs |
| License | MIT |

## Quick Start

### 1. Extract the Model

```bash
# Download the archive from the Hub
huggingface-cli download Makatia/microsoft_Phi-3-mini-4k-instruct_onnx_rpi \
    microsoft_Phi-3-mini-4k-instruct_onnx_rpi.tar.gz --local-dir .

# Extract
tar -xzf microsoft_Phi-3-mini-4k-instruct_onnx_rpi.tar.gz
```

### 2. Run Inference

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

session = ort.InferenceSession(
    "phi-3-mini-4k-instruct.onnx",
    providers=["CPUExecutionProvider"],
)

prompt = "Explain what a power amplifier does in a 5G base station."
inputs = tokenizer(prompt, return_tensors="np")

# Single forward pass; for a typical LM export, outputs[0] holds the logits
# for every input position.
outputs = session.run(None, dict(inputs))

# Greedy pick of the next token from the last position's logits.
next_token_id = int(np.argmax(outputs[0][0, -1]))
print(tokenizer.decode([next_token_id]))
```
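Note that the example above feeds a raw prompt, while Phi-3 *instruct* variants are trained on a chat format and respond better to chat-formatted input. The tokenizer's `apply_chat_template` handles this for you; as a minimal manual sketch (assuming the standard Phi-3 special tokens `<|user|>`, `<|end|>`, `<|assistant|>`):

```python
def format_phi3_prompt(user_message: str) -> str:
    # Phi-3 chat format: user turn, end-of-turn marker, then the assistant cue.
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"


prompt = format_phi3_prompt(
    "Explain what a power amplifier does in a 5G base station."
)
```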

### 3. Raspberry Pi Setup

```bash
# Install dependencies on RPi5
pip install onnxruntime numpy transformers
```
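On recent Raspberry Pi OS releases (Bookworm and later), the system Python is marked externally managed and `pip install` outside a virtual environment will refuse to run. A venv-based setup sketch (the `~/phi3-env` path is arbitrary):

```shell
# Create and activate a virtual environment, then install into it
python3 -m venv ~/phi3-env
source ~/phi3-env/bin/activate
pip install onnxruntime numpy transformers
```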

## Why Phi-3 on Edge?

Phi-3-mini is one of the most capable small language models available. At 3.8B parameters with ONNX quantization, it fits within the memory constraints of an 8GB Raspberry Pi 5 while providing strong reasoning and instruction-following capabilities.
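The memory claim can be sanity-checked with back-of-envelope arithmetic. The card says "quantized" without specifying a bit width, so the 4-bit figure below is an assumption; either way, quantized weights leave headroom on an 8 GB board that fp16 weights would not:

```python
params = 3.8e9  # Phi-3-mini parameter count

# Bytes per weight under each storage format (4-bit figure is an assumption).
bytes_per_param_int4 = 0.5
bytes_per_param_fp16 = 2.0

weights_int4_gb = params * bytes_per_param_int4 / 1e9  # ~1.9 GB
weights_fp16_gb = params * bytes_per_param_fp16 / 1e9  # ~7.6 GB

# Activations and the KV cache come on top of the weight footprint.
print(f"int4 weights: {weights_int4_gb:.1f} GB, fp16 weights: {weights_fp16_gb:.1f} GB")
```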

| Benchmark | Score |
| --- | --- |
| MMLU (5-shot) | 68.8% |
| HellaSwag | 76.7% |
| TruthfulQA | 53.4% |

## Hardware Requirements

| Device | RAM | Status |
| --- | --- | --- |
| Raspberry Pi 5 (8GB) | 8 GB | Supported |
| Raspberry Pi 5 (4GB) | 4 GB | Requires swap |
| Jetson Nano | 4 GB | Supported with swap |
| Desktop x86 | 8 GB+ | Supported |
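For the 4 GB boards listed as "requires swap", the swap file must be enlarged before loading the model. A sketch using `dphys-swapfile`, the default swap manager on Raspberry Pi OS (the 4096 MB size is illustrative, and heavy swapping will slow inference considerably):

```shell
# Stop swap, raise CONF_SWAPSIZE to 4 GB, then re-create and re-enable it
sudo dphys-swapfile swapoff
sudo sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=4096/' /etc/dphys-swapfile
sudo dphys-swapfile setup
sudo dphys-swapfile swapon
```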

## Credits


Maintainer: Makatia
