# TinyLlama-1.1B-Chat-v1.0 (ONNX, Edge Optimized)

Quantized ONNX export of TinyLlama-1.1B-Chat-v1.0 for local inference on edge devices, single-board computers, and resource-constrained environments.

## Model Details

| Property | Value |
|---|---|
| Base Model | TinyLlama-1.1B-Chat-v1.0 |
| Parameters | 1.1B |
| Format | ONNX (quantized) |
| Archive | `TinyLlama_TinyLlama-1.1B-Chat-v1.0_onnx.7z` |
| Context Length | 2,048 tokens |
| Target Hardware | Raspberry Pi, ARM64, edge CPUs |
| License | MIT |

## Quick Start

### 1. Extract the Model

```bash
# Download (requires: pip install -U huggingface_hub)
huggingface-cli download Makatia/TinyLlama_TinyLlama-1.1B-Chat-v1.0_onnx \
    TinyLlama_TinyLlama-1.1B-Chat-v1.0_onnx.7z --local-dir .

# Extract (requires 7-Zip, e.g. `sudo apt install p7zip-full` on Debian/Raspberry Pi OS)
7z x TinyLlama_TinyLlama-1.1B-Chat-v1.0_onnx.7z
```

### 2. Run Inference

```python
# Requires: pip install onnxruntime transformers
import onnxruntime as ort
from transformers import AutoTokenizer

# The tokenizer is loaded from the original upstream repo.
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Point this at model.onnx (or model_quantized.onnx) inside the extracted archive.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CPUExecutionProvider"],
)

# Build the prompt with the model's own chat template instead of hand-writing tags.
messages = [
    {"role": "user", "content": "What is LSTM and how is it used in signal processing?"}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="np")

# Single forward pass; the first output is the logits tensor.
# Note: depending on how the graph was exported, it may also expect
# position_ids and past_key_values inputs -- check session.get_inputs().
outputs = session.run(None, dict(inputs))
```
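A single forward pass only scores the next token; producing a full reply takes a decoding loop that feeds the growing sequence back in. The sketch below is a minimal greedy decoder, assuming an export without a KV cache that takes `input_ids`/`attention_mask` and returns logits as its first output -- verify the actual names with `session.get_inputs()` before relying on it.

```python
import numpy as np

def greedy_generate(session, tokenizer, prompt, max_new_tokens=64, eos_id=2):
    """Greedy decoding without a KV cache: re-run the whole sequence each step.

    Assumes the graph takes input_ids/attention_mask and that logits are the
    first output; real exports may differ. eos_id=2 is </s> for Llama models.
    """
    ids = tokenizer(prompt, return_tensors="np")["input_ids"].astype(np.int64)
    for _ in range(max_new_tokens):
        feeds = {"input_ids": ids, "attention_mask": np.ones_like(ids)}
        logits = session.run(None, feeds)[0]      # shape (1, seq_len, vocab)
        next_id = int(np.argmax(logits[0, -1]))   # pick the top token
        if next_id == eos_id:
            break
        ids = np.concatenate([ids, np.array([[next_id]], dtype=ids.dtype)], axis=1)
    return ids[0].tolist()
```

Decode the result with `tokenizer.decode(token_ids, skip_special_tokens=True)`. Re-running the full sequence each step is O(n²) in sequence length, which is the simple-but-slow trade-off; a cache-aware loop over `past_key_values` is faster but tied to the export's exact input names.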

## Why TinyLlama on Edge?

At 1.1B parameters, TinyLlama runs on a Raspberry Pi 4/5 with acceptable latency while retaining useful conversational ability. The ONNX format enables hardware-accelerated inference through ONNX Runtime's execution providers across a wide range of platforms.
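The memory figures below are easy to sanity-check: weight storage is roughly parameter count times bytes per parameter, and the runtime, activations, and KV cache add overhead on top. A back-of-envelope estimate (1.1B is an approximate count, not the exact one):

```python
PARAMS = 1.1e9  # approximate parameter count

# Bytes per parameter at common precisions
for fmt, bytes_per_param in {"fp32": 4, "fp16": 2, "int8": 1}.items():
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{fmt}: ~{gb:.1f} GB of weights")

# INT8 weights alone are ~1.1 GB; runtime overhead, activations, and the
# KV cache push the in-memory footprint toward the ~2 GB listed below.
```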

## Hardware Requirements

| Device | RAM | Inference Speed |
|---|---|---|
| Raspberry Pi 5 (8GB) | 2 GB footprint | ~5 tokens/s |
| Raspberry Pi 4 (4GB) | 2 GB footprint | ~2 tokens/s |
| Desktop x86 | 4 GB+ | ~20 tokens/s |
| Apple Silicon | 4 GB+ | ~30 tokens/s |
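Throughput translates directly into response time: generation time is roughly output length divided by tokens per second. For example, with the approximate speeds above:

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Approximate wall-clock time to generate num_tokens of output."""
    return num_tokens / tokens_per_second

# A 100-token reply at the rough speeds in the table above:
for device, tps in [("Raspberry Pi 5", 5), ("Raspberry Pi 4", 2), ("Desktop x86", 20)]:
    print(f"{device}: ~{generation_seconds(100, tps):.0f} s")
```

So a Pi 5 needs roughly 20 seconds for a 100-token answer, which is usable for chat-style interaction but worth keeping in mind when sizing `max_new_tokens`.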

## Archive Contents

- `model.onnx` -- full ONNX model
- `model_quantized.onnx` -- INT8 quantized variant
- `config.json` -- model configuration
- `tokenizer.json` -- tokenizer files

## Credits

Maintainer: Makatia
