# TinyLlama 1.1B Chat - GGUF (Imatrix Quantized)
This repository contains GGUF quantized versions of the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model.
**Optimized with an Importance Matrix (imatrix):** Unlike standard quantizations, which are often calibrated on random or generic data, this version was quantized using an importance matrix generated from a dense text corpus (*The Adventures of Sherlock Holmes*). The imatrix identifies which weights contribute most to the model's outputs and protects them during quantization, which typically yields lower perplexity than plain K-quants at the same bit width.
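To make the idea concrete, here is a minimal, self-contained sketch (not llama.cpp's actual implementation) of importance-weighted quantization: per-channel importance is estimated as the mean squared activation over calibration data, and the quantization scale is chosen to minimize the importance-weighted error rather than the plain error. All names here are illustrative.

```python
def importance_from_activations(activations):
    """Per-channel importance: mean squared activation over calibration rows."""
    n = len(activations)
    dim = len(activations[0])
    return [sum(row[j] ** 2 for row in activations) / n for j in range(dim)]

def quantize_row(weights, importance, bits=4):
    """Pick the scale minimizing importance-weighted quantization error.

    Searches candidate scales around the naive max|w|/qmax choice and keeps
    the one where error on high-importance weights is smallest.
    """
    qmax = 2 ** (bits - 1) - 1  # symmetric int range, e.g. [-7, 7] for 4-bit
    max_w = max(abs(w) for w in weights)
    best_scale, best_err = None, float("inf")
    for k in range(80, 121):  # scan 0.80x .. 1.20x of the naive scale
        scale = (max_w / qmax) * (k / 100)
        err = 0.0
        for w, imp in zip(weights, importance):
            q = max(-qmax, min(qmax, round(w / scale)))
            err += imp * (w - q * scale) ** 2
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err
```

The key point: with a uniform importance vector this degenerates to ordinary error minimization; with real calibration data, weights feeding high-magnitude activations get rounded more carefully.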
## Available Files
| Filename | Quant Type | Size | Use Case |
|---|---|---|---|
| TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf | Q4_K_M | ~700 MB | **Recommended.** Best balance of speed and quality. |
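As a rough sanity check on the size column: a GGUF file's size is approximately the parameter count times the quant type's effective bits per weight. Assuming ~1.1B parameters and roughly 4.85 effective bits per weight for Q4_K_M (an approximation; K-quant super-blocks add scale/min overhead on top of the 4-bit codes), the estimate lands in the right ballpark.

```python
def gguf_size_estimate_mb(n_params, bits_per_weight):
    """Rough file-size estimate: parameters x effective bits per weight, in MB."""
    return n_params * bits_per_weight / 8 / 1e6

# ~1.1B params at ~4.85 effective bpw (approximate figure for Q4_K_M)
size_mb = gguf_size_estimate_mb(1.1e9, 4.85)  # ~667 MB, consistent with ~700 MB
```

The remaining gap is metadata plus tensors (e.g. embeddings/output) kept at higher precision.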
## How to Use

### Option 1: llama.cpp (Command Line)

```shell
./llama-cli -m TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf -p "Hello, how are you?" -n 400 -e
```
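Note that `-p` passes raw text, so for best results with a chat-tuned model you should wrap your message in the model's chat template (the `-e` flag makes `\n` escapes work on the command line). TinyLlama-1.1B-Chat-v1.0 was trained with a Zephyr-style template; the helper below sketches how to build such a prompt string (verify the exact template against the model's `tokenizer_config.json` before relying on it).

```python
def build_tinyllama_prompt(system, user):
    """Format a single-turn prompt in the Zephyr-style chat template
    reportedly used by TinyLlama-1.1B-Chat-v1.0."""
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_tinyllama_prompt("You are a helpful assistant.", "Hello, how are you?")
```

The resulting string can be passed to `llama-cli` via `-p` (with `-e` so the `\n` sequences are interpreted).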
## Model Tree

- Repository: deepsky-ia/TinyLlama-1.1B-Chat-v1.0-GGUF-MacQuantized
- Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0