πŸ¦™ TinyLlama 1.1B Chat - GGUF (Imatrix Quantized)

This repository contains GGUF-quantized versions of the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model.

πŸš€ Optimized with Importance Matrix (Imatrix): Unlike standard K-quants, which quantize without any calibration data, this version was quantized using an importance matrix computed from a dense calibration text (The Adventures of Sherlock Holmes). The imatrix identifies which weights contribute most to the model's activations so the quantizer can preserve them more precisely, typically yielding lower perplexity and better output quality than the equivalent standard K-quants.
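
For reference, this kind of imatrix quantization can be reproduced with llama.cpp's llama-imatrix and llama-quantize tools. The sketch below assumes you already have an FP16 GGUF conversion of the base model and a calibration text file; the file names are illustrative, not artifacts of this repository:

```bash
# 1. Compute the importance matrix from the calibration text
#    (file names are illustrative)
./llama-imatrix -m TinyLlama-1.1B-Chat-v1.0-F16.gguf \
    -f calibration.txt -o imatrix.dat

# 2. Quantize to Q4_K_M, guided by the importance matrix
./llama-quantize --imatrix imatrix.dat \
    TinyLlama-1.1B-Chat-v1.0-F16.gguf \
    TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf Q4_K_M
```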


πŸ“¦ Available Files

| Filename | Quant Type | Size | Use Case |
| --- | --- | --- | --- |
| TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf | Q4_K_M | ~700 MB | 🌟 Recommended. Best balance of speed and quality. |
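
To fetch just the quantized file, the Hugging Face CLI works. A minimal sketch, assuming the repo id shown on this page and that huggingface_hub is installed:

```bash
# Download the Q4_K_M file into the current directory
huggingface-cli download deepsky-ia/TinyLlama-1.1B-Chat-v1.0-GGUF-MacQuantized \
    TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf --local-dir .
```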

πŸ› οΈ How to Use

Option 1: llama.cpp (Command Line)

```bash
# -m: model file, -p: prompt, -n: max tokens to generate, -e: process escape sequences in the prompt
./llama-cli -m TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf -p "Hello, how are you?" -n 400 -e
```
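
TinyLlama-1.1B-Chat-v1.0 was fine-tuned with a Zephyr-style chat template, so prompts formatted that way tend to behave better than raw text. A sketch (the system message is illustrative; -e makes llama-cli interpret the \n escapes):

```bash
# Prompt wrapped in the model's chat template
./llama-cli -m TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf -n 400 -e \
    -p "<|system|>\nYou are a friendly chatbot.</s>\n<|user|>\nHello, how are you?</s>\n<|assistant|>\n"
```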