πŸ¦™ TinyLlama 1.1B Chat - GGUF (Imatrix Quantized)

This repository contains GGUF-quantized versions of the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model.

πŸš€ Optimized with Importance Matrix (Imatrix): Unlike standard K-quants, which quantize without any calibration data, this version was quantized using an importance matrix computed from a dense calibration text (The Adventures of Sherlock Holmes). The imatrix identifies which weights contribute most to the model's activations so the quantizer can preserve them more precisely, typically yielding lower perplexity and better output quality than the equivalent standard K-quants.
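
For reference, this kind of imatrix quantization can be reproduced with llama.cpp's llama-imatrix and llama-quantize tools. The sketch below assumes you already have an FP16 GGUF conversion of the base model and a calibration text file; the file names are illustrative, not artifacts of this repository:

```bash
# 1. Compute the importance matrix from the calibration text
#    (file names are illustrative)
./llama-imatrix -m TinyLlama-1.1B-Chat-v1.0-F16.gguf \
    -f calibration.txt -o imatrix.dat

# 2. Quantize to Q4_K_M, guided by the importance matrix
./llama-quantize --imatrix imatrix.dat \
    TinyLlama-1.1B-Chat-v1.0-F16.gguf \
    TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf Q4_K_M
```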


πŸ“¦ Available Files

| Filename | Quant Type | Size | Use Case |
| --- | --- | --- | --- |
| TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf | Q4_K_M | ~700 MB | 🌟 Recommended. Best balance of speed and quality. |
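
To fetch just the quantized file, the Hugging Face CLI works. A minimal sketch, assuming the repo id shown on this page and that huggingface_hub is installed:

```bash
# Download the Q4_K_M file into the current directory
huggingface-cli download deepsky-ia/TinyLlama-1.1B-Chat-v1.0-GGUF-MacQuantized \
    TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf --local-dir .
```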

πŸ› οΈ How to Use

Option 1: llama.cpp (Command Line)

```bash
# -m: model file, -p: prompt, -n: max tokens to generate, -e: process escape sequences in the prompt
./llama-cli -m TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf -p "Hello, how are you?" -n 400 -e
```
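
TinyLlama-1.1B-Chat-v1.0 was fine-tuned with a Zephyr-style chat template, so prompts formatted that way tend to behave better than raw text. A sketch (the system message is illustrative; -e makes llama-cli interpret the \n escapes):

```bash
# Prompt wrapped in the model's chat template
./llama-cli -m TinyLlama-1.1B-Chat-v1.0-Q4_K_M.gguf -n 400 -e \
    -p "<|system|>\nYou are a friendly chatbot.</s>\n<|user|>\nHello, how are you?</s>\n<|assistant|>\n"
```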