Qalb-1.0-8B-Instruct – Q4_0 GGUF

This is a 4-bit quantized (Q4_0) GGUF version of enstazao/Qalb-1.0-8B-Instruct.

Q4_0 is a fast, compact quantization format with acceptable quality degradation, offering a good balance for local inference on consumer hardware.
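If you want to reproduce this quantization yourself, a minimal sketch using llama.cpp's conversion and quantization tools looks like the following (run from a llama.cpp checkout; the local paths and filenames are illustrative):

# Convert the original Hugging Face checkpoint to an F16 GGUF
python convert_hf_to_gguf.py path/to/Qalb-1.0-8B-Instruct --outfile Qalb-1.0-8B-Instruct-F16.gguf

# Quantize the F16 GGUF down to Q4_0
./llama-quantize Qalb-1.0-8B-Instruct-F16.gguf Qalb-1.0-8B-Instruct-Q4_0.gguf Q4_0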

Quick start (llama.cpp)

./llama-cli -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 35 -c 8192 --color -cnv -p "سلام! کیسے مدد کر سکتا ہوں؟"
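The Urdu prompt means "Hello! How can I help?". If you prefer an OpenAI-compatible HTTP endpoint instead of an interactive session, llama.cpp's server accepts the same model flags; a minimal sketch (the port is illustrative):

# Serve the model locally
./llama-server -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 35 -c 8192 --port 8080

# Query it via the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Introduce yourself in Urdu."}]}'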

Recommended inference settings

Setting             Value        Comment
Quantization        Q4_0         Fast, ~4.5 GB VRAM / RAM
GPU layers (-ngl)   30–50        Adjust to your GPU VRAM
Context size (-c)   4096–8192    Depends on use case
Temperature         0.7–0.9      Higher = more creative
Repeat penalty      1.1–1.15     Helps reduce repetition
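Putting the table together, a llama-cli invocation using mid-range values from each row might look like this (adjust -ngl and -c for your hardware and use case):

./llama-cli -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 40 -c 4096 --temp 0.8 --repeat-penalty 1.1 --color -cnv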

Performance (approximate – RTX 4060 8 GB)

  • ~45–70 tokens/s at 4k context with good GPU offloading
  • Quality close to the original model for most everyday tasks
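To measure throughput on your own hardware, llama.cpp ships a benchmark tool; a minimal run (flag values are illustrative) looks like this and reports prompt-processing and generation tokens/s:

./llama-bench -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 35 -p 512 -n 128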

Original model

enstazao/Qalb-1.0-8B-Instruct

Thank you to the original creators.

Use responsibly.
