Qalb-1.0-8B-Instruct – Q4_0 GGUF

This is a 4-bit quantized (Q4_0) GGUF version of enstazao/Qalb-1.0-8B-Instruct.

Q4_0 is a fast, compact quantization format with acceptable quality degradation, offering a good balance for local inference on consumer hardware.
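If you want to reproduce this quantization yourself, a minimal sketch using llama.cpp's conversion and quantization tools looks like the following (run from a llama.cpp checkout; the local paths and filenames are illustrative):

# Convert the original Hugging Face checkpoint to an F16 GGUF
python convert_hf_to_gguf.py path/to/Qalb-1.0-8B-Instruct --outfile Qalb-1.0-8B-Instruct-F16.gguf

# Quantize the F16 GGUF down to Q4_0
./llama-quantize Qalb-1.0-8B-Instruct-F16.gguf Qalb-1.0-8B-Instruct-Q4_0.gguf Q4_0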

Quick start (llama.cpp)

./llama-cli -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 35 -c 8192 --color -cnv -p "سلام! کیسے مدد کر سکتا ہوں؟"
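The Urdu prompt means "Hello! How can I help?". If you prefer an OpenAI-compatible HTTP endpoint instead of an interactive session, llama.cpp's server accepts the same model flags; a minimal sketch (the port is illustrative):

# Serve the model locally
./llama-server -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 35 -c 8192 --port 8080

# Query it via the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Introduce yourself in Urdu."}]}'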

Recommended inference settings

Setting             Value        Comment
Quantization        Q4_0         Fast, ~4.5 GB VRAM / RAM
GPU layers (-ngl)   30–50        Adjust to your GPU VRAM
Context size (-c)   4096–8192    Depends on use case
Temperature         0.7–0.9      Higher = more creative
Repeat penalty      1.1–1.15     Helps reduce repetition
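Putting the table together, a llama-cli invocation using mid-range values from each row might look like this (adjust -ngl and -c for your hardware and use case):

./llama-cli -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 40 -c 4096 --temp 0.8 --repeat-penalty 1.1 --color -cnv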

Performance (approximate – RTX 4060 8 GB)

  • ~45–70 tokens/s at 4k context with good GPU offloading
  • Quality close to the original model for most everyday tasks
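To measure throughput on your own hardware, llama.cpp ships a benchmark tool; a minimal run (flag values are illustrative) looks like this and reports prompt-processing and generation tokens/s:

./llama-bench -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 35 -p 512 -n 128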

Original model

enstazao/Qalb-1.0-8B-Instruct

Thank you to the original creators.

Use responsibly.
