# Qalb-1.0-8B-Instruct – Q4_0 GGUF
This is a 4-bit quantized (Q4_0) GGUF version of enstazao/Qalb-1.0-8B-Instruct.

Q4_0 is a fast, compact quantization format with acceptable quality degradation, making it a good balance for local inference on consumer hardware.
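To fetch the GGUF file, the Hugging Face CLI works well. A minimal sketch, assuming the file is published in this repository (mahwizzzz/Qalb-1.0-8B-Instruct-Q4_0) under the filename used in the quick start below; adjust both if they differ:

```bash
# Download the Q4_0 GGUF (~4.5 GB) from the Hub.
# Repo and filename are assumptions based on this card.
pip install -U "huggingface_hub[cli]"
huggingface-cli download mahwizzzz/Qalb-1.0-8B-Instruct-Q4_0 \
  Qalb-1.0-8B-Instruct-Q4_0.gguf --local-dir .
```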
## Quick start (llama.cpp)

```bash
./llama-cli -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 35 -c 8192 --color -cnv -p "سلام! کیسے مدد کر سکتا ہوں؟"
```

The example prompt is Urdu for "Hello! How can I help?"
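If you'd rather serve the model than chat in the terminal, llama.cpp's `llama-server` exposes an OpenAI-compatible HTTP endpoint. A sketch with the same offload and context settings as the quick start:

```bash
# Start an OpenAI-compatible server on port 8080
./llama-server -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 35 -c 8192 --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "سلام! کیسے مدد کر سکتا ہوں؟"}], "temperature": 0.8}'
```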
## Recommended inference settings
| Setting | Value | Notes |
|---|---|---|
| Quantization | Q4_0 | Fast; ~4.5 GB VRAM / RAM |
| GPU layers (`-ngl`) | 30–50 | Adjust to your GPU's VRAM |
| Context size (`-c`) | 4096–8192 | Depends on use case |
| Temperature (`--temp`) | 0.7–0.9 | Higher = more creative |
| Repeat penalty (`--repeat-penalty`) | 1.1–1.15 | Helps reduce repetition |
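Putting the table together, one plausible invocation with mid-range values (not an official recommendation; tune to your hardware):

```bash
# Mid-range values from the table above; lower -ngl if you run out of VRAM.
./llama-cli -m Qalb-1.0-8B-Instruct-Q4_0.gguf \
  -ngl 40 -c 4096 --temp 0.8 --repeat-penalty 1.1 --color -cnv
```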
## Performance (approximate, RTX 4060 8 GB)
- ~45–70 tokens/s at 4k context with good GPU offloading
- Quality close to original model for most everyday tasks
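To reproduce throughput numbers on your own hardware, llama.cpp ships a benchmarking tool, `llama-bench`. A minimal sketch:

```bash
# Reports prompt-processing (pp) and text-generation (tg) tokens/s
./llama-bench -m Qalb-1.0-8B-Instruct-Q4_0.gguf -ngl 35 -p 512 -n 128
```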
## Original model

This quantization is derived from enstazao/Qalb-1.0-8B-Instruct. Thank you to the original creators. Use responsibly.
## Model tree for mahwizzzz/Qalb-1.0-8B-Instruct-Q4_0

- Base model: unsloth/Meta-Llama-3.1-8B
- Finetuned: enstazao/Qalb-1.0-8B-Instruct
- Quantized: mahwizzzz/Qalb-1.0-8B-Instruct-Q4_0 (this repository)