Nanbeige-4.1-3B-Instruct GGUF (Q6_K)
This repository contains the GGUF (Q6_K) quantization of the Nanbeige-4.1-3B-Instruct model, optimized for use with llama.cpp.
Quantization Details
The quantization was performed using llama.cpp with an importance matrix (imatrix) for better preservation of model quality.
- Method: Q6_K (6-bit quantization)
- Importance Matrix: Generated from a calibration dataset for improved performance compared to standard quantization.
- Base Model: Nanbeige-4.1-3B-Instruct
Files Included
- Nanbeige-4.1-3B-Q6_K-imatrix.gguf: The quantized model file.
- Nanbeige-4.1-3B.imatrix: The importance matrix used during quantization.
- Nanbeige-4.1-3B-Q6_K-imatrix.gguf.sha256sum: SHA-256 checksum for verification.
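To check the download against the shipped checksum, you can use GNU coreutils' sha256sum. A minimal sketch; to keep it runnable without the multi-gigabyte model download, it demonstrates the mechanics on a stand-in file (the command to run against the real files is the final line, with the repo's actual filenames substituted in):

```shell
# Work in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir"

# Stand-in for Nanbeige-4.1-3B-Q6_K-imatrix.gguf; in practice you download
# both the .gguf and its .sha256sum file from this repository instead.
printf 'dummy model bytes' > model.gguf
sha256sum model.gguf > model.gguf.sha256sum

# Verification step: prints "<filename>: OK" on a match, fails otherwise.
sha256sum -c model.gguf.sha256sum
```

On macOS, `shasum -a 256` can stand in for `sha256sum`.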
How to use
You can run this model with llama.cpp or any other GGUF-compatible runtime.
Example CLI usage:
```shell
./llama-cli -m Nanbeige-4.1-3B-Q6_K-imatrix.gguf -n 512 --prompt "Hello, how are you today?"
```
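llama.cpp also ships llama-server, which exposes an OpenAI-compatible HTTP API. A sketch, assuming a recent llama.cpp build in the current directory (flag values such as the context size and port are illustrative, not requirements of this model); the guard makes the script a safe no-op where the binary isn't built:

```shell
# Skip gracefully if llama.cpp has not been built here.
if ! command -v ./llama-server >/dev/null 2>&1; then
  echo "llama-server not found; build llama.cpp first"
  exit 0
fi

# Serve the quantized model (context size and port are example values).
./llama-server -m Nanbeige-4.1-3B-Q6_K-imatrix.gguf -c 4096 --port 8080 &
sleep 5

# Query the OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello, how are you today?"}]}'
```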
Reproducibility
The importance matrix (Nanbeige-4.1-3B.imatrix) is provided so you can recreate or refine the quantization process.
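A hedged sketch of how an imatrix quantization like this one is typically produced with llama.cpp's tools. The tool names and flags are from recent llama.cpp builds; the F16 source filename and calibration.txt are placeholders (the actual calibration dataset used for this repo is not specified), and the guard makes the script a no-op where the tools aren't built:

```shell
# Skip gracefully if the llama.cpp tools have not been built here.
if ! command -v ./llama-quantize >/dev/null 2>&1; then
  echo "llama.cpp tools not found; build them first"
  exit 0
fi

# 1. (Optional) Regenerate an importance matrix from a calibration corpus.
#    calibration.txt is a placeholder; the original calibration set is unknown.
#    You can skip this step and use the .imatrix file shipped in this repo.
./llama-imatrix -m Nanbeige-4.1-3B-F16.gguf -f calibration.txt -o Nanbeige-4.1-3B.imatrix

# 2. Quantize the full-precision GGUF to Q6_K, guided by the importance matrix.
./llama-quantize --imatrix Nanbeige-4.1-3B.imatrix \
  Nanbeige-4.1-3B-F16.gguf Nanbeige-4.1-3B-Q6_K-imatrix.gguf Q6_K
```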