Nanbeige-4.1-3B-Instruct GGUF (Q6_K)

This repository contains the GGUF (Q6_K) quantization of the Nanbeige-4.1-3B-Instruct model, optimized for use with llama.cpp.

Quantization Details

The quantization was performed using llama.cpp with an importance matrix (imatrix) for better preservation of model quality.

  • Method: Q6_K (6-bit quantization)
  • Importance Matrix: Generated from a calibration dataset for better output quality than quantization without an imatrix at the same bit width.
  • Base Model: Nanbeige-4.1-3B-Instruct
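Conceptually, the importance matrix weights each parameter by how strongly it affects activations on the calibration data, so the quantizer minimizes a weighted rather than uniform rounding error. A toy NumPy sketch of that idea (a deliberately simplified illustration, not llama.cpp's actual Q6_K algorithm; the function and variable names are hypothetical):

```python
import numpy as np

def quantize_q6(w, importance, n_scales=64):
    """Toy 6-bit quantization: search for the scale that minimizes
    the importance-weighted squared error (simplified imatrix idea)."""
    qmax = 31  # symmetric 6-bit integer grid: [-32, 31]
    max_abs = np.abs(w).max()
    best_err, best = np.inf, None
    # Try candidate scales around max_abs / qmax and keep the best one.
    for s in np.linspace(max_abs / qmax / 2, max_abs / qmax * 2, n_scales):
        q = np.clip(np.round(w / s), -32, 31)        # quantize
        err = np.sum(importance * (w - s * q) ** 2)  # weighted error
        if err < best_err:
            best_err, best = err, s * q              # dequantized weights
    return best

rng = np.random.default_rng(0)
w = rng.normal(size=256)                 # stand-in weight tensor
imp = rng.uniform(0.1, 1.0, size=256)    # stand-in imatrix statistics
w_hat = quantize_q6(w, imp)
```

With the importance weights in the objective, rounding error on weights that matter most to the calibration data is penalized more heavily, which is why imatrix quantization tends to preserve quality better than plain round-to-nearest.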

Files Included

  • Nanbeige-4.1-3B-Q6_K-imatrix.gguf: The quantized model file.
  • Nanbeige-4.1-3B.imatrix: The importance matrix used during quantization.
  • Nanbeige-4.1-3B-Q6_K-imatrix.gguf.sha256sum: SHA256 checksum for verification.
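The .sha256sum file lets you confirm the download is intact. If you prefer Python over running `sha256sum -c`, a minimal sketch of the same check (chunked reading keeps memory use low for multi-gigabyte model files):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the printed digest against the one listed in the .sha256sum file:
# print(sha256_of("Nanbeige-4.1-3B-Q6_K-imatrix.gguf"))
```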

How to Use

You can run this model using llama.cpp or other compatible interfaces.

Example CLI usage:

./llama-cli -m Nanbeige-4.1-3B-Q6_K-imatrix.gguf -n 512 --prompt "Hello, how are you today?"

Reproducibility

The importance matrix (Nanbeige-4.1-3B.imatrix) is provided so you can recreate or refine the quantization process.
