Nanbeige-4.1-3B-Instruct GGUF (Q6_K)

This repository contains the GGUF (Q6_K) quantization of the Nanbeige-4.1-3B-Instruct model, optimized for use with llama.cpp.

Quantization Details

The quantization was performed using llama.cpp with an importance matrix (imatrix) for better preservation of model quality.

  • Method: Q6_K (6-bit quantization)
  • Importance Matrix: Generated from a calibration dataset for better output quality than quantization without an imatrix at the same bit width.
  • Base Model: Nanbeige-4.1-3B-Instruct
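Conceptually, the importance matrix weights each parameter by how strongly it affects activations on the calibration data, so the quantizer minimizes a weighted rather than uniform rounding error. A toy NumPy sketch of that idea (a deliberately simplified illustration, not llama.cpp's actual Q6_K algorithm; the function and variable names are hypothetical):

```python
import numpy as np

def quantize_q6(w, importance, n_scales=64):
    """Toy 6-bit quantization: search for the scale that minimizes
    the importance-weighted squared error (simplified imatrix idea)."""
    qmax = 31  # symmetric 6-bit integer grid: [-32, 31]
    max_abs = np.abs(w).max()
    best_err, best = np.inf, None
    # Try candidate scales around max_abs / qmax and keep the best one.
    for s in np.linspace(max_abs / qmax / 2, max_abs / qmax * 2, n_scales):
        q = np.clip(np.round(w / s), -32, 31)        # quantize
        err = np.sum(importance * (w - s * q) ** 2)  # weighted error
        if err < best_err:
            best_err, best = err, s * q              # dequantized weights
    return best

rng = np.random.default_rng(0)
w = rng.normal(size=256)                 # stand-in weight tensor
imp = rng.uniform(0.1, 1.0, size=256)    # stand-in imatrix statistics
w_hat = quantize_q6(w, imp)
```

With the importance weights in the objective, rounding error on weights that matter most to the calibration data is penalized more heavily, which is why imatrix quantization tends to preserve quality better than plain round-to-nearest.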

Files Included

  • Nanbeige-4.1-3B-Q6_K-imatrix.gguf: The quantized model file.
  • Nanbeige-4.1-3B.imatrix: The importance matrix used during quantization.
  • Nanbeige-4.1-3B-Q6_K-imatrix.gguf.sha256sum: SHA256 checksum for verification.
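The .sha256sum file lets you confirm the download is intact. If you prefer Python over running `sha256sum -c`, a minimal sketch of the same check (chunked reading keeps memory use low for multi-gigabyte model files):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the printed digest against the one listed in the .sha256sum file:
# print(sha256_of("Nanbeige-4.1-3B-Q6_K-imatrix.gguf"))
```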

How to Use

You can run this model using llama.cpp or other compatible interfaces.

Example CLI usage:

./llama-cli -m Nanbeige-4.1-3B-Q6_K-imatrix.gguf -n 512 --prompt "Hello, how are you today?"

Reproducibility

The importance matrix (Nanbeige-4.1-3B.imatrix) is provided so you can recreate or refine the quantization process.
