What is this?

This is a GGUF of Qwen2-57B-A14B-Instruct.
As far as I know, this is the first successful attempt to build a GGUF of Qwen2-57B-A14B-Instruct with an imatrix.

imatrix dataset

TFMC/imatrix-dataset-for-japanese-llm.
This dataset contains English and many Japanese sentences.

How I made it

First I made a Q6_K quant, then converted it to i-quants with the --allow-requantize option.
Surprisingly, the (re)quantization process completed without problems.
The details are below.

Step 1: Making the f16 GGUF

First, I converted the safetensors checkpoint to an f16 GGUF.
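A sketch of this step using llama.cpp's convert-hf-to-gguf.py script; the input directory and output file name here are illustrative, not the exact ones I used:

```shell
# Convert the Hugging Face safetensors checkpoint to an f16 GGUF.
# Paths and file names are illustrative.
python convert-hf-to-gguf.py ./Qwen2-57B-A14B-Instruct \
    --outtype f16 \
    --outfile Qwen2-57B-A14B-Instruct-f16.gguf
```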

Step 2: Converting f16 to Q8_0

Next, I converted the f16 GGUF to Q8_0.
The goal is to speed up the next step, because I don't have enough memory to work with higher-precision tensors.
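This conversion is done with llama.cpp's quantize tool; file names are illustrative. Note that newer llama.cpp builds rename the binary to llama-quantize:

```shell
# Quantize the f16 GGUF down to Q8_0.
# In builds around b3065 the tool is named quantize(.exe);
# newer builds call it llama-quantize.
./quantize Qwen2-57B-A14B-Instruct-f16.gguf \
    Qwen2-57B-A14B-Instruct-Q8_0.gguf Q8_0
```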

Step 3: Calculating the imatrix

I calculated the imatrix from the Q8_0 model.
It seems some people have succeeded in calculating an imatrix this way, so I think anyone can make one.
I used the '-fa' (flash attention) option because I wanted to finish the calculation as quickly as possible.
Only later did I learn that some people claim Qwen2 needs the -fa option to work correctly.
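A sketch of the imatrix calculation with llama.cpp's imatrix tool; the calibration file name is illustrative (it should be the text from TFMC/imatrix-dataset-for-japanese-llm):

```shell
# Compute the importance matrix from the Q8_0 model.
# -f points at the calibration text; -fa enables flash attention.
# Add -ngl N to offload N layers to the GPU if they fit in VRAM.
./imatrix -m Qwen2-57B-A14B-Instruct-Q8_0.gguf \
    -f imatrix-dataset-for-japanese-llm.txt \
    -o imatrix.dat -fa
```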

Step 4: Making a temporary Q6_K

This is the most important step. First, I converted the f16 GGUF to Q6_K.
Never try to make the i-quants directly; as far as I know, no one has succeeded in doing so.
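The intermediate quantization is an ordinary Q6_K conversion; file names are illustrative:

```shell
# Make the temporary Q6_K quant from the f16 GGUF.
# (The i-quants are produced from this Q6_K in the next step,
# not directly from f16.)
./quantize Qwen2-57B-A14B-Instruct-f16.gguf \
    Qwen2-57B-A14B-Instruct-Q6_K.gguf Q6_K
```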

Step 5: Converting the Q6_K to i-quants with the imatrix

Finally, I converted the Q6_K to i-quants using the imatrix.
Somewhat surprisingly, the process finished, and the resulting i-quants appear to work.
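A sketch of the requantization step; IQ4_XS is just one example i-quant target, and file names are illustrative:

```shell
# Requantize the Q6_K down to an i-quant, supplying the imatrix.
# --allow-requantize permits quantizing an already-quantized model.
./quantize --allow-requantize --imatrix imatrix.dat \
    Qwen2-57B-A14B-Instruct-Q6_K.gguf \
    Qwen2-57B-A14B-Instruct-IQ4_XS.gguf IQ4_XS
```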

Environment

A GeForce RTX 3090 and the llama.cpp Windows binary, build b3065.

License

Apache 2.0

Developer

Alibaba Cloud

Format: GGUF
Model size: 57B params
Architecture: qwen2moe

Available quantizations: 3-bit, 4-bit, 5-bit, and 16-bit.
