Quantization request.
#2
by daibuzizai - opened
Can ArliAI/GLM-4.6-Derestricted be quantized? The original V1 version was very efficient.
I can try.
A caveat is that I don’t have ik_llama.cpp imatrix data for 4.6 derestricted. I could use a mainline imatrix file, but (last I checked) the conversion to ik_llama.cpp's format is quite lossy, so quality will take a hit.
I could use mainline llama.cpp quant formats instead, but without IQ3_KT, quality will take a huge hit.
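(For context, IQ3_KT is one of the quant types that only exists in ik_llama.cpp. A quant using it would be made roughly like the sketch below; file names and the imatrix path are placeholders, and I'm assuming ik_llama.cpp's `llama-quantize` takes the same arguments as mainline's.)

```bash
# Hypothetical sketch: quantize to IQ3_KT with an importance matrix.
# Paths and thread count are placeholders.
./llama-quantize \
  --imatrix glm-4.6-derestricted.imatrix \
  GLM-4.6-Derestricted-BF16.gguf \
  GLM-4.6-Derestricted-IQ3_KT.gguf \
  IQ3_KT 16
```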
I can try to make an ik_llama.cpp-native imatrix, but I'm not sure I can do it on 128GB of RAM. I will investigate this.
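Roughly, the run would look something like this sketch (calibration file, chunk count, and paths are placeholders, and I'm assuming ik_llama.cpp's `llama-imatrix` takes the same flags as mainline's):

```bash
# Hypothetical sketch: collect an imatrix on CPU with limited RAM.
# -ngl 0 keeps all layers on the CPU; because the GGUF is mmap'ed by default,
# a model larger than 128GB should page in and out rather than OOM,
# at the cost of a very slow run.
./llama-imatrix \
  -m GLM-4.6-Derestricted-Q8_0.gguf \
  -f calibration_data.txt \
  -o glm-4.6-derestricted.imatrix \
  -ngl 0 \
  -c 512 \
  --chunks 200
```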
Thank you very much for your work—it's a great help for people with limited hardware resources.