Unsloth Magic Quants - Q4

by sebastienbo - opened Jan 25

Hi,

You should take a look at Unsloth's Magic Quants methodology; it makes models much faster and much more accurate than plain Q4 quantisation.
It dynamically analyzes which weights are most important, so that a Q4 model can match the quality of a Q8 model.
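To illustrate the general idea of keeping the most important weights at higher precision, here is a toy Python sketch. It is not Unsloth's actual algorithm: the per-layer `sensitivity` scores, the hypothetical `allocate_bits` helper, and the simple 4-bit/8-bit greedy rule are all assumptions made for illustration.

```python
def allocate_bits(sensitivity: dict[str, float],
                  params: dict[str, int],
                  avg_budget: float) -> dict[str, int]:
    """Hypothetical helper: start every layer at 4 bits, then promote the most
    sensitive layers to 8 bits while the parameter-weighted average bit width
    stays within avg_budget. (Toy rule, not Unsloth's method.)"""
    bits = {name: 4 for name in sensitivity}
    total = sum(params.values())
    used = 4 * total  # total bits if every layer is 4-bit
    for name in sorted(sensitivity, key=sensitivity.get, reverse=True):
        extra = 4 * params[name]  # cost of going from 4 to 8 bits
        if (used + extra) / total <= avg_budget:
            bits[name] = 8
            used += extra
    return bits


# Made-up sensitivity scores and parameter counts for four layers.
sens = {"embed": 0.9, "attn.0": 0.3, "mlp.0": 0.1, "lm_head": 0.8}
params = {"embed": 100, "attn.0": 400, "mlp.0": 400, "lm_head": 100}
print(allocate_bits(sens, params, avg_budget=4.5))
# {'embed': 8, 'attn.0': 4, 'mlp.0': 4, 'lm_head': 4}
```

The greedy rule promotes layers in order of sensitivity and skips any layer that would push the average past the budget, so only the embedding fits at 8 bits here (4.4 average bits), while everything else stays at 4 bits.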

Could you requantize your model with Magic Quants?

It doesn't take much time, but it would make your model much faster than the currently published versions.

Unsloth has also published a new method that makes the process much faster and lowers VRAM requirements:
https://unsloth.ai/docs/new/3x-faster-training-packing

Here is how you can start:
https://unsloth.ai/docs/basics/quantization-aware-training-qat
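Quantization-aware training simulates quantization during training so the weights learn to tolerate it. The sketch below, assuming symmetric per-tensor 4-bit quantization and a toy linear model, shows the two core pieces: a fake-quantize (quantize, then dequantize) forward pass and a straight-through estimator (STE) update. The helper names are hypothetical; this is not Unsloth's QAT API.

```python
def fake_quantize(weights: list[float], bits: int = 4) -> list[float]:
    """Round to a symmetric signed integer grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes in [-8, 7]
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() rounds halves to even; clamp to [-qmax - 1, qmax].
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]


def qat_sgd_step(w: list[float], x: list[float], y: float,
                 lr: float = 0.05, bits: int = 4) -> list[float]:
    """Hypothetical helper: one SGD step on squared error for y ~ dot(wq, x).

    STE: rounding is treated as the identity in the backward pass, so the
    gradient w.r.t. the quantized weights wq is applied to the latent
    full-precision weights w.
    """
    wq = fake_quantize(w, bits)  # forward pass sees quantized weights
    err = sum(a * b for a, b in zip(wq, x)) - y
    grad = [2.0 * err * xi for xi in x]  # d(err^2)/d(wq), passed straight through
    return [wi - lr * g for wi, g in zip(w, grad)]


# Toy usage: a few updates on a single made-up sample.
w = [0.5, -0.2]
for _ in range(20):
    w = qat_sgd_step(w, [1.0, 2.0], 1.0)
print(fake_quantize(w))
```

Keeping full-precision latent weights and only quantizing in the forward pass is what lets small gradient updates accumulate even when they are smaller than one quantization step.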

If you have access to the weights, llama.cpp's imatrix (importance matrix) tool is an even better and easier way to make a Q4_K_M model:
https://gitlab.informatik.uni-halle.de/ambcj/llama.cpp/-/blob/b2646/examples/imatrix/README.md
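Roughly, the importance matrix records how strongly each input column of a weight matrix is activated on calibration text, and the quantizer then picks scales that minimize the importance-weighted rounding error. The sketch below assumes mean squared activations as the importance score and a simple grid search over scales for one block. llama.cpp's real Q4_K_M code (super-blocks with per-sub-block scales and mins) is more involved, and both helper names are hypothetical.

```python
def importance_from_activations(activations: list[list[float]]) -> list[float]:
    """Hypothetical helper: mean squared activation per input column."""
    n = len(activations)
    dim = len(activations[0])
    return [sum(row[j] ** 2 for row in activations) / n for j in range(dim)]


def quantize_block_weighted(w: list[float], imp: list[float], bits: int = 4):
    """Hypothetical helper: choose the scale that minimizes importance-weighted
    squared error. Tries 0.8x to 1.2x of the naive max-abs scale (1.0x
    included) and returns (scale, integer codes, weighted error)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes in [-8, 7]
    base = (max(abs(v) for v in w) or 1.0) / qmax
    best = None
    for k in range(-10, 11):
        scale = base * (1.0 + 0.02 * k)
        q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in w]
        err = sum(m * (v - scale * qi) ** 2 for m, v, qi in zip(imp, w, q))
        if best is None or err < best[2]:
            best = (scale, q, err)
    return best


# Made-up calibration activations (2 tokens x 4 input columns) and weights.
acts = [[1.0, 0.1, 2.0, 0.1], [0.5, 0.2, 1.5, 0.0]]
imp = importance_from_activations(acts)
scale, codes, err = quantize_block_weighted([0.31, -0.12, 0.05, 0.44], imp)
print(scale, codes, err)
```

Because heavily activated columns get larger weights in the error sum, the chosen scale trades accuracy on rarely used columns for accuracy on the ones that actually drive the layer's output.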
