Unsloth Magic Quants - Q4
Hi,
You should take a look at Unsloth's Magic Quants methodology. It makes models much faster and much more accurate than standard Q4 quantisation.
It dynamically analyzes which weights matter most, which lets a Q4 model get close to the quality of a Q8 model.
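To give a rough idea of what "analyzing the most important weights" can mean, here is a toy sketch in plain Python. It is not Unsloth's code. It only illustrates the general idea behind importance-aware quantization: pick the quantization scale that minimizes the error on the weights that matter most. The weight values, the importance scores, and the helper names are all made up for the illustration.

```python
def quantize(weights, scale, levels=7):
    # Round each weight to the nearest multiple of `scale`,
    # clamped to a small signed integer range (roughly 4-bit).
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]

def weighted_error(weights, approx, importance):
    # Squared error, with each weight's error multiplied by its importance.
    return sum(i * (w - a) ** 2 for w, a, i in zip(weights, approx, importance))

def best_scale(weights, importance, candidates):
    # Try every candidate scale and keep the one with the lowest weighted error.
    return min(
        candidates,
        key=lambda s: weighted_error(weights, quantize(weights, s), importance),
    )

# Hypothetical data: six weights and how much each one matters.
weights = [0.9, -0.31, 0.12, 0.05, -0.77, 0.44]
importance = [0.1, 5.0, 0.2, 4.0, 0.1, 0.3]
uniform = [1.0] * len(weights)

# Candidate scales between 0.5x and 1.5x of the naive max-abs scale.
max_abs = max(abs(w) for w in weights)
candidates = [max_abs / 7 * (0.5 + 0.01 * k) for k in range(101)]

plain_scale = best_scale(weights, uniform, candidates)      # ignores importance
aware_scale = best_scale(weights, importance, candidates)   # uses importance

err_plain = weighted_error(weights, quantize(weights, plain_scale), importance)
err_aware = weighted_error(weights, quantize(weights, aware_scale), importance)
print("importance-weighted error, plain scale:", err_plain)
print("importance-weighted error, aware scale:", err_aware)
```

Because both searches use the same candidate scales, the importance-aware choice can never do worse on the importance-weighted error. Real methods apply this kind of trade-off per block or per layer, using importance measured on calibration data.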
Could you requantize your model with Magic Quants?
It doesn't take much time, and it would make your model much faster than the currently published versions.
Unsloth also published a new methodology that makes the process much faster and reduces VRAM requirements:
https://unsloth.ai/docs/new/3x-faster-training-packing
Here is how you can start:
https://unsloth.ai/docs/basics/quantization-aware-training-qat
If you have access to the weights, then llama.cpp's imatrix is even better, and it makes producing a Q4_K_M model easier:
https://gitlab.informatik.uni-halle.de/ambcj/llama.cpp/-/blob/b2646/examples/imatrix/README.md
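In case it helps, here is a small Python sketch that prints the two llama.cpp commands for that workflow: first compute the importance matrix from a calibration text, then quantize with it. The file names are placeholders I made up. The binary names `./imatrix` and `./quantize` match the older build linked above; newer llama.cpp builds name them `llama-imatrix` and `llama-quantize`, so adjust to your build.

```python
import shlex

def imatrix_commands(f16_model, calib_text, out_model,
                     imatrix_file="imatrix.dat",
                     imatrix_bin="./imatrix",
                     quantize_bin="./quantize"):
    # Step 1: run the calibration text through the model and save the importance matrix.
    compute = [imatrix_bin, "-m", f16_model, "-f", calib_text, "-o", imatrix_file]
    # Step 2: quantize the f16 model to Q4_K_M using that importance matrix.
    quantize = [quantize_bin, "--imatrix", imatrix_file, f16_model, out_model, "Q4_K_M"]
    return compute, quantize

# Placeholder file names, not taken from any real repository.
compute_cmd, quantize_cmd = imatrix_commands(
    "model-f16.gguf", "calibration.txt", "model-Q4_K_M.gguf"
)
for cmd in (compute_cmd, quantize_cmd):
    print(shlex.join(cmd))
```

The calibration text should look like the data the model will see in practice, since the importance matrix is computed from it.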