Quant method

#1
by gcapnias - opened

Just a question,

Why only 3-bit and 5-bit quantized models? Usually, models start with 4-bit quantization.

I was looking to run the model under Ollama. Usually, the 4-bit models are used because they are lightweight.

George J.

Athena Research Center | Institute for Language and Speech Processing org

We chose to share the Q5_K_M model because it provides better performance with a "small" difference in memory requirements, as well as the Q3 version, which is of lower quality but can run on lower-end GPUs.
If you are interested in a 4-bit version of the model, you can find an AWQ one here: https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1-AWQ.

For Ollama, we have uploaded a 4-bit version here: https://ollama.com/ilsp/meltemi-instruct:q4.1
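For anyone following along, pulling and running that 4-bit tag would look something like the sketch below (assuming Ollama is installed locally; the model tag is the one linked above):

```shell
# Pull the 4-bit quantized Meltemi model from the Ollama registry
ollama pull ilsp/meltemi-instruct:q4.1

# Start an interactive chat session with it
ollama run ilsp/meltemi-instruct:q4.1
```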

Great,

Thanks a lot!

soksof changed discussion status to closed
