Quant method

#1
by gcapnias - opened

Just a question,

Why only 3-bit and 5-bit quantized models? Usually, models start with 4-bit quantization.

I was looking to run the model under Ollama. Usually, the 4-bit models are used because they are lightweight.

George J.

Athena Research Center | Institute for Language and Speech Processing org

We chose to share the Q5_K_M model because it provides better performance with a "small" difference in memory requirements, as well as the Q3 version, which is of lower quality but can run on lower-end GPUs.
If you are interested in a 4-bit version of the model, you can find an AWQ one here: https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1-AWQ.

For Ollama, we have uploaded a 4-bit version here: https://ollama.com/ilsp/meltemi-instruct:q4.1
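For anyone following along, pulling and running that 4-bit tag would look something like the sketch below (assuming Ollama is installed locally; the model tag is the one linked above):

```shell
# Pull the 4-bit quantized Meltemi model from the Ollama registry
ollama pull ilsp/meltemi-instruct:q4.1

# Start an interactive chat session with it
ollama run ilsp/meltemi-instruct:q4.1
```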

Great,

Thanks a lot!

soksof changed discussion status to closed
