GGUF models please?

#1 by ONTHEREDTEAM

llama.cpp was updated to repack Q4_K variants for optimization. After that change, inference speed (once the model is loaded) and perplexity both tested better, but Q4_K_M is still slow. IQ4_NL and Q4_K_S now perform very similarly, so these days I mostly download imatrix Q4_K_S GGUFs.

This is one of the higher-ranked 8B models on

https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard

and I would like to test it out.
Thank you.
