GGUF models please?
#1
by ONTHEREDTEAM - opened
llama.cpp was updated to repack Q4_K variants for optimization. In my testing, both inference speed after loading and perplexity improved, but Q4_K_M is still slow. IQ4_NL and Q4_K_S perform very similarly now, so these days I'm mostly downloading imatrix Q4_K_S GGUFs.
This is one of the higher-ranked 8B models on
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
and I would like to test it out.
Thank you.