GGUF Quants produce gibberish

#1
by ToastyPigeon - opened

I suspect something changed in llama.cpp or one of the dependencies because I've been unable to make working quants of these models either.

For context, the GGUFs at https://huggingface.co/mradermacher/Trinity-Mini-GGUF work fine on my local llama.cpp, but the quants from this repo (as well as the ones I tried to make myself from this model) produce random tokens. The same thing happened when I tried to replicate a GGUF of the original arcee-ai model, so I'm assuming it's an error in the quantization process in general.

I've confirmed the original model runs fine on vLLM, so it's not an issue with trinity-mini-marvin in particular.
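For anyone who wants to reproduce this, the quants were made with the standard llama.cpp flow, roughly like the sketch below (paths, filenames, and the Q4_K_M type are illustrative, not the exact commands used):

```shell
# Convert the HF checkpoint to an unquantized GGUF
# (model directory and output paths are placeholders)
python convert_hf_to_gguf.py ./trinity-mini-marvin \
    --outfile trinity-mini-marvin-f16.gguf --outtype f16

# Quantize the GGUF (Q4_K_M chosen as an example type)
./llama-quantize trinity-mini-marvin-f16.gguf \
    trinity-mini-marvin-Q4_K_M.gguf Q4_K_M

# Quick sanity check -- this is where the gibberish shows up
./llama-cli -m trinity-mini-marvin-Q4_K_M.gguf \
    -p "Hello, how are you?" -n 64
```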

Just wanted to give you a heads-up about this (I'm unsure what the fix is, unfortunately).

Original trinity-mini GGUF (older, works as intended):
[screenshot]

trinity-mini-marvin (BF16, vLLM):
[screenshot]

trinity-mini-marvin (GGUF from this repo):
[screenshot]
