GGUF Quants produce gibberish

#1
by ToastyPigeon - opened

I suspect something changed in llama.cpp or one of the dependencies because I've been unable to make working quants of these models either.

For context, the GGUFs at https://huggingface.co/mradermacher/Trinity-Mini-GGUF work fine on my local llama.cpp, but the quants from this repo (as well as the ones I tried to make myself from this model) produce random tokens. The same thing happened when I tried to replicate a GGUF of the original arcee-ai model, so I'm assuming it's an error in the quantization process in general.

I've confirmed the original model runs fine on vLLM, so it's not an issue with trinity-mini-marvin in particular.
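For anyone who wants to reproduce this, the quants were made with the standard llama.cpp flow, roughly like the sketch below (paths, filenames, and the Q4_K_M type are illustrative, not the exact commands used):

```shell
# Convert the HF checkpoint to an unquantized GGUF
# (model directory and output paths are placeholders)
python convert_hf_to_gguf.py ./trinity-mini-marvin \
    --outfile trinity-mini-marvin-f16.gguf --outtype f16

# Quantize the GGUF (Q4_K_M chosen as an example type)
./llama-quantize trinity-mini-marvin-f16.gguf \
    trinity-mini-marvin-Q4_K_M.gguf Q4_K_M

# Quick sanity check -- this is where the gibberish shows up
./llama-cli -m trinity-mini-marvin-Q4_K_M.gguf \
    -p "Hello, how are you?" -n 64
```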

Just wanted to give you a heads-up about this (I'm unsure what the fix is, unfortunately).

Original trinity-mini GGUF (older, works as intended):
[screenshot]

trinity-mini-marvin (BF16, vLLM):
[screenshot]

trinity-mini-marvin (GGUF from this repo):
[screenshot]
