GGUF Quants produce gibberish
I suspect something changed in llama.cpp or one of its dependencies, because I've also been unable to make working quants of these models.
For context: the GGUFs at https://huggingface.co/mradermacher/Trinity-Mini-GGUF work fine on my local llama.cpp, but the quants from this repo (as well as the ones I tried to make myself from this model) produce random tokens. The same thing happened when I tried to replicate a GGUF of the original arcee-ai model, so I'm assuming it's an error in the quantization process in general.
I've confirmed the original model runs fine on vLLM, so it's not an issue with trinity-mini-marvin in particular.
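For anyone trying to reproduce, the round-trip I attempted was the standard llama.cpp conversion-and-quantization flow, roughly as sketched below. The paths, model directory, and quant type here are placeholders, not the exact values I used:

```shell
# Convert the HF safetensors checkpoint to a full-precision GGUF
# (convert_hf_to_gguf.py ships with the llama.cpp repo)
python convert_hf_to_gguf.py ./trinity-mini-marvin \
  --outfile trinity-mini-marvin-f16.gguf --outtype f16

# Quantize (Q4_K_M chosen as an example quant type)
./llama-quantize trinity-mini-marvin-f16.gguf \
  trinity-mini-marvin-Q4_K_M.gguf Q4_K_M

# Sanity check: a short generation; this is where I get random tokens
./llama-cli -m trinity-mini-marvin-Q4_K_M.gguf -p "Hello, how are you?" -n 32
```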
Just wanted to give you a heads-up about this (I'm unsure what the fix is, unfortunately).
Original trinity-mini gguf (older, works as intended):

