Do we need new quants? There are a few fixes:
"
kv-cache : support attention rotation for heterogeneous iSWA https://github.com/ggml-org/llama.cpp/pull/21513
CUDA: check for buffer overlap before fusing - CRITICAL fixes tokens https://github.com/ggml-org/llama.cpp/pull/21566
vocab : add byte token handling to BPE detokenizer for Gemma4 https://github.com/ggml-org/llama.cpp/pull/21488
convert : set "add bos" == True for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21500
common : add gemma 4 specialized parser https://github.com/ggml-org/llama.cpp/pull/21418
llama-model: read final_logit_softcapping for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21390
llama: add custom newline split for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21406
"
I don't think so; look at my upload date. My GGUFs were uploaded 2 days ago, and the changes you mention are older than that.
This list is dumb; half of it is inference/server bugfixes. But the tokenizer changes sadly fuck with GGUFs in an indirect way: the imatrix file needs to be redone, and by extension the GGUF.
Not sure about this one, but your 31B version, probably.
the imatrix file needs to be redone, and by extension the GGUF.
Not sure about this one, but your 31B version, probably.
None of my GGUFs use imatrix; I only do standard quants with no imatrix. Knowing that, does my 31B still need to be redone?
I don't think so, but that's slightly above my paygrade.
(that said, now that I know your quants don't use imatrix, which you probably should specify somewhere, as it's quite rare nowadays, I'll remake them anyway :D)
(that said, now that I know your quants don't use imatrix, which you probably should specify somewhere, as it's quite rare nowadays, I'll remake them anyway :D)
There is a metadata entry in the GGUF file that lists the imatrix dataset if an imatrix was applied to the quant.
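To illustrate that metadata check, here is a minimal sketch that parses just the GGUF header and string-valued metadata keys by hand and looks for an imatrix marker. It assumes the `quantize.imatrix.*` key prefix (e.g. `quantize.imatrix.dataset`) that llama.cpp's quantize tool writes, as I recall it; the blob at the bottom is a synthetic stand-in for a real GGUF file, and the parser deliberately handles only string values, not the full GGUF type system.

```python
import struct

def gguf_metadata_string_keys(data: bytes) -> dict:
    # GGUF header: magic "GGUF", uint32 version, uint64 tensor count, uint64 KV count.
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    assert magic == b"GGUF", "not a GGUF file"
    off = struct.calcsize("<4sIQQ")
    meta = {}
    for _ in range(n_kv):
        (klen,) = struct.unpack_from("<Q", data, off); off += 8
        key = data[off:off + klen].decode(); off += klen
        (vtype,) = struct.unpack_from("<I", data, off); off += 4
        if vtype != 8:  # sketch only handles string values (type 8); stop otherwise
            break
        (vlen,) = struct.unpack_from("<Q", data, off); off += 8
        meta[key] = data[off:off + vlen].decode(); off += vlen
    return meta

def kv_string(key: str, val: str) -> bytes:
    # Encode one string KV pair: uint64 key length, key, uint32 type tag 8,
    # uint64 value length, value.
    k, v = key.encode(), val.encode()
    return (struct.pack("<Q", len(k)) + k
            + struct.pack("<I", 8)
            + struct.pack("<Q", len(v)) + v)

# Synthetic GGUF blob (version 3, zero tensors, one KV) standing in for a real file.
blob = struct.pack("<4sIQQ", b"GGUF", 3, 0, 1) \
     + kv_string("quantize.imatrix.dataset", "calibration.txt")

meta = gguf_metadata_string_keys(blob)
imatrix_used = any(k.startswith("quantize.imatrix.") for k in meta)
print(imatrix_used)  # True -> this quant was made with an imatrix
```

On a real file you would read the bytes from disk instead of building `blob`; if no `quantize.imatrix.*` key shows up in the metadata, the quant is a plain (non-imatrix) one.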
Okay, I redid the GGUFs, just finished uploading them.
The GGUFs for gemma-4-E4B-it, gemma-4-26B-A4B-it and gemma-4-31B-it have now all been redone and re-uploaded.
Big effort but likely useless: https://github.com/ggml-org/llama.cpp/pull/21500/commits/4e19abc52b275f547d2b9968095cc599c6e2e2e2 - it would work OK anyway.
Big effort but likely useless
Could you explain?
In the above commit they added a workaround so old GGUFs are handled by llama.cpp properly. Sorry for the late post; I only had time to check this now. I was wondering myself whether I have to regenerate.
That's for the BOS token handling; it's just inference changes and metadata. (btw I really hate that backends feel like they should manage the BOS token themselves. Now we have toggles at 3 levels, file/backend/frontend, issues with each and every new release, and go explain what's what to the end-user, or maintain consistency as a middleware.)
Out of this whole changelog, only one entry is a possibly related real GGUF change (the CUDA one, which outputs broken tokens on rare occasions), and only GGUFs done with imatrix are affected. I had to read all that stuff to make sure.
Yeah, right for imatrixed ones.
The GGUFs for Gemma 4 E4B don't work at all now; I tried multiple times. The safetensors are fine, but the GGUFs refuse to load. I guess it's an issue with llama.cpp? Gemma 4 31B and 26B seem to work with no issues, however.
No idea why it doesn't work, but anyway, Google updated the chat_templates, so there's another reason to reupload new quants :P