Bit off-topic (couldn’t resist)
Any plans to quantize coder3101's heretic Gemma4 models ( https://huggingface.co/collections/coder3101/gemma-4 )? Your quants are insane, and I'm dying to see how the heretic v2 version of Gemma 4 31B (and perhaps the 26B A4B) stacks up against their Qwen 3.5 counterparts (27B and 30B A3B) side-by-side. Pretty please with sugar on top?
My quants shouldn't be that different from mradermacher's; I just use a slightly different imatrix file. I make quants for models that aren't being processed by others, or for which my favorite method isn't available. I might make one for the sake of completeness, but not until this weekend at the earliest.
His v2 is broken and not worth quantizing, and his v1 is already quantized. I'd personally use @llmfan46's own heretic version instead. It hits the sweet spot for me: normal responses are indistinguishable from the base model, and I couldn't make it refuse anything.
A note about heretic versions, and this goes for all those nice pretty tables the heretic people make: the "99% refusal" figure is bullshit. Those stats are measured with the default helpful-assistant system prompt, with the offending question asked as the very first message. That's how we end up with so-called heretic versions of RP models that are already completely wild.
Point is, with this new official Gemma4 model, the moment you use a character card instead of a default assistant system prompt, there's surprisingly little 'censoring' in place for a corporate model. It has its limits, of course, but it's important to note.
Excellent work digging this up. Thanks for the info!
It's pretty good, yes. Ngl, it has a GPT-4o feel to its delivery that I'm not a fan of, but from what I've seen, it can be beaten out of it. It's also the first model in that size range for which I didn't have to reroll a single message due to incoherence in, well, forever.
Anyway, I'll wait a week or two for the dust to settle; right now llama.cpp is being updated every second, and on the Google side they keep editing their chat template again and again. Then I'll make a proper quant of whichever model had the best stats, with the updated template and all the bells and whistles. In the meantime, llmfan's quant is perfectly workable :)
llama.cpp, ollama, and transformers still have issues with Gemma 4 models. However, I redid all the GGUFs between yesterday and this morning with the newest chat template and tokenizer config, the latest version of llama.cpp, and transformers downgraded to 5.5.0, due to issues with the transformers version released on the 9th of this month.
Yeah, mate, I know. I know. It wasn't in any way, shape, or form an attack on your quant :D It's just that about two hours after you redid yours, Google edited their chat completion Jinja template, making the one in your files outdated. To be fair, the llama.cpp team has contingency backend-level fixes for the old template. That's kinda why I'm waiting too: I don't have the pressure to deliver "on time". I'll just look at the different heretic builds, probably pick yours, manually check/edit the Jinja, make my own imatrix, and make the GGUF, crediting you anyway.
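For anyone wanting to do that manual check themselves, here's a minimal sketch of how you can sanity-check a chat template with jinja2 before baking it into a GGUF. The template string below is a simplified Gemma-style stand-in, not the real Gemma 4 template; swap in the actual chat_template.jinja from the model repo.

```python
# Minimal sketch: render a chat template against a toy conversation to
# catch breakage before converting. The template here is a simplified
# Gemma-style placeholder, NOT the real Gemma 4 one.
from jinja2 import Environment, StrictUndefined

template_str = (
    "{% for message in messages %}"
    "<start_of_turn>{{ message['role'] }}\n"
    "{{ message['content'] }}<end_of_turn>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<start_of_turn>model\n{% endif %}"
)

# StrictUndefined makes the render fail loudly on any variable the
# template expects but we didn't supply -- the usual failure mode
# after an upstream template edit.
env = Environment(undefined=StrictUndefined)
template = env.from_string(template_str)

rendered = template.render(
    messages=[
        {"role": "user", "content": "Hello"},
        {"role": "model", "content": "Hi there."},
    ],
    add_generation_prompt=True,
)
print(rendered)
```

Eyeball the output for the right turn markers and the trailing generation prompt; if the render throws instead, the template and your inputs have drifted apart.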
edit: also ollama bad :D
Oh, you misunderstood! I wasn't feeling attacked or anything; I was giving an update that I actually re-redid the GGUFs a third time. The GGUFs are not outdated: they use the absolute latest chat_template.jinja and the absolute latest tokenizer_config.json, and I re-uploaded them yet again just a few hours ago.
Check here: https://huggingface.co/llmfan46/gemma-4-31B-it-uncensored-heretic-GGUF/tree/main
Re-uploaded 12 hours ago.
Oh, my bad! Sorry, I based that on the whole "changelog" thread started by some rando we both interacted with; I assumed your update happened right there, before the Jinja update, and didn't check beyond that.
Hey fair enough then :)