[Q3KM] Seems to be functioning, would love to see more quants.

by AutisticPancake - opened 10 days ago

First and foremost, it is indeed ablated properly, thinking freely now - not refusing to generate.
I haven't tested it much yet, mostly in conversations / writing tasks.

As for quants, IQ4XS / Q4KM is probably what folks would want (or at least it's a personal preference). Regardless, thank you for making it!

Vlad100

9 days ago

IQ4XS please!

elpirater312

9 days ago

IQ4_NL and IQ2_M would be also nice to have, thanks.

Youssofal

Owner 9 days ago

First and foremost, it is indeed ablated properly, thinking freely now - not refusing to generate.

Im glad it’s working properly!

I’m uploading the Q2 and Q4KM quants today and experimenting with Imatrix quants to see which has the highest quality and will upload IQ4XS soon.

AutisticPancake

9 days ago

•

edited 9 days ago

More specific observations concerning both M2.7 variants (original / ablated), in context of LLM roleplay:
(doesn't concern model's quality - the purpose of this post is to provide feedback for curios folks out there)

Original M2.7 (without ablation) shows an inclination towards emotional swings, making characters more easily impressed, shocked, or outraged.
Ablated M2.7 (this model) tends to show characters more calm and collected, rarely YELLING OUT LOUD, which is honestly a shame - but it's making the ablated model respect the character's profile better.
Baseline model (non-ablated) can be forced into compliance too, providing a "punchy" alternative for RP scenarios that feature impulsive, short-tempered characters. To force the original M2.7 into compliance, set "Start Reply With" (in SillyTavern) to an appropriate faux-reasoning injection with both open/closed thinking tags and a jailbreak in between (ablated model, obviously, does not need this).

@Youssofal

Im glad it’s working properly!

I’m uploading the Q2 and Q4KM quants today and experimenting with Imatrix quants to see which has the highest quality and will upload IQ4XS soon.

Many thanks, I appreciate it!

Youssofal

Owner 9 days ago

•

edited 8 days ago

Just got Q4_KM Q2_K & Q6 up. Still working on the Imatrix varients which are being a bit finnicky.
Also uploading MLX varient, here is 3 BIT uploading the rest soon: https://huggingface.co/Youssofal/MiniMax-M2.7-Abliterated-Heretic-MLX-3bit

AutisticPancake

9 days ago

@Youssofal

Just got Q4_KM Q2_K & Q6 up. Still working on the Imatrix varients which are being a bit finnicky.
Also uploading MLX varients here: https://huggingface.co/Youssofal/MiniMax-M2.7-Abliterated-Heretic-MLX

Thank you! Waiting eagerly for that IQ4_XS!

On a side note, have you seen a recent discussion concerning M2.7 quants being erroneous? I'm not too knowledgeable about this matter, unfortunately - there's a Reddit thread explaining it -- https://old.reddit.com/r/LocalLLaMA/comments/1sk6l63/unsloth_minimaxm27gguf_in_broken_udq4_k_xl_avoid/ -- and other cases appearing elsewhere, see https://huggingface.co/AesSedai/MiniMax-M2.7-GGUF especially in corresponding discussion threads (pardon me for questioning your GGUFs over there; I feel cautious about this situation, mostly due to my own lack of experience).

Purely in regular conversations, your Q4K_M seems to be doing well so far!
As an example, here's a LOTR chapter (The Council of Elrond) summarized with it: https://text.is/Q4KM_test_M27

WetRat

8 days ago

•

edited 8 days ago

Hate to be "that guy" but I think something might be off...

It writes Chinese symbols randomly. Sometimes it omits spaces between words, and ~~I swear I saw it making weird mistakes... Like the user mentioning a girlfriend, and the model responding about a boyfriend. And it all happened using chat completion with .jinja.~~ In defense of those claiming its good, yes, it delivers good output often. ~~It's just volatile and I've no clue if abliteration did lobotomize it, or it was originally messed up.~~

EDIT: note on strike-through text - these specific issues might've been related to templates; the model is ultimately fine for what it is

IMO it needs:
a. Proper KLD/Perplexity evaluation.
b. Someone try the full weights to tell if this behavior manifests in non-quantized model.

WetRat

8 days ago

It writes Chinese symbols randomly.

Things are getting awkward... I downloaded bartowski/MiniMaxAI_MiniMax-M2.7-GGUF and it shows the same issue. I've yet to see any weird mistakes I mentioned before, however.
If more similarities will be found, I'll report it here.

Youssofal

Owner 8 days ago

It writes Chinese symbols randomly.

I will investigate this and see what I can do, ill run it through Perepxlexity + KLD. But this may be fundementally an issue with the model itself? M2.7 is just M2.5 trained in a recursive loop to train itself and possibly "benchmax" performance so it could be corruption from the fine tuning process.

ill test the VLLM version and see if i encouter it.

Youssofal

Owner 8 days ago

On a side note, have you seen a recent discussion concerning M2.7 quants being erroneous? I'm not too knowledgeable about this matter, unfortunately - there's a Reddit thread explaining it.

Yes, and I find it very interesting. M2.7 is essentially just M2.5 put through a recursive loop to train itself. I suspect they did this with the goal of maximising benchmark scores but it sacrificed the integrity and quality of the original model. In my flappy bird coding test M2.7 scored significantly worse than 2.5. So my theory is its on the fundamental training process itself, but I may be wrong and it just has to do with how we are ALL quantising the model incorrectly (lol). We will wait and see. If there are any significant developments I'll update the models here.

Purely in regular conversations, your Q4K_M seems to be doing well so far!
As an example, here's a LOTR chapter (The Council of Elrond) summarized with it: https://text.is/Q4KM_test_M27

Thats amazing! Im glad its working great for summarisation and RP and seems to have been unaffected.

WetRat

8 days ago

•

edited 8 days ago

@Youssofal

Don't mind the Chinese characters issue. It's M2.7's general flaw, manifesting in any form of this model.

I think it's usable. But is it reliable - a question with no clear answer. I've noticed it's not too willing to be thorough with longer System Prompt instructions, even if you put OOC notes in Post-History, demanding it to follow specific protocols - it may omit/ignore certain parts. Perhaps it would benefit from post-ablation fine tuning, which is certainly not easy to do and it's not something you should be concerning yourself with, especially given the fact it's a MoE model. Many "abliterated" models suffer from light damage to their abilities.

EDIT: note on strike-through text - these specific issues might've been related to templates; the model is ultimately fine for what it is

AutisticPancake

8 days ago

Thats amazing! Im glad its working great for summarisation and RP and seems to have been unaffected.

Yep, it's surprisingly good even at Q3KM.
Oh, and I see GGUF file #00001 of that quant was updated yesterday. Should I re-download it?

@WetRat
Stick with Chat Completion. I tried meddling with Text Completion and got even more weird crap than you described.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment