Muted model

#1
by redaihf - opened

This model creates realistic dialogue. Its generations are short, and its contextual ethical realignment is muted. However, it does not show noncompliance, even in summarisation tasks.

So, dropping tactical nukes on the MLP and the attention layers turns the model into a servitor. Frying only the synapses (attention) while slightly waning the refusal direction in the MLP layers to reduce intellectual damage mutes contextual ethical realignment while weeding out noncompliance (?), which is good, as I don't intend the PaperWitch models to be zero-shot dispensers but to be both wise and compliant. There is room to improve, but I simply cannot experiment that much with 24B+ models on my local capacity.
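The asymmetric treatment described above can be sketched as a projection. This is a minimal illustration, not Heretic's actual implementation: a unit refusal direction is removed fully from the attention output weights, and only dampened in the MLP weights.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Remove the component along direction `r` from a weight matrix that
    writes into the residual stream. alpha=1.0 erases the direction entirely;
    alpha<1.0 only dampens it."""
    r = r / np.linalg.norm(r)
    return W - alpha * np.outer(r, r) @ W

# Toy weights and a made-up refusal direction, purely illustrative:
rng = np.random.default_rng(0)
d = 8
refusal_dir = rng.normal(size=d)
W_attn = rng.normal(size=(d, d))
W_mlp = rng.normal(size=(d, d))

# Fry attention outputs completely, wane the MLP direction by half:
W_attn_abl = ablate_direction(W_attn, refusal_dir, alpha=1.0)
W_mlp_abl = ablate_direction(W_mlp, refusal_dir, alpha=0.5)
```

After full ablation, nothing the attention block writes has any component along the refusal direction; the half-strength MLP pass keeps that component at 50%, which is the "reduce intellectual damage" trade-off.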

You can probably test custom prompts at lower precision locally. Perplexity at 4-bit decreases as model size grows. This would allow you to minimise the use of external resources.

PaperWitch Hereticisation does not preclude models from generating long responses. Example

Can you create an ethical realignment config for Heretic? I think we can turn this into a two-step process: first target only the refusal direction (remove noncompliance), then ethically realign the model (democratise the model). The refusal direction likely overlaps with the indoctrination direction, but I'm strongly against targeting, in a single pass, multiple directions that may diverge down the line or increase entanglement with other directions.
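For the first step, the refusal direction is typically estimated as a difference of means between residual-stream activations on refused versus complied prompts (the abliteration recipe this kind of tooling builds on). A toy sketch with made-up activations:

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means estimate: mean activation on prompts the model
    refuses minus mean activation on prompts it answers, unit-normalised."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

# Fake residual-stream activations (n_prompts x d_model), purely illustrative:
rng = np.random.default_rng(1)
true_dir = np.zeros(16)
true_dir[0] = 1.0  # pretend refusals shift activations along this axis
harmful = rng.normal(size=(32, 16)) + 3.0 * true_dir
harmless = rng.normal(size=(32, 16))

r_hat = refusal_direction(harmful, harmless)
```

Running the realignment as a separate second pass, with its own direction estimated from its own contrast set, is exactly what keeps the two directions from getting entangled in one ablation.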

You can probably test custom prompts at lower precision locally. Perplexity at 4-bit decreases as model size grows. This would allow you to minimise the use of external resources.

Right, I can also enable 4-bit mode in Heretic for testing... I utterly forgot about that option.
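On the perplexity point: perplexity is just the exponential of the mean per-token negative log-likelihood, so comparing precisions only needs a few lines once you have token log-probs (a generic sketch, not tied to Heretic's internals):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp(mean negative log-likelihood). Lower means the model is
    less surprised by the evaluation text."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# If every token gets probability 0.25, perplexity is exactly 4:
print(perplexity([math.log(0.25)] * 10))  # -> 4.0 (up to float error)
```

The claim in the quote then amounts to: the gap between full-precision and 4-bit perplexity shrinks as parameter count grows, so local 4-bit testing is a reasonable proxy for larger models.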

PaperWitch Hereticisation does not preclude models from generating long responses. Example

Of course. Response length is often tied to the training data/method, or, in the case of refusals, to noncompliance.

Contextual ethical realignment is probably a symptom of successful Hereticisation rather than a direction of its own. When a model generates unsafe content, it tries hard to justify it. This used to happen in successful multi-turn jailbreaks prior to the invention of abliteration.

I think you could create an ethical realignment configuration. You would need customised judging to detect correct and incorrect answers. @failspy has previously achieved similar feats in Mopey Mule.
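A crude starting point for the judging side, before involving a stronger judge model, is marker-based refusal detection. The marker list here is a hypothetical example for illustration, not @failspy's actual Mopey Mule setup:

```python
# Common refusal openers; hypothetical list, tune against real model outputs.
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "i'm sorry", "as an ai")

def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if its opening contains a marker phrase."""
    head = response.lower()[:200]
    return any(marker in head for marker in REFUSAL_MARKERS)

print(is_refusal("I cannot help with that request."))   # True
print(is_refusal("Sure! Chapter one begins at dawn."))  # False
```

String matching only catches surface refusals; scoring whether an answer is ethically realigned rather than merely compliant would still need curated examples or a judge model.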

There exist certain elements of this process that target the ethical dimension of refusals; not entirely, but they are still likely to ablate away some part of the ethical direction. I doubt I have that much time on my hands to curate a dataset and markers for ethical realignment, a topic I don't have a grasp of, with my job already taking 13 to 15 hours of my day. Another idea goes down the pile, I guess. Have you tested MPOA models from others? Do they also exhibit strong ethical realignment?

Your Absolute Heresy models are generally more stable and therefore more likely to exhibit contextual ethical realignment during use.

Contextual ethical realignment is, by definition, context-dependent. The model organically justifies its outputs based on the prompts and its own responses. For example, a character might talk about hardship making them stronger towards the end of a story, instead of complaining mid-story that a key event is illegal.

Forget about quantisation. This is terrible. Initialisation alone took maybe an hour or two.

Elapsed time: 11m 5s
Estimated remaining time: 36h 45m
Resident system RAM: 4.82 GB
Allocated GPU VRAM: 13.35 GB
Reserved GPU VRAM: 20.75 GB

I think there's something wrong with spacewars123/Space-Wars-24B-v1.00a.
