(off topic) Request

#1
by DreadPoor - opened

Fascinating!

Would you consider doing the same with Irix?

In any case, good luck on your future projects!

Owner
•
edited Feb 8

Hey there, mate. Feel free to request anything. I'll see what I can do with Irix. Are there any peculiarities with this model that I should know of, i.e. refusal patterns, certain quirks that might be picked up as refusals, etc.? Also, know that refusal ablation on merges might not always be beneficial. There was a discussion on this. Here's a good start: https://huggingface.co/OccultAI/Qliphoth-24B-v1.2/discussions/1

Ps. I can only beseech God retroactively that he would have led you to create this discussion an hour and a half earlier, before I exhausted the last of my time left on the Swiss RTX Pro 6000 WS. I'll update this message with Irix's refusal counts.

Edit: Refusals: 87/100. Ran out of VRAM afterwards.

That 'PS' went hard, I will say.

A shame about the resources, but there is no rush; if it can be done at a later time, that is more than fine.

Anyway, it's less about outright refusals and more about... behaviour steering; I feel the model 'as is' is too... "hesitantly positive", and I reckoned this process could make it... less so.
That said, by your own metrics there were 87 refusals... and this process would surely impact it in that way.

I believe that the underlying model (Irix) is performant and stable enough (in my humble opinion) that, even though it is a merge, the trade-off with this "heresy" will still be a net positive.

Tokenization (and whatever touches it) is a problematic thing with merges; "karcher" and "union" are particularly fiddly in this respect. I experimented with several merge configurations in the past and decided the results were too dicey for me (though others have made it work). Model Stock, however, has yielded stable results time and time again in my experience, and I would bet it survives this ablation method too. (I did skim the linked discussion, thanks for the read.)
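For anyone following along, a Model Stock merge in mergekit is declared roughly like this. This is only a sketch; the model names below are placeholders, not the actual recipe behind Irix:

```yaml
# Hypothetical mergekit config for a Model Stock merge.
# Model names are placeholders, not the actual Irix recipe.
merge_method: model_stock
base_model: some-org/base-model-24b
models:
  - model: some-org/finetune-a
  - model: some-org/finetune-b
  - model: some-org/finetune-c
dtype: bfloat16
```

Model Stock weights each fine-tune relative to the shared base model rather than freely interpolating between checkpoints, which is one plausible reason it tends to survive downstream interventions better than the fiddlier merge methods.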

Once again, I wish you a nice day/night! Thanks anyway, even if you decide not to "hereticize" Irix.

Owner
•
edited Feb 9

You'll have to wait a short while before I can get a grip on that juicy GPU. I could process it with 4-bit QLoRA, but I'm determined to do full precision even if the difference is 1% in quality. Heretic does not interfere with general tokenisation, so no issues there. One thing it can do, when configured, is ablate unwanted tokens (i.e. the presence of multiple eos, bos, etc. tokens), which can happen if you merge models that use different sets of said tokens. Anyway, we'll see how it fares in the refusal ablation.

Owner
•
edited Feb 9

Lad, I decided to rent an RTX Pro 6000 WS thinking it would take maybe 30 minutes. It is taking 6h 27m with batch size 256, and about as long at 128. For reference, it was 3 hrs for GLM 4.7 Flash and roughly 40 minutes for gpt-oss-120b. You must have mixed something extremely potent into this model 💀 (Does it really have 1 mil ctx?) Also, initial refusals rose to 92/100. Perhaps there is something else at play here. It is the text-generation part (used for determining initial and post-ablation refusals after each trial) that's taking too long.

Edit: Nvm, I just enabled cache in the model's config.yaml and went with batch size 4096. It should be done in around 20 minutes. Also, it does have 92 refusals.

holy shit...

Well, I do get as high as 16k context with no perceivable issue in quality. I never pushed past 16k because I run the model on CPU, and it gets TOO slow at that point.

I didn't think you would do it so soon; I thought it would be a "maybe next month" thing. Thank you, mate.

I hope you are keeping the ChatML format? (Otherwise, it might break, big time.)

Anxious to see what comes of it; I have high hopes!

Owner
•
edited Feb 9

Didn't touch the tokenisation other than using Transformers v5.1.0, which only changes the tokenizer.json structure and swaps the tokenizer class for a generic catch-all class. You can simply replace the config files (config.json, tokenizer.json, etc.) with the original model's and use it as if nothing but heretication was done to it. And, nah, I cannot personally wait that long, let alone make you wait that long, lol. Besides, I hate postponing things or having too many items on my agenda. I have enabled cache in the config, which should speed things up with the model. Try it and tell me how it fares.
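If it helps, swapping the original model's config files back in is a one-liner loop; the directory names below are placeholders, not actual repo paths:

```python
# Hypothetical local paths — replace with your own model directories.
import shutil
from pathlib import Path

src = Path("Irix-original")      # directory with the original model's files (assumed name)
dst = Path("Irix-hereticized")   # directory with the ablated model (assumed name)

# Overwrite only the config/tokenizer files; the ablated weights stay untouched.
for name in ["config.json", "tokenizer.json", "tokenizer_config.json",
             "special_tokens_map.json", "generation_config.json"]:
    if (src / name).exists():
        shutil.copy2(src / name, dst / name)
```

`copy2` preserves file metadata, and the `exists()` check skips any config file the original repo doesn't ship.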

MuXodious changed discussion status to closed
