Plans for 122B?

#4
by SerySmith - opened

Hey, just tested 9B, and it is great, I was hoping/wondering if you would do 122B eventually?

Also, is this an abliteration/Heretic-style model... or some other dataset finetune?
Thank you.

Owner

Hello,

Depending on how many people will actually make use of them, I may do 122B as well, or stop at 35B and only do the 4-9-27-35B models.

This is not Heretic. I use my own methodologies and datasets for prompts.

Please consider it.

I would use it, @HauhauCS. It's the perfect size for those of us rocking RTX 6000 Pros or Sparks.

Owner

I am currently still working 16 hours a day on 35b-a3b. More specifically, I'm working on a specialized technique for maximizing the experts. Once 35b is out, I might do 122b at fp8 (so all quants up to that, no bf16).

I don't have the VRAM for bf16.

Thank you, I'm really looking forward to 35b.

great job, thank you and qwen team.

Hello everyone, I can confirm 122b is well underway now, and I will hopefully soon be able to release a model of the quality people have come to expect from my releases.

Trials take me about 10 minutes each, so progress is a tad slower :)
