MagicQuant for Apriel 1.6?

#1
by sebastienbo - opened

Hi,
Since Apriel 1.6 has been hitting all the most important benchmarks, I was wondering if you could make a MagicQuant version of it? According to the top-10 list of the best small open-source models that can run on laptops, Apriel 1.6 and GLM 4.7 flash are at the top of the intelligence line right now.
https://artificialanalysis.ai/models/open-source/small
GLM 4.7 flash (30B) requires double the VRAM of Apriel 1.6 (15B).
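For rough context, the weight-only memory at a given bits-per-weight is roughly params × bits / 8. This is only a back-of-envelope sketch (the helper name is my own): real GGUF quants mix bit widths per tensor, and the KV cache and runtime overhead add more on top, so actual numbers will differ.

```python
def approx_weight_vram_gb(params_billion, bits_per_weight):
    # Weight-only footprint in GB: params * bits / 8 bytes.
    # Ignores KV cache, activations, and runtime overhead.
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, params in [("Apriel 1.6 (~15B)", 15), ("GLM 4.7 flash (~30B)", 30)]:
    for bits in (4, 6, 8):
        print(f"{name} at ~{bits} bpw: ~{approx_weight_vram_gb(params, bits):.1f} GB weights")
```

By this estimate the 30B model at ~4 bpw needs about as much weight memory as the 15B model at ~8 bpw, which is why the quant level matters so much on consumer hardware.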

It would be great if we could have MagicQuant versions of both, but especially Apriel 1.6, because that one is really slow at Q6 or Q8, while the Q4 hallucinates too much in thinking mode. And Apriel 1.6 is the only one that can run on most consumer hardware without super expensive GPUs. The only problem is speed vs. quality: Q4 is unusably bad for coding, while Q6 is the best for that size but extremely slow at 4 tokens per second...

Unsloth also published a new methodology that makes the process much faster and with lower VRAM requirements:
https://unsloth.ai/docs/new/3x-faster-training-packing

Here is how you can start:
https://unsloth.ai/docs/basics/quantization-aware-training-qat

I would really appreciate a MagicQuant ;-)
And you'll probably benefit from it too :-)

pls do this

seems like he's done

If it means anything, I am still working on the project. Just busy with life and only able to put in a few hours here or there, but I got some good work done on it over the weekend. I'm not using the old pipeline I built anymore. It takes a lot of time, there are flaws in it that are blatant to me, and I'm working very hard on version 2, which is built in a totally new framework and language.

But the original MagicQuant results truly were just my prototype. Plus, when the old pipeline runs, my entire PC is basically frozen until it's done, and that can take days or weeks. So to be honest, I'm trying to make a proper code base that I can not only trust, but that is more performant, achieves better results, and is something I'd be comfortable releasing as an open-source project, because I don't want to be the bottleneck keeping people from building MagicQuant models.

I am hoping to have a lot of the new Qwen3.5 and Gemma models made into version 2 MagicQuant quantizations by the end of April or May. Then, after hammering out the last of the details, I want to just release the code and let the community do with it what they want.

It makes me feel bad when people ask for models and I can't help, because only my primary workstation can run the old pipeline, and I can't have my PC hang for days or weeks when I have work to do. I work from home, so I need my main PC daily. All my previous MagicQuant models were baked while I was on leave.

take your time, it's just kinda been radio silence from huggingface but that's the only place i follow you

Thanks! On my GitHub here:
https://github.com/magiccodingman/MagicQuant-Wiki

I am trying to be a bit more active. Version 2 is taking a completely different direction; I learned a lot from version 1. And with KL divergence now a benchmark, there's more nuance to how models are chosen. Previous models that were clear winners are not always clear winners anymore. Plus, a whole new philosophy of how to target and find the best quants has resulted in some really weird but cool things. But I'm still sitting on it, digesting it, etc.
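For readers unfamiliar with the metric: KL divergence here measures how far a quantized model's next-token distribution drifts from the full-precision model's. Below is a minimal sketch with toy logits; the function names are my own illustration, not MagicQuant's actual code.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(ref_logits, quant_logits):
    # Mean KL(P_ref || P_quant) across token positions, in nats.
    # Lower = the quant's predictions stay closer to full precision.
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    eps = 1e-12  # guard against log(0)
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

# Toy example: two token positions, four-token vocabulary.
ref = np.array([[2.0, 1.0, 0.1, -1.0], [0.5, 0.2, 0.1, 0.0]])
noise = np.random.default_rng(0).normal(0.0, 0.05, ref.shape)  # simulated quantization error
print(mean_kl(ref, ref))          # identical distributions -> 0.0
print(mean_kl(ref, ref + noise))  # small positive drift
```

In practice this is computed over real logits from the reference and quantized models on a shared eval set, which is why two quants with similar perplexity can still rank differently.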

The hardest parts of my new framework are done. I'm now just tuning it and deciding on multiple factors.
