Quantization instruction

#6
by nephepritou - opened

Hi there! Please, can you share how you made the quantized model? It's the best one I've tried by a huge margin. I would like to have abliterated models as well, but I previously lost a few days with GLM 4.5 Air with no success, and I don't know whether my hardware wasn't enough or my approach was wrong. Anyway, thank you very much for the great work.

Owner

Check the PR I made to llm-compressor; it has an example included. The link is in the model card.

Owner

If you want I can create abliterated nvfp4

> If you want I can create abliterated nvfp4

Wow! Since I don't know which one is better (https://huggingface.co/trohrbaugh/Qwen3.5-122B-A10B-heretic-v1 or https://huggingface.co/Chompa1422/Qwen3.5-122B-A10B-abliterated), I planned to investigate it myself. But I just can't turn down your generous offer.

Meanwhile, can you tell me from your experience - is it possible at all to quantize a 122B model with 192 GB of RAM and 4x3090 (96 GB VRAM)?

And the fact that the PR is yours is... well, just another level. I really appreciate your efforts, thank you!

Owner

I think it is enough for the 122B. I can't remember exactly how much RAM it used. I quantized the 397B version with 1x PRO 6000 (96 GB) and it used somewhere between 800-900 GB of RAM.

> I think it is enough for the 122B. I can't remember exactly how much RAM it used. I quantized the 397B version with 1x PRO 6000 (96 GB) and it used somewhere between 800-900 GB of RAM.

Ouch, so I need space for the full bf16 weights to quantize it. I'll need to scrape up two more modules.
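The sizing above can be sanity-checked with simple arithmetic (a sketch, not from the thread): bf16 stores two bytes per parameter, so the full-precision weights alone need roughly `params * 2` bytes before any quantization overhead.

```python
# Back-of-the-envelope bf16 weight footprint; ignores activations,
# calibration data, and any quantizer working memory, which add on top.

def bf16_weight_gb(num_params: float) -> float:
    """Approximate bf16 weight size in GB (1 GB = 1e9 bytes, 2 bytes/param)."""
    return num_params * 2 / 1e9

print(f"122B model: ~{bf16_weight_gb(122e9):.0f} GB of bf16 weights")
print(f"397B model: ~{bf16_weight_gb(397e9):.0f} GB of bf16 weights")
```

This lines up with the 800-900 GB observed for the 397B model (~794 GB of weights plus overhead), and suggests 192 GB of RAM falls short of the ~244 GB the 122B model's bf16 weights would need.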
