Quantization instruction

#6
by nephepritou - opened

Hi there! Please, can you share how you made the quantized model? It's the best one I've tried by a huge margin. I would like to have abliterated models as well, but I previously lost a few days with GLM 4.5 Air with no success, and I don't know whether my hardware wasn't enough or my approach was wrong. Anyway, thank you very much for the great work.

Owner

Check the PR I made to llm-compressor; it has an example included. The link is in the model card.

Owner

If you want I can create abliterated nvfp4

> If you want I can create abliterated nvfp4

Wow! Since I don't know which one is better (https://huggingface.co/trohrbaugh/Qwen3.5-122B-A10B-heretic-v1 or https://huggingface.co/Chompa1422/Qwen3.5-122B-A10B-abliterated), I planned to investigate it myself. But I just can't turn down your generous offer.

Meanwhile, can you tell me from your experience - is it possible at all to quantize a 122B model with 192 GB of RAM and 4x3090 (96 GB VRAM)?

And the fact that the PR is yours is... well, just another level. I really appreciate your efforts, thank you!

Owner

I think it is enough for the 122B. I can't remember exactly how much RAM it used. I quantized the 397B version with 1x PRO 6000 (96 GB) and it used somewhere between 800-900 GB of RAM.

> I think it is enough for the 122B. I can't remember exactly how much RAM it used. I quantized the 397B version with 1x PRO 6000 (96 GB) and it used somewhere between 800-900 GB of RAM.

Ouch, so I need space for the full bf16 weights to quantize it. I'll need to scrape up two more modules.
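The sizing above can be sanity-checked with simple arithmetic (a sketch, not from the thread): bf16 stores two bytes per parameter, so the full-precision weights alone need roughly `params * 2` bytes before any quantization overhead.

```python
# Back-of-the-envelope bf16 weight footprint; ignores activations,
# calibration data, and any quantizer working memory, which add on top.

def bf16_weight_gb(num_params: float) -> float:
    """Approximate bf16 weight size in GB (1 GB = 1e9 bytes, 2 bytes/param)."""
    return num_params * 2 / 1e9

print(f"122B model: ~{bf16_weight_gb(122e9):.0f} GB of bf16 weights")
print(f"397B model: ~{bf16_weight_gb(397e9):.0f} GB of bf16 weights")
```

This lines up with the 800-900 GB observed for the 397B model (~794 GB of weights plus overhead), and suggests 192 GB of RAM falls short of the ~244 GB the 122B model's bf16 weights would need.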
