Quantization instruction
Hi there! Can you please share how you made the quantized model? It's the best one I've tried by a huge margin. I would also like to have abliterated models, but I previously lost a few days with no success on GLM 4.5 Air, and I don't know whether my hardware is insufficient or my approach was wrong. Anyway, thank you very much for the great work.
Check the PR I made to llm-compressor; it includes an example. The link is in the model card.
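For anyone landing here later, a minimal sketch of what an NVFP4 quantization run with llm-compressor looks like, based on the library's published examples. The exact recipe for this model is in the linked PR; the model ID, dataset, and calibration settings below are illustrative assumptions, not the actual values used.

```python
# Hedged sketch of NVFP4 quantization with llm-compressor.
# Model ID and calibration settings are placeholders, not the PR's values.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "some-org/some-122b-model"  # hypothetical; substitute your model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# NVFP4 quantizes weights and activations to FP4, so it needs a small
# calibration set to fit the activation scales.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    ignore=["lm_head"],  # keep the output head in higher precision
)

oneshot(
    model=model,
    dataset="ultrachat_200k",      # illustrative calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained(MODEL_ID.split("/")[-1] + "-NVFP4", save_compressed=True)
tokenizer.save_pretrained(MODEL_ID.split("/")[-1] + "-NVFP4")
```

The `ignore=["lm_head"]` choice follows the common convention of leaving the output projection unquantized, since it is small relative to the rest of the model and sensitive to precision loss.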
If you want I can create abliterated nvfp4
Wow! Since I don't know which one is better (https://huggingface.co/trohrbaugh/Qwen3.5-122B-A10B-heretic-v1 or https://huggingface.co/Chompa1422/Qwen3.5-122B-A10B-abliterated), I planned to investigate it myself. But I just can't refuse your generous offer.
Meanwhile, can you tell me from your experience: is it possible at all to quantize a 122B model with 192 GB of RAM and 4x3090 (96 GB VRAM)?
And the fact that the PR is yours is... well, just another level. I really appreciate your efforts, thank you!
I think that's enough for the 122B. I can't remember exactly how much RAM it used. I quantized the 397B version with 1x Pro 6000 (96 GB), and it used somewhere between 800-900 GB of RAM.
Ouch, so I need enough space for the full bf16 weights to quantize it. I need to scrounge up two more RAM modules.
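The numbers above line up with a simple back-of-envelope check, assuming the quantizer holds the full unquantized bf16 copy in host RAM (an assumption consistent with, but not confirmed by, the thread):

```python
# Back-of-envelope: host RAM needed just for the bf16 weights,
# assuming the full unquantized model is loaded into RAM during quantization.
def bf16_weights_gb(params_billions: float) -> float:
    # bf16 stores each parameter in 2 bytes -> 2 GB per billion parameters
    return params_billions * 2.0

for size in (122, 397):
    print(f"{size}B model: {bf16_weights_gb(size):.0f} GB of bf16 weights")
# 122B -> 244 GB, already above the 192 GB of RAM mentioned above
# 397B -> 794 GB, consistent with the observed 800-900 GB once
#         activations and framework overhead are added
```

So for the 122B model, 192 GB of RAM falls short of the weights alone, which matches the conclusion that more memory modules are needed.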