GGUF quants for Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated
I've recreated them after the late December 2025 llama.cpp update that speeds up Qwen3 Next, so these quants should perform better than the earlier quants for this model. I've uploaded three quants:
iQ3_M – should fit (tightly) on systems with 32 GB of RAM plus an 8–12 GB GPU with RAM offloading. Possibly the lowest useful quant.
MXFP4_MOE – a tight fit for systems with 32 GB of RAM plus a GPU with 16 GB or more of VRAM. Alternatively, it can be fully loaded in system RAM, with cpu_moe, on systems with 64 GB of RAM.
Q6_K – will work well on systems with 64 GB of RAM plus RAM offloading. Quality is supposed to be almost indistinguishable from Q8.
I didn't do a Q8. It could be a tight fit on systems with 64 GB of RAM and a 24 GB VRAM GPU, but I have that system and it freezes when I try to load it.
The Q4_M file is older and slower than these three new quants, so I see no reason to use it instead of the MXFP4_MOE.
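For reference, a minimal launch of the MXFP4 quant with the MoE expert tensors kept in system RAM might look like the sketch below. The GGUF filename, context size, and layer count are assumptions for illustration; the `--cpu-moe` flag requires a recent llama.cpp build, so check your build's `--help` before relying on it.

```shell
# Sketch: serve the MXFP4_MOE quant with MoE experts offloaded to CPU RAM.
# --cpu-moe keeps the large expert tensors in system memory while the
# attention and dense layers are placed on the GPU via -ngl.
./llama-server \
  -m Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated-MXFP4_MOE.gguf \
  --cpu-moe \
  -ngl 99 \
  -c 8192
```

With this split, VRAM usage stays modest even for an 80B MoE model, since only the shared/dense weights and the KV cache live on the GPU.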
Enjoy!
license: apache-2.0
language:
- en
- zh
base_model:
- huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated
pipeline_tag: text-generation
tags:
- abliterated
- uncensored