GGUF quants for Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated

I've recreated these quants after the late December 2025 llama.cpp update that speeds up Qwen3 Next, so they should perform better than the earlier quants for this model. I've uploaded three quants:

IQ3_M – should fit (tightly) in systems with 32 GB of RAM plus an 8-12 GB GPU, using RAM offloading. Possibly the lowest useful quant.

MXFP4_MOE – a tight fit for systems with 32 GB of RAM plus a GPU with 16 GB or more of VRAM. Alternatively, it can be loaded fully into system RAM with --cpu-moe on systems with 64 GB of RAM (see the example command after this list).

Q6_K – will work well on systems with 64 GB of RAM plus RAM offloading. Quality is supposed to be nearly indistinguishable from Q8.
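
If you haven't used MoE offloading before, here is a minimal sketch of running the MXFP4_MOE quant with llama.cpp's llama-server, assuming a recent build that includes the --cpu-moe flag; the .gguf filename below is a placeholder for whichever file you actually download.

```
# A minimal sketch, assuming a recent llama.cpp build with --cpu-moe support.
# The filename is a placeholder; point -m at the .gguf you downloaded.
# -ngl 99 offloads all layers to the GPU, while --cpu-moe keeps the MoE
# expert tensors in system RAM, which is what makes the 64 GB RAM
# configurations workable.
llama-server -m Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated-MXFP4_MOE.gguf -ngl 99 --cpu-moe -c 8192
```

If your GPU has more VRAM to spare, --n-cpu-moe N lets you keep only the expert tensors of the first N layers on the CPU instead of all of them.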

I didn't do a Q8. It could be a tight fit on systems with 64 GB of RAM and a 24 GB VRAM GPU, but I have that system and it freezes when I try to load it.

The Q4_M file is older and slower than these three new quants, so I see no reason to use it instead of the MXFP4_MOE.

Enjoy!


license: apache-2.0
language:
  - en
  - zh
base_model:
  - huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated
pipeline_tag: text-generation
tags:
  - abliterated
  - uncensored
