GGUF quants for Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated

I've recreated these quants after the late December 2025 llama.cpp update that speeds up Qwen3 Next, so they should perform better than the earlier quants for this model. I've uploaded three quants:

IQ3_M – should fit (tightly) in systems with 32 GB of RAM plus an 8-12 GB GPU, using RAM offloading. Possibly the lowest useful quant.

MXFP4_MOE – a tight fit for systems with 32 GB of RAM plus a GPU with 16 GB or more of VRAM. Alternatively, it can be loaded fully into system RAM with --cpu-moe on systems with 64 GB of RAM (see the example command after this list).

Q6_K – will work well on systems with 64 GB of RAM plus RAM offloading. Quality is supposed to be nearly indistinguishable from Q8.
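
If you haven't used MoE offloading before, here is a minimal sketch of running the MXFP4_MOE quant with llama.cpp's llama-server, assuming a recent build that includes the --cpu-moe flag; the .gguf filename below is a placeholder for whichever file you actually download.

```
# A minimal sketch, assuming a recent llama.cpp build with --cpu-moe support.
# The filename is a placeholder; point -m at the .gguf you downloaded.
# -ngl 99 offloads all layers to the GPU, while --cpu-moe keeps the MoE
# expert tensors in system RAM, which is what makes the 64 GB RAM
# configurations workable.
llama-server -m Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated-MXFP4_MOE.gguf -ngl 99 --cpu-moe -c 8192
```

If your GPU has more VRAM to spare, --n-cpu-moe N lets you keep only the expert tensors of the first N layers on the CPU instead of all of them.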

I didn't do a Q8. It could be a tight fit on systems with 64 GB of RAM and a 24 GB VRAM GPU, but I have that system and it freezes when I try to load it.

The Q4_M file is older and slower than these three new quants, so I see no reason to use it instead of the MXFP4_MOE.

Enjoy!


license: apache-2.0
language:
  - en
  - zh
base_model:
  - huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated
pipeline_tag: text-generation
tags:
  - abliterated
  - uncensored
