GGUF quants for Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated

I've recreated these quants after the late December 2025 llama.cpp update that speeds up Qwen3 Next, so they should perform better than the earlier quants for this model. I've uploaded three quants:

IQ3_M – should fit (tightly) on systems with 32 GB of RAM plus an 8-12 GB GPU, using partial offloading to system RAM. Probably the lowest useful quant.

MXFP4_MOE – should work on systems with 32 GB of RAM plus a GPU with 16 GB or more of VRAM.

Q6_K – works well on systems with 64 GB of RAM plus partial offloading to system RAM. Quality is reported to be nearly indistinguishable from Q8.
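As a rough sanity check on the RAM figures above, a GGUF file's size is roughly parameter count times average bits per weight divided by eight. The bits-per-weight values below are approximate llama.cpp averages and an assumption here, not measured sizes of these uploads (actual files differ because some tensors are kept at higher precision):

```python
# Rough GGUF size estimate: bytes ≈ params * bits-per-weight / 8.
# The bpw values are approximate llama.cpp averages (assumption),
# so treat the results as ballpark figures, not exact file sizes.
PARAMS = 80e9  # 80B-parameter model

def est_size_gb(bpw: float) -> float:
    """Approximate quantized file size in GB for a given bits-per-weight."""
    return PARAMS * bpw / 8 / 1e9

for name, bpw in [("IQ3_M", 3.66), ("MXFP4_MOE", 4.25), ("Q6_K", 6.56)]:
    print(f"{name}: ~{est_size_gb(bpw):.0f} GB")
```

Add a few GB on top of the file size for KV cache and runtime buffers when judging whether a quant fits your RAM plus VRAM budget.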

Enjoy!

Model size: 80B params · Architecture: qwen3next