Qwen3.5 Unsloth GGUF Evaluation Results

#33
by danielhanchen - opened

Third party results conducted by Benjamin Marie:

Run the model locally via GGUFs here: https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF

"I tested Unsloth's UD Q4 and Q3 GGUF quantizations of Qwen3.5-397B-A17B and they both performed very well.
In my runs, I didn’t observe a meaningful difference between the original weights and Q3 (less than 1 point of accuracy difference, so only a ~3.5% relative error increase).
You can cut on the order of ~500 GB of memory footprint while seeing little to no practical degradation (at least on the tasks I tried)."

Note that the 3-bit quant scores slightly higher than the 4-bit, which is within the normal margin of error.
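To make the quoted "~3.5% relative error increase" concrete, here is a small sketch of the arithmetic. The accuracy numbers below (72.0 and 71.0) are hypothetical, chosen only to illustrate how a sub-1-point accuracy drop can translate to roughly that relative increase in error rate:

```python
def relative_error_increase(acc_full: float, acc_quant: float) -> float:
    """Relative increase in error rate (percent) when accuracy drops
    from acc_full to acc_quant. Accuracies are percentages (0-100)."""
    err_full = 100.0 - acc_full
    err_quant = 100.0 - acc_quant
    return (err_quant - err_full) / err_full * 100.0

# Hypothetical example: a <1-point drop on a ~72%-accuracy task
print(round(relative_error_increase(72.0, 71.0), 2))  # -> 3.57
```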


explain pls


To run models locally, you need GGUF files, which quantize the weights down to lower precision. Benjamin shows that Unsloth's GGUFs of Qwen3.5 perform very well, nearly matching the full-precision model even at 3-bit or 4-bit.
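For intuition on what "quantizing down" means: GGUF quants store each block of weights as small integers plus a scale factor. This is only a toy sketch of symmetric 4-bit quantization (real GGUF formats like Q4_K use more elaborate block layouts), not Unsloth's actual method:

```python
def quantize_4bit(weights):
    """Toy symmetric 4-bit quantization: map floats to integers
    in [-7, 7] using one scale per block of weights."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]

w = [0.12, -0.53, 0.98, -0.07]
q, s = quantize_4bit(w)
approx = dequantize(q, s)
# Each 4-bit value replaces a 16/32-bit float, so memory drops
# ~4-8x, while the reconstruction error stays within scale/2.
```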

Is there a way to run the quantized gguf in sglang? Documentation on unsloth seems outdated. In particular, I am very interested in mxfp4 quantization.

What about TQ1_0 quantization performance


I'm not sure if SGLang supports newer GGUFs; probably not.
