Qwen3.5 Unsloth GGUF Evaluation Results

#33
by danielhanchen - opened

Third party results conducted by Benjamin Marie:

Run the model locally via GGUFs here: https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF

"I tested Unsloth's UD Q4 and Q3 GGUF quantizations of Qwen3.5-397B-A17B and they both performed very well.
In my runs, I didn’t observe a meaningful difference between the original weights and Q3 (less than 1 point of accuracy difference, so only a ~3.5% relative error increase).
You can cut on the order of ~500 GB of memory footprint while seeing little to no practical degradation (at least on the tasks I tried)."

Note that the 3-bit quant scores slightly higher than the 4-bit, which is within the normal margin of error.
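To make the quoted "~3.5% relative error increase" concrete, here is a small sketch of the arithmetic. The accuracy numbers below (72.0 and 71.0) are hypothetical, chosen only to illustrate how a sub-1-point accuracy drop can translate to roughly that relative increase in error rate:

```python
def relative_error_increase(acc_full: float, acc_quant: float) -> float:
    """Relative increase in error rate (percent) when accuracy drops
    from acc_full to acc_quant. Accuracies are percentages (0-100)."""
    err_full = 100.0 - acc_full
    err_quant = 100.0 - acc_quant
    return (err_quant - err_full) / err_full * 100.0

# Hypothetical example: a <1-point drop on a ~72%-accuracy task
print(round(relative_error_increase(72.0, 71.0), 2))  # -> 3.57
```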


explain pls


To run models locally, you need GGUF files, which quantize the weights down to lower precision. Benjamin shows that Unsloth's GGUFs of Qwen3.5 perform very well, nearly matching the full-precision model even at 3-bit or 4-bit.
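For intuition on what "quantizing down" means: GGUF quants store each block of weights as small integers plus a scale factor. This is only a toy sketch of symmetric 4-bit quantization (real GGUF formats like Q4_K use more elaborate block layouts), not Unsloth's actual method:

```python
def quantize_4bit(weights):
    """Toy symmetric 4-bit quantization: map floats to integers
    in [-7, 7] using one scale per block of weights."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]

w = [0.12, -0.53, 0.98, -0.07]
q, s = quantize_4bit(w)
approx = dequantize(q, s)
# Each 4-bit value replaces a 16/32-bit float, so memory drops
# ~4-8x, while the reconstruction error stays within scale/2.
```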

Is there a way to run the quantized gguf in sglang? Documentation on unsloth seems outdated. In particular, I am very interested in mxfp4 quantization.

What about TQ1_0 quantization performance


I'm not sure if SGLang supports newer GGUFs; probably not.
