Qwen3.5 GGUF Evaluation Results
Third-party results from Benjamin Marie:
"I tested Unsloth's UD Q4 and Q3 GGUF quantizations of Qwen3.5-397B-A17B and they both performed very well.
In my runs, I didn't observe a meaningful difference between the original weights and Q3 (less than 1 point of accuracy difference, so only a ~3.5% relative error increase).
You can cut on the order of ~500 GB of memory footprint while seeing little to no practical degradation (at least on the tasks I tried)."
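One plausible reading of those numbers: a sub-1-point drop works out to ~3.5% relative error when the baseline score is around 28 points. The scores below are hypothetical, for illustration only, not taken from the benchmark above:

```shell
# Hypothetical accuracies (points); a <1-point drop from ~28.5 -> ~3.5% relative change.
baseline=28.5
quantized=27.5
awk -v b="$baseline" -v q="$quantized" \
    'BEGIN { printf "~%.1f%% relative error increase\n", (b - q) / b * 100 }'
```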
Note that the 3-bit quant scored slightly higher than the 4-bit; this is within the normal margin of error.
Has anyone done similar testing for the 122b, 35ba35, or 27b yet?
I noticed that after the recent update, UD-TQ1_0 has been removed. Are you planning to upload it in the future?
UD-TQ1_0 is the only version that can run on a system with 128 GB of unified RAM. I would really appreciate it if you could put it back (ideally with updated quantization).
@dzupin there's a new UD-IQ2_XXS which should fit in 128 GB. I'm currently downloading it and will run some benchmarks to compare against ubergarm's smol-IQ2_XS: https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF/discussions/8
Apparently the IQ2_M also fits in 128 GB, so I'm going to test that one too.
@danielhanchen the chart in this thread is outdated, right? I remember looking a while back and the IQ2_M was too big for 128 GB. Did you guys run another benchmark? I'm interested in the relative error increase for the new IQ2_M and IQ2_XXS.
My experience was that on my 128 GB of unified RAM, only UD-TQ1_0 could run the full 256K context without additional KV-cache quantization or shrinking the context window. And I still had some spare RAM for my other apps to run.
@dzupin are you on Apple silicon? If so, you can increase usable VRAM to 125 GB. I was able to fit ubergarm's smol-IQ2_XS with full 256K context and it worked great.
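For reference, on Apple silicon the GPU wired-memory limit (what llama.cpp can use as "VRAM") can be raised with a sysctl. A sketch assuming macOS Sonoma or later; the 128000 value (MiB, = 125 GiB) is just one choice and the setting resets on reboot:

```shell
# Raise the GPU wired-memory cap to ~125 GiB (value is in MiB).
# Requires sudo; does not persist across reboots.
sudo sysctl iogpu.wired_limit_mb=128000
```

Leave enough headroom for the OS itself, or the machine can lock up under memory pressure, as noted below.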
Note that I don't use this device for anything else, so it sits in a pre-login state idling at around 2-3 GB of RAM.
Thanks for sharing. I've gone up to 120 GB VRAM on my Mac Studio but always get a little nervous, as I've locked it up before. 125 GB is wild!
I am using the UD-IQ2_M one across two devices via RPC (Mac Studio + RTX 5090). Performance is very good -- around 170 t/s prompt processing and 20 t/s inference. Quality is excellent so far.
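For anyone wanting to reproduce a two-device setup like this: llama.cpp ships an `rpc-server` binary (built with `-DGGML_RPC=ON`), and the main host points at it with `--rpc`. A minimal sketch; the hostname, port, model path, and layer/context values are placeholders, not the poster's actual configuration:

```shell
# On the secondary device (e.g. the RTX 5090 box): start the RPC worker.
./rpc-server --host 0.0.0.0 --port 50052

# On the primary device (e.g. the Mac Studio): run inference, splitting
# the model across the local backend and the remote RPC worker.
./llama-cli -m Qwen3.5-397B-A17B-UD-IQ2_M.gguf \
    --rpc 192.168.1.42:50052 -ngl 99 -c 262144
```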
