Q3 quantization performance issues
#7
by lingyezhixing - opened
The Q3-level quantization of the 122B model exhibits unexpectedly low quality: under the recommended sampling parameters, there is a high probability of partially garbled output and infinite repetition on tasks like tool calls and Python code generation. In my past experience, models at the ~100B-parameter scale generally do not show such a significant quality drop at Q3, so is this an issue inherent to the original model, or is the Qwen3.5 architecture not well suited to low-bit quantization?
If you are using the UD XL quantized files, they may have known issues; see https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF/discussions/5
For reference, I get ~14 t/s on an RTX 3070 + 64GB RAM.
lingyezhixing changed discussion status to closed