Q3 quantization performance issues
#7
by lingyezhixing - opened
The Q3-level quantization of the 122B model exhibits unexpectedly low quality: under the recommended sampling parameters, there is a high probability of partially garbled output and infinite repetition on tasks like tool calls and Python code generation. In my past experience, models at the ~100B-parameter scale generally do not show such a significant quality drop at Q3, so is this an issue inherent to the original model, or is the Qwen3.5 architecture not well suited to low-bit quantization?
If you are using the UD XL quantized files, they may have known issues; see https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF/discussions/5
For reference, I get ~14 t/s on an RTX 3070 + 64GB RAM.
lingyezhixing changed discussion status to closed