IQ4_XS best quantized Qwen 3.5 27B for 16GB VRAM
#12
by tuanzxcv - opened
Great! Thanks, HauhauCS.
I've tried many quantized GGUF versions of Qwen 3.5 27B, both native and fine-tuned.
This may be the best quantized Qwen 3.5 27B for 16GB VRAM: Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf
Tested with llama.cpp server on an RTX 4070 Ti SUPER:
text model (IQ4_XS) + vision projector (Q8_0), maximum context 53248, KV cache q8_0
./llama-server -m models/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf \
  --host 0.0.0.0 --port 8080 \
  --temp 0.6 --top-p 0.8 --top-k 20 --min-p 0.00 \
  -t 1 -ctk q8_0 -ctv q8_0 -fa on -ngl all -np 1 \
  --mmproj models/Qwen3.5-27B.mmproj-Q8_0.gguf \
  --ctx-size 53248
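
For anyone replicating this, here's a minimal sketch of a request against the server started above, using llama-server's OpenAI-compatible chat endpoint (the prompt and max_tokens are just placeholders, not what I used):

# Assumes the server above is running on localhost:8080.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Write Flappy Bird in pure HTML/CSS/JS."}],
        "max_tokens": 4096
      }'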
Flappy Bird test: 3,802 tokens in 1 min 45 s (35.88 t/s)
Solar system test (pure HTML/CSS/JS, no libraries): 3,759 tokens in 1 min 44 s (35.92 t/s)
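
If you want to check the t/s figures yourself, llama-server's native /completion endpoint reports its own timing stats in the response. A sketch (assumes jq is installed; the prompt and n_predict are placeholders):

# The response's "timings" object includes predicted_per_second,
# which is the generation speed reported above.
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a solar system demo in pure HTML/CSS/JS.", "n_predict": 512}' \
  | jq '.timings'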