IQ4_XS best quantized Qwen 3.5 27B for 16GB VRAM
#12
by tuanzxcv - opened
Great! Thanks, HauhauCS.
I've tried many quantized GGUF versions of Qwen 3.5 27B, both native and fine-tuned.
This may be the best quantized Qwen 3.5 27B for 16GB VRAM: Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf
Tested with llama.cpp server on an RTX 4070 Ti SUPER:
text model (IQ4_XS) + vision projector (Q8_0), maximum context 53248, KV cache q8_0
./llama-server -m models/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf \
  --host 0.0.0.0 --port 8080 \
  --temp 0.6 --top-p 0.8 --top-k 20 --min-p 0.00 \
  -t 1 -ctk q8_0 -ctv q8_0 -fa on -ngl all -np 1 \
  --mmproj models/Qwen3.5-27B.mmproj-Q8_0.gguf \
  --ctx-size 53248
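
For anyone replicating this, here's a minimal sketch of a request against the server started above, using llama-server's OpenAI-compatible chat endpoint (the prompt and max_tokens are just placeholders, not what I used):

# Assumes the server above is running on localhost:8080.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Write Flappy Bird in pure HTML/CSS/JS."}],
        "max_tokens": 4096
      }'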
Flappy Bird test: 3,802 tokens in 1 min 45 s (35.88 t/s)
Solar system test (pure HTML/CSS/JS, no libraries): 3,759 tokens in 1 min 44 s (35.92 t/s)
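
If you want to check the t/s figures yourself, llama-server's native /completion endpoint reports its own timing stats in the response. A sketch (assumes jq is installed; the prompt and n_predict are placeholders):

# The response's "timings" object includes predicted_per_second,
# which is the generation speed reported above.
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a solar system demo in pure HTML/CSS/JS.", "n_predict": 512}' \
  | jq '.timings'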