One of these things is not like the other

#1
by Cortex0833 - opened

Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32

| model | size | params | backend | ngl | threads | n_batch | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| qwen3moe 235B.A22B Q4_0 | 93.68 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | pp4096 | 192.79 ± 0.68 |
| qwen3moe 235B.A22B Q4_0 | 93.68 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | tg128 | 14.15 ± 0.00 |
| qwen3moe 235B.A22B Q4_K - Medium | 100.37 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | pp4096 | 173.61 ± 0.41 |
| qwen3moe 235B.A22B Q4_K - Medium | 100.37 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | tg128 | 13.28 ± 0.00 |
| qwen3moe 235B.A22B Q4_K - Small | 94.45 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | pp4096 | 181.91 ± 0.39 |
| qwen3moe 235B.A22B Q4_K - Small | 94.45 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | tg128 | 13.72 ± 0.00 |
| qwen3moe 235B.A22B Q2_K - Medium | 60.61 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | pp4096 | 144.62 ± 0.49 |
| qwen3moe 235B.A22B Q2_K - Medium | 60.61 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | tg128 | 18.15 ± 0.02 |

build: 8872ad212 (7966)
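
For anyone who wants to compare the rows directly, here is a rough Python sketch that ranks the quants by throughput and expresses each one relative to Q4_0. The numbers are copied from the table above; the dictionary layout and the helper names `ranking` and `relative_to` are just illustrative assumptions, not part of llama-bench.

```python
# Mean throughput (tokens/s) and file size copied from the benchmark table.
# The +/- spread is ignored here; this is only a quick comparison sketch.
results = {
    "Q4_0":          {"size_gib": 93.68,  "pp4096": 192.79, "tg128": 14.15},
    "Q4_K - Medium": {"size_gib": 100.37, "pp4096": 173.61, "tg128": 13.28},
    "Q4_K - Small":  {"size_gib": 94.45,  "pp4096": 181.91, "tg128": 13.72},
    "Q2_K - Medium": {"size_gib": 60.61,  "pp4096": 144.62, "tg128": 18.15},
}


def ranking(test):
    """Return quant names ordered from fastest to slowest for one test."""
    return sorted(results, key=lambda name: results[name][test], reverse=True)


def relative_to(baseline, test):
    """Return each quant's throughput as a ratio of the baseline quant's."""
    base = results[baseline][test]
    return {name: round(row[test] / base, 3) for name, row in results.items()}


if __name__ == "__main__":
    for test in ("pp4096", "tg128"):
        print(test, "fastest to slowest:", ranking(test))
        print(test, "relative to Q4_0:", relative_to("Q4_0", test))
```

On these figures, Q2_K - Medium is the odd one out: it is the smallest file and has the fastest token generation (about 28% above Q4_0), yet its prompt processing is the slowest of the four (about 25% below Q4_0).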

I just wanted to report that Q4_0 is genuinely fantastic on Strix Halo. MUCH faster, while retaining or slightly exceeding the intelligence of the Q3 I was previously running. I'm using it for an assistant, but it has to manage some tricky tags and tool calls.

Owner

Some very interesting stats. Thanks for sharing!
