One of these things is not like the other
#1
by Cortex0833 - opened
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | threads | n_batch | fa | test | t/s |
|---|---|---|---|---|---|---|---|---|---|
| qwen3moe 235B.A22B Q4_0 | 93.68 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | pp4096 | 192.79 ± 0.68 |
| qwen3moe 235B.A22B Q4_0 | 93.68 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | tg128 | 14.15 ± 0.00 |
| qwen3moe 235B.A22B Q4_K - Medium | 100.37 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | pp4096 | 173.61 ± 0.41 |
| qwen3moe 235B.A22B Q4_K - Medium | 100.37 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | tg128 | 13.28 ± 0.00 |
| qwen3moe 235B.A22B Q4_K - Small | 94.45 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | pp4096 | 181.91 ± 0.39 |
| qwen3moe 235B.A22B Q4_K - Small | 94.45 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | tg128 | 13.72 ± 0.00 |
| qwen3moe 235B.A22B Q2_K - Medium | 60.61 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | pp4096 | 144.62 ± 0.49 |
| qwen3moe 235B.A22B Q2_K - Medium | 60.61 GiB | 178.31 B | ROCm | 99 | 1 | 1024 | 1 | tg128 | 18.15 ± 0.02 |

build: 8872ad212 (7966)
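
For anyone who wants to compare the rows directly, here is a rough Python sketch that normalizes each quant against Q4_0 and derives bits per weight from the reported size and parameter count. The numbers are copied from the table above; the `summarize` helper and the dictionary layout are my own illustration, not part of llama-bench, and the bits-per-weight figure simply assumes the reported size covers all 178.31 B parameters.

```python
# Values copied from the llama-bench table above (mean t/s, size in GiB).
RESULTS = {
    "Q4_0":          {"size_gib": 93.68,  "pp4096": 192.79, "tg128": 14.15},
    "Q4_K - Medium": {"size_gib": 100.37, "pp4096": 173.61, "tg128": 13.28},
    "Q4_K - Small":  {"size_gib": 94.45,  "pp4096": 181.91, "tg128": 13.72},
    "Q2_K - Medium": {"size_gib": 60.61,  "pp4096": 144.62, "tg128": 18.15},
}
PARAMS = 178.31e9   # parameter count as reported in the table
BASELINE = "Q4_0"   # hypothetical choice of reference row


def summarize(results, baseline=BASELINE, params=PARAMS):
    """Return throughput relative to the baseline and bits per weight per quant."""
    base = results[baseline]
    summary = {}
    for name, row in results.items():
        summary[name] = {
            "pp_rel": row["pp4096"] / base["pp4096"],
            "tg_rel": row["tg128"] / base["tg128"],
            # GiB -> bytes -> bits, divided by the number of weights
            "bpw": row["size_gib"] * 2**30 * 8 / params,
        }
    return summary


if __name__ == "__main__":
    for name, s in summarize(RESULTS).items():
        print(f"{name:<14} pp x{s['pp_rel']:.2f}  tg x{s['tg_rel']:.2f}  {s['bpw']:.2f} bpw")
```

Read this way, Q2_K - Medium stands out: it has the highest tg128 of the four but the lowest pp4096, while Q4_0 leads on prompt processing.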
I just wanted to report that Q4_0 is genuinely fantastic on Strix Halo. MUCH faster, while retaining or slightly exceeding the intelligence of the Q3 I was previously running. I'm using it for an assistant, but it has to manage some tricky tags and tool calls.
Some very interesting stats. Thanks for sharing!