One of these things is not like the other

by Cortex0833 - opened Feb 20

Feb 20

Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32

model	size	params	backend	ngl	threads	n_batch	fa	test	t/s
qwen3moe 235B.A22B Q4_0	93.68 GiB	178.31 B	ROCm	99	1	1024	1	pp4096	192.79 ± 0.68
qwen3moe 235B.A22B Q4_0	93.68 GiB	178.31 B	ROCm	99	1	1024	1	tg128	14.15 ± 0.00
qwen3moe 235B.A22B Q4_K - Medium	100.37 GiB	178.31 B	ROCm	99	1	1024	1	pp4096	173.61 ± 0.41
qwen3moe 235B.A22B Q4_K - Medium	100.37 GiB	178.31 B	ROCm	99	1	1024	1	tg128	13.28 ± 0.00
qwen3moe 235B.A22B Q4_K - Small	94.45 GiB	178.31 B	ROCm	99	1	1024	1	pp4096	181.91 ± 0.39
qwen3moe 235B.A22B Q4_K - Small	94.45 GiB	178.31 B	ROCm	99	1	1024	1	tg128	13.72 ± 0.00
qwen3moe 235B.A22B Q2_K - Medium	60.61 GiB	178.31 B	ROCm	99	1	1024	1	pp4096	144.62 ± 0.49
qwen3moe 235B.A22B Q2_K - Medium	60.61 GiB	178.31 B	ROCm	99	1	1024	1	tg128	18.15 ± 0.02
build: 8872ad212 (7966)

I just wanted to report that Q4_0 is genuinely fantastic on Strix Halo. MUCH faster, while retaining or slightly exceeding the intelligence of the Q3 I was previously running. I'm using it for an assistant, but it has to manage some tricky tags and tool calls.
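
For a rough sense of the gaps in the table, the t/s figures can be compared directly. This is only a sketch: the numbers are copied by hand from the rows above, the `speedup` helper is a made-up name for illustration, and the Q3 I mentioned isn't in the table, so it isn't included here.

```python
# Throughput in tokens/s, copied from the llama-bench table above.
# pp4096 = prompt processing, tg128 = token generation.
RESULTS = {
    "Q4_0": {"pp4096": 192.79, "tg128": 14.15},
    "Q4_K - Medium": {"pp4096": 173.61, "tg128": 13.28},
    "Q4_K - Small": {"pp4096": 181.91, "tg128": 13.72},
    "Q2_K - Medium": {"pp4096": 144.62, "tg128": 18.15},
}


def speedup(quant: str, baseline: str, test: str) -> float:
    """Ratio of throughputs; hypothetical helper. > 1.0 means `quant` is faster."""
    return RESULTS[quant][test] / RESULTS[baseline][test]


if __name__ == "__main__":
    for test in ("pp4096", "tg128"):
        for baseline in RESULTS:
            if baseline == "Q4_0":
                continue
            ratio = speedup("Q4_0", baseline, test)
            print(f"{test}: Q4_0 vs {baseline}: {ratio:.3f}x")
```

Going by these figures, Q4_0 leads the other quants on prompt processing, while Q2_K - Medium is still ahead on token generation, which fits it being the smallest file.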

Akicou

Owner Feb 20

Some very interesting stats. Thanks for sharing!
