A quant optimized for quality and speed on a Strix Halo 128 GiB system. Possibly also beneficial on DGX Spark and similar unified-memory systems.

The TL;DR is that this quant achieves both superior quality and speed compared to a homogeneous Q6_K.

Depending on your TTM (GTT memory) settings you should be able to fit between 100k and 200k tokens of context, or more if you disable vision; see the sketch below.
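
How much context fits depends on how much unified memory the kernel lets the GPU map. As a minimal sketch (the values below are assumptions for a 128 GiB machine, not settings shipped with this quant), the TTM limits can be raised via kernel module parameters:

```sh
# /etc/modprobe.d/ttm.conf — example values, adjust to your RAM.
# TTM pages are 4 KiB, so 30408704 pages ≈ 116 GiB of GTT.
options ttm pages_limit=30408704 page_pool_size=30408704
```

After editing, regenerate your initramfs and reboot for the new limits to take effect.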

This quant on llama.cpp build 8245 (2026/03/08):

| model | size | params | backend | ngl | n_batch | n_ubatch | fa | test | t/s |
| ----- | ---- | ------ | ------- | --- | ------- | -------- | -- | ---- | --- |
| qwen35moe 122B.A10B Q4_1 | 94.79 GiB | 122.11 B | ROCm | 999 | 1024 | 1024 | 1 | pp2048 | 274.99 ± 0.00 |
| qwen35moe 122B.A10B Q4_1 | 94.79 GiB | 122.11 B | ROCm | 999 | 1024 | 1024 | 1 | tg256 | 16.62 ± 0.00 |
| qwen35moe 122B.A10B Q4_1 | 94.79 GiB | 122.11 B | ROCm | 999 | 1024 | 1024 | 1 | pp2048 @ d8192 | 238.78 ± 0.00 |
| qwen35moe 122B.A10B Q4_1 | 94.79 GiB | 122.11 B | ROCm | 999 | 1024 | 1024 | 1 | tg256 @ d8192 | 16.68 ± 0.00 |
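
A command along these lines should reproduce the table with llama-bench (the model filename is a placeholder; the flags match the columns shown):

```sh
# llama-bench from llama.cpp (ROCm build): pp2048/tg256 at depths 0 and 8192,
# full offload (-ngl 999), 1024 batch/ubatch, flash attention enabled.
llama-bench -m Qwen3.5-122B-A10B-HALO.gguf \
  -ngl 999 -b 1024 -ub 1024 -fa 1 \
  -p 2048 -n 256 -d 0,8192
```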

Ignore the dtype displayed in the header above; refer to the per-tensor types instead.
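
One way to inspect the actual per-tensor quantization is the gguf-dump script from the gguf Python package (a sketch; your filename will differ):

```sh
pip install gguf                        # provides the gguf-dump console script
gguf-dump Qwen3.5-122B-A10B-HALO.gguf   # prints metadata plus every tensor's quant type
```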

See the GLM version for more details on theory and comparisons.
