128GB UMA Models
Collection: Models optimized for Strix Halo and similar systems
A quant optimized for quality and speed on a Strix Halo 128 GiB system; possibly also beneficial on DGX Spark and similar systems.
TL;DR: this quant achieves both superior quality and superior speed compared to a homogeneous Q6_K.
Depending on your TTM settings you should be able to fit between 100k and 200k tokens of context, or more if you disable vision.
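As a rough guide to why context tops out around that range, KV-cache memory grows linearly with context length. The sketch below uses the standard formula (2 × layers × KV heads × head dim × bytes per element per token); the architecture numbers are hypothetical placeholders, not the real Qwen3.5-122B-A10B config, so substitute the values from the model's `config.json`.

```python
# Back-of-envelope KV-cache sizing.
# Bytes per token = 2 (K and V) * n_layers * n_kv_heads * head_dim * bytes_per_elem.
# NOTE: the defaults below are HYPOTHETICAL placeholders, not the actual
# Qwen3.5-122B-A10B architecture -- read the real values from config.json.
def kv_cache_gib(n_ctx, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_ctx * per_token / 1024**3

for ctx in (100_000, 200_000):
    print(f"{ctx:>7} ctx -> ~{kv_cache_gib(ctx):.1f} GiB KV cache (fp16)")
```

With these placeholder numbers, 100k context costs roughly 18 GiB of fp16 KV cache and 200k roughly 37 GiB, which is why the remaining UMA budget after the ~95 GiB of weights bounds usable context.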
Benchmarks for this quant, build 8245 (2026/03/08):
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | test | t/s |
|---|---|---|---|---|---|---|---|---|---|
| qwen35moe 122B.A10B Q4_1 | 94.79 GiB | 122.11 B | ROCm | 999 | 1024 | 1024 | 1 | pp2048 | 274.99 ± 0.00 |
| qwen35moe 122B.A10B Q4_1 | 94.79 GiB | 122.11 B | ROCm | 999 | 1024 | 1024 | 1 | tg256 | 16.62 ± 0.00 |
| qwen35moe 122B.A10B Q4_1 | 94.79 GiB | 122.11 B | ROCm | 999 | 1024 | 1024 | 1 | pp2048 @ d8192 | 238.78 ± 0.00 |
| qwen35moe 122B.A10B Q4_1 | 94.79 GiB | 122.11 B | ROCm | 999 | 1024 | 1024 | 1 | tg256 @ d8192 | 16.68 ± 0.00 |
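To put the measured rates in practical terms, the arithmetic sketch below estimates turnaround for a long prompt. It uses the depth-8192 prefill rate from the table as an optimistic bound; real prefill throughput will degrade further at greater depths, so treat the result as a lower bound on wall time.

```python
# Turnaround estimate from the llama-bench rates in the table above.
pp_rate = 238.78   # prompt t/s measured at depth 8192 (optimistic for deeper contexts)
tg_rate = 16.62    # generation t/s

prompt_tokens = 100_000  # hypothetical long-context prompt
gen_tokens = 512         # hypothetical response length

prefill_s = prompt_tokens / pp_rate
gen_s = gen_tokens / tg_rate
print(f"prefill ~{prefill_s / 60:.1f} min, generation ~{gen_s:.0f} s")
```

At these rates a 100k-token prompt takes on the order of seven minutes to prefill, while the 512-token response itself takes about half a minute; prefill dominates at long context.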
Ignore the displayed dtype; refer to the per-tensor types instead.
See the GLM version for more details on theory and comparisons.
Base model: Qwen/Qwen3.5-122B-A10B