Exllamav3 quantization of Qwen/Qwen3-235B-A22B-Instruct-2507
| Quant | Head bits | Size |
|-------|-----------|------|
| 2.25 bpw | h6 | 63.377 GiB |
| 3.00 bpw | h6 | 83.800 GiB |
| 4.00 bpw | h6 | 111.013 GiB |
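The sizes above track the usual back-of-the-envelope estimate of `total_params × bpw / 8` bytes. A minimal sketch, assuming the nominal 235B parameter count; the published files come out a few GiB larger because the h6 (6-bit) output head, embeddings, and tensor metadata are not counted by this formula:

```python
# Rough on-disk size estimate for a weight-only quant:
# total parameter count times bits-per-weight, divided by 8 bits per byte.
GIB = 1024 ** 3

def quant_size_gib(params: float, bpw: float) -> float:
    """Approximate quantized model size in GiB (ignores head/embedding overhead)."""
    return params * bpw / 8 / GIB

# 235e9 is the nominal total parameter count for Qwen3-235B-A22B (an assumption).
for bpw in (2.25, 3.00, 4.00):
    print(f"{bpw:.2f} bpw ≈ {quant_size_gib(235e9, bpw):.1f} GiB")
```

The estimate gives roughly 61.6, 82.1, and 109.4 GiB for the three quants, each a little under the measured sizes, which is the expected direction of the error.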
- The 2.25 bpw quant will fit in three 24 GB cards with 20k tokens of fp16 context.
- The 4.00 bpw quant will fit in six 24 GB cards with 81,920 tokens of fp16 context.
- Note that all these numbers were measured on the current version of exllamav3, which does not yet support tensor parallelism. If you're reading this from a future where that feature has landed, or if you have larger cards, you can probably do better than what I'm reporting here.
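The context figures above can be sanity-checked against the fp16 KV-cache cost per token. A hedged sketch, assuming the Qwen3-235B-A22B config values of 94 layers, 4 KV heads, and a head dimension of 128 under GQA (assumptions taken from the base model's config, not stated in this card):

```python
# fp16 KV-cache budget: K and V each store one fp16 value per layer,
# per KV head, per head-dim element, per token.
LAYERS, KV_HEADS, HEAD_DIM = 94, 4, 128  # assumed Qwen3-235B-A22B config values
BYTES_FP16 = 2

def kv_cache_bytes(tokens: int) -> int:
    """Total fp16 KV-cache size in bytes for a given context length."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16 * tokens

for ctx in (20_000, 81_920):
    print(f"{ctx:>6} tokens -> {kv_cache_bytes(ctx) / 1024**3:.1f} GiB fp16 cache")
```

Under these assumptions, 20k tokens costs about 3.6 GiB and 81,920 tokens about 14.7 GiB, which is consistent with the 2.25 bpw quant plus cache fitting in 3 × 24 GB and the 4.00 bpw quant plus cache fitting in 6 × 24 GB.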
Model tree for MikeRoz/Qwen3-235B-A22B-Instruct-2507-exl3
- Base model: Qwen/Qwen3-235B-A22B-Instruct-2507