Exllamav3 quantization of Qwen/Qwen3-235B-A22B-Instruct-2507

| Quant    | Head bits | Size        |
|----------|-----------|-------------|
| 2.25 bpw | h6        | 63.377 GiB  |
| 3.00 bpw | h6        | 83.800 GiB  |
| 4.00 bpw | h6        | 111.013 GiB |

  • The 2.25 bpw quant will fit in three 24 GB cards with 20k tokens of fp16 context.
  • The 4.00 bpw quant will fit in six 24 GB cards with 81,920 tokens of fp16 context.
  • Note that all of these figures were measured on the current version of exllamav3, which does not yet support tensor parallelism. If you're reading this from a future where that feature has been implemented, or if you have larger cards, then you can probably do better than what I'm reporting here.
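The fit claims above can be sanity-checked with a rough back-of-the-envelope estimate: quantized weight size plus the fp16 KV cache must fit in the combined VRAM. The sketch below is a minimal estimator, assuming Qwen3-235B-A22B's architecture (94 layers, 4 KV heads, head dim 128 — taken from the base model's config, not from this repo) and ignoring activation buffers and per-card overhead, so treat it as an approximation rather than a guarantee.

```python
# Rough VRAM-fit estimate for an EXL3 quant plus fp16 KV cache.
# Assumed architecture values for Qwen3-235B-A22B: 94 layers,
# 4 KV heads (GQA), head dim 128. Not authoritative.
GIB = 2**30

def kv_cache_gib(tokens, layers=94, kv_heads=4, head_dim=128, bytes_per=2):
    # K and V each store layers * kv_heads * head_dim elements per token.
    return 2 * layers * kv_heads * head_dim * bytes_per * tokens / GIB

def fits(weights_gib, tokens, num_cards, card_bytes=24e9):
    # Compare weights + KV cache against total VRAM across the cards.
    total_gib = weights_gib + kv_cache_gib(tokens)
    return total_gib <= num_cards * card_bytes / GIB

print(fits(63.377, 20_480, 3))   # 2.25 bpw quant on three 24 GB cards
print(fits(111.013, 81_920, 6))  # 4.00 bpw quant on six 24 GB cards
```

Under these assumptions the 2.25 bpw case is a very tight fit: roughly 63.4 GiB of weights plus about 3.7 GiB of KV cache against about 67 GiB of total VRAM, which lines up with the 20k-token figure quoted above.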
Model tree for MikeRoz/Qwen3-235B-A22B-Instruct-2507-exl3

Quantized
(60)
this model