Interest in 4bpw
#1
by 19440harry - opened
Hi, thanks for the EXL3 6bpw quant! Would it be possible to release lower bpw variants like 3bpw or 4bpw? The 6bpw at ~20GB spills heavily to RAM on 8GB VRAM setups (e.g. RTX 4070 Laptop), which negates most of ExLlamaV3's speed advantage. A 3-4bpw version would fit much better in VRAM and let users actually take advantage of EXL3's optimized Qwen3.5 kernels. Thanks!
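For a rough sense of the sizes involved, here is a back-of-the-envelope sketch. It assumes file size scales linearly with bpw, starting from the ~20GB figure for 6bpw mentioned above. The helper name `scaled_size_gb` is made up for illustration, and real quants may differ somewhat if some tensors are stored at a fixed precision.

```python
def scaled_size_gb(known_size_gb: float, known_bpw: float, target_bpw: float) -> float:
    """Estimate quant size by scaling a known size linearly with bits per weight.

    Assumption: every weight is stored at the stated bpw, so size is
    proportional to bpw. This is only an approximation.
    """
    return known_size_gb * target_bpw / known_bpw


# Known data point from the post: 6bpw is roughly 20 GB.
for bpw in (3.0, 4.0):
    print(f"{bpw:.0f}bpw: ~{scaled_size_gb(20.0, 6.0, bpw):.1f} GB")
# prints:
# 3bpw: ~10.0 GB
# 4bpw: ~13.3 GB
```

Under that assumption neither variant fits entirely in 8GB, but a 3bpw quant would spill far less to RAM than the 6bpw one.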
19440harry changed discussion title from Interest in ExLlamaV2 version to Interest in 4bpw
3bpw pls