# Gemma-4-31B-it EXL3 6.0bpw
A quantized version of google/gemma-4-31B-it in ExLlamaV3's EXL3 format.
| Property | Value |
|---|---|
| Original size | 62 GB (BF16) |
| Quantized size | 25 GB (6.0 bpw) |
| Format | EXL3 (QTIP-based) |
| Compression | 2.5x |
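The compression figure follows directly from the sizes in the table; a quick sanity check (all values taken from the table above):

```python
# Verify the reported compression ratio from the stated checkpoint sizes.
original_gb = 62    # BF16 checkpoint size
quantized_gb = 25   # EXL3 6.0 bpw checkpoint size

ratio = original_gb / quantized_gb
print(f"{ratio:.1f}x")  # 62 / 25 = 2.48, reported as 2.5x
```

Note that pure bits-per-weight arithmetic (16-bit BF16 down to 6.0 bpw) would suggest roughly 16 / 6 ≈ 2.67x; the measured 2.5x is plausibly lower because not every tensor in the checkpoint is quantized to 6.0 bpw.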
## Requirements
- ExLlamaV3 with Gemma 4 support: SimonShenhw/exllamav3-gemma4
- GPU VRAM: 28+ GB
## Credits
- Architecture adaptation: @lesj0610
- Inference fix + quantization: @SimonShenhw
- ExLlamaV3: @turboderp
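A minimal loading sketch, modeled on the generator example in turboderp's ExLlamaV3 repository. This is an assumption, not a verified recipe: the class names and call signatures below follow upstream ExLlamaV3 examples and may differ in the SimonShenhw/exllamav3-gemma4 fork, and the model path is a placeholder.

```python
# Sketch only: assumes the upstream ExLlamaV3 example API
# (Config / Model / Cache / Tokenizer / Generator); adjust to the
# Gemma 4 fork as needed. Requires a GPU with ~28+ GB of VRAM.
from exllamav3 import Config, Model, Cache, Tokenizer, Generator

model_dir = "/path/to/Gemma-4-31B-it-EXL3-6.0bpw"  # placeholder path

config = Config.from_directory(model_dir)
model = Model.from_config(config)
cache = Cache(model, max_num_tokens=8192)
model.load()

tokenizer = Tokenizer.from_config(config)
generator = Generator(model=model, cache=cache, tokenizer=tokenizer)

output = generator.generate(
    prompt="Explain EXL3 quantization in one paragraph.",
    max_new_tokens=200,
)
print(output)
```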