# Gemma-4-31B-it EXL3 6.0bpw
A quantized version of google/gemma-4-31B-it in ExLlamaV3's EXL3 format.
| Property | Value |
|---|---|
| Original size | 62 GB (BF16) |
| Quantized size | 25 GB (6.0 bpw) |
| Format | EXL3 (QTIP-based) |
| Compression | 2.5x |
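The compression figure follows directly from the sizes in the table; a quick sanity check (all values taken from the table above):

```python
# Verify the reported compression ratio from the stated checkpoint sizes.
original_gb = 62    # BF16 checkpoint size
quantized_gb = 25   # EXL3 6.0 bpw checkpoint size

ratio = original_gb / quantized_gb
print(f"{ratio:.1f}x")  # 62 / 25 = 2.48, reported as 2.5x
```

Note that pure bits-per-weight arithmetic (16-bit BF16 down to 6.0 bpw) would suggest roughly 16 / 6 ≈ 2.67x; the measured 2.5x is plausibly lower because not every tensor in the checkpoint is quantized to 6.0 bpw.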
## Requirements
- ExLlamaV3 with Gemma 4 support: SimonShenhw/exllamav3-gemma4
- GPU VRAM: 28+ GB
## Credits
- Architecture adaptation: @lesj0610
- Inference fix + quantization: @SimonShenhw
- ExLlamaV3: @turboderp
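A minimal loading sketch, modeled on the generator example in turboderp's ExLlamaV3 repository. This is an assumption, not a verified recipe: the class names and call signatures below follow upstream ExLlamaV3 examples and may differ in the SimonShenhw/exllamav3-gemma4 fork, and the model path is a placeholder.

```python
# Sketch only: assumes the upstream ExLlamaV3 example API
# (Config / Model / Cache / Tokenizer / Generator); adjust to the
# Gemma 4 fork as needed. Requires a GPU with ~28+ GB of VRAM.
from exllamav3 import Config, Model, Cache, Tokenizer, Generator

model_dir = "/path/to/Gemma-4-31B-it-EXL3-6.0bpw"  # placeholder path

config = Config.from_directory(model_dir)
model = Model.from_config(config)
cache = Cache(model, max_num_tokens=8192)
model.load()

tokenizer = Tokenizer.from_config(config)
generator = Generator(model=model, cache=cache, tokenizer=tokenizer)

output = generator.generate(
    prompt="Explain EXL3 quantization in one paragraph.",
    max_new_tokens=200,
)
print(output)
```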