---
license: apache-2.0
base_model: z-lab/gemma-4-31B-it-DFlash
base_model_relation: quantized
quantized_by: turboderp
tags:
  - exl3
---

EXL3 quants of gemma-4-31B-it-DFlash

2.50 bits per weight
3.00 bits per weight
3.50 bits per weight
4.00 bits per weight
5.00 bits per weight
6.00 bits per weight

| Quant | Mean acc. tokens¹ |
|---|---|
| 2.50 bpw | 4.00 |
| 3.00 bpw | 4.07 |
| 3.50 bpw | 4.08 |
| 4.00 bpw | 4.10 |
| 5.00 bpw | 4.12 |
| 6.00 bpw | 4.10 |
| BF16 | 4.07 |

¹ Mean verified tokens per 15-token draft, measured on CatBench at temp=0 with the 4.00 bpw target model, on the current exllamav3 dev branch (upcoming v0.0.33).
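
To put the numbers above in perspective, here is a minimal Python sketch that restates each mean as a fraction of the 15-token draft and as a relative difference from the BF16 draft model. The figures are copied from the table. The helper names `draft_fraction` and `relative_to_bf16` are illustrative only and are not part of exllamav3, and the sketch assumes the reported mean can be divided directly by the draft length (whether it includes any extra token from the target model is not stated here).

```python
# Figures copied from the table above (mean verified tokens per draft).
DRAFT_LEN = 15  # draft length stated in the footnote

mean_tokens = {
    "2.50 bpw": 4.00,
    "3.00 bpw": 4.07,
    "3.50 bpw": 4.08,
    "4.00 bpw": 4.10,
    "5.00 bpw": 4.12,
    "6.00 bpw": 4.10,
    "BF16": 4.07,
}


def draft_fraction(mean: float, draft_len: int = DRAFT_LEN) -> float:
    """Share of the draft that was verified, on average (assumed: mean / draft length)."""
    return mean / draft_len


def relative_to_bf16(mean: float, baseline: float = mean_tokens["BF16"]) -> float:
    """Relative difference from the unquantized (BF16) draft model."""
    return (mean - baseline) / baseline


if __name__ == "__main__":
    # One row per quant: mean, share of draft verified, difference vs BF16.
    for quant, mean in mean_tokens.items():
        print(
            f"{quant:>8}  {mean:.2f}  "
            f"{draft_fraction(mean):6.1%}  {relative_to_bf16(mean):+.1%}"
        )
```

All rows fall within a narrow band around the BF16 figure, so small differences between neighbouring quants should be read with the benchmark's run-to-run variation in mind.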