⚠️ Warning

The 31B version may fail with an error at language_model.model.layers.5.self_attn.qkv_proj when deployed with vLLM.

This issue appears to originate from GPTQModel, as it does not occur in the E2B version. We are currently investigating and working on a fix.

This is an unofficial quantized version of google/gemma-4-31B-it.

🧠 Quantization Framework

GPTQModel

🗺️ Quantization Method

FOEM (AAAI 2026)

FOEM is a quantization method that improves on GPTQ. The resulting model preserves the same inference structure as GPTQ, ensuring compatibility with existing deployment pipelines while achieving better accuracy.
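Because the inference structure matches GPTQ, the checkpoint stores integer weights plus per-group scales that are dequantized at load/compute time. A minimal sketch of that arithmetic (illustrative only; the function and values here are hypothetical, not the actual GPTQModel kernel):

```python
import numpy as np

def dequantize(q, scale, zero_point=0):
    # GPTQ-style checkpoints store low-bit integer weights with a
    # per-group floating-point scale; dequantization recovers floats.
    return (q.astype(np.float32) - zero_point) * scale

q = np.array([[-128, 0, 127]], dtype=np.int8)  # toy int8 weight group
w = dequantize(q, scale=0.01)
print(w)  # approximately [[-1.28, 0.0, 1.27]]
```

Real kernels fuse this step into the matmul, which is why a FOEM-quantized model can reuse GPTQ deployment paths unchanged.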

📚 Calibration Dataset

We randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
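The sampling step can be sketched generically as follows (a hypothetical helper with a stand-in corpus, not the actual calibration script):

```python
import random

def sample_calibration(examples, n=512, seed=0):
    # Randomly draw n calibration examples without replacement,
    # mirroring the 512-example sampling described above.
    rng = random.Random(seed)
    return rng.sample(examples, n)

corpus = [f"example-{i}" for i in range(3000)]  # stand-in for the dataset
calib = sample_calibration(corpus)
print(len(calib))  # 512
```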

📋 Usage Example

This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.
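A minimal offline-inference sketch with vLLM (assumes vLLM is installed and sufficient GPU memory is available; sampling settings here are illustrative):

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint; vLLM handles GPTQ-format weights.
llm = LLM(model="Xingyu-Zheng/gemma-4-31B-it-int8-foem", quantization="gptq")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain GPTQ quantization briefly."], params)
print(outputs[0].outputs[0].text)
```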
