⚠️ Warning

The 31B version may fail with an error at language_model.model.layers.5.self_attn.qkv_proj when deployed with vLLM.

This issue appears to originate from GPTQModel, as it does not occur in the E2B version. We are currently investigating and working on a fix.

This is an unofficial quantized version of google/gemma-4-31B-it.

🧠 Quantization Framework

GPTQModel

🗺️ Quantization Method

FOEM (AAAI 2026)

FOEM is a quantization method that improves on GPTQ. The resulting model preserves the same inference structure as GPTQ, ensuring compatibility with existing deployment pipelines while achieving better accuracy.
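Because the inference structure matches GPTQ, the checkpoint stores integer weights plus per-group scales that are dequantized at load/compute time. A minimal sketch of that arithmetic (illustrative only; the function and values here are hypothetical, not the actual GPTQModel kernel):

```python
import numpy as np

def dequantize(q, scale, zero_point=0):
    # GPTQ-style checkpoints store low-bit integer weights with a
    # per-group floating-point scale; dequantization recovers floats.
    return (q.astype(np.float32) - zero_point) * scale

q = np.array([[-128, 0, 127]], dtype=np.int8)  # toy int8 weight group
w = dequantize(q, scale=0.01)
print(w)  # approximately [[-1.28, 0.0, 1.27]]
```

Real kernels fuse this step into the matmul, which is why a FOEM-quantized model can reuse GPTQ deployment paths unchanged.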

📚 Calibration Dataset

We randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
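The sampling step can be sketched generically as follows (a hypothetical helper with a stand-in corpus, not the actual calibration script):

```python
import random

def sample_calibration(examples, n=512, seed=0):
    # Randomly draw n calibration examples without replacement,
    # mirroring the 512-example sampling described above.
    rng = random.Random(seed)
    return rng.sample(examples, n)

corpus = [f"example-{i}" for i in range(3000)]  # stand-in for the dataset
calib = sample_calibration(corpus)
print(len(calib))  # 512
```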

📋 Usage Example

This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.
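A minimal offline-inference sketch with vLLM (assumes vLLM is installed and sufficient GPU memory is available; sampling settings here are illustrative):

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint; vLLM handles GPTQ-format weights.
llm = LLM(model="Xingyu-Zheng/gemma-4-31B-it-int8-foem", quantization="gptq")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain GPTQ quantization briefly."], params)
print(outputs[0].outputs[0].text)
```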
