This is an unofficial quantized version (Xingyu-Zheng/gemma-4-E2B-it-int4-foem) of google/gemma-4-E2B-it.

🧠 Quantization Framework

GPTQModel

🗺️ Quantization Method

FOEM (AAAI 2026)

FOEM is a quantization method that improves on GPTQ. The resulting model preserves the same inference structure as GPTQ-quantized models, so it remains compatible with existing deployment pipelines while achieving better accuracy.

📚 Calibration Dataset

We randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
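As a rough sketch of the calibration sampling step (the actual pipeline is not shown in this card; the in-memory `corpus` list and the seed below are stand-ins, and loading the real dataset, e.g. via the `datasets` library, is omitted):

```python
import random

def sample_calibration(examples, n=512, seed=0):
    """Draw n random calibration examples from a list of texts.

    `examples` stands in for the rows of
    nohurry/Opus-4.6-Reasoning-3000x-filtered; the seed makes the
    draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(examples, n)

# Toy stand-in corpus; the real dataset is loaded from the Hub.
corpus = [f"example text {i}" for i in range(3000)]
calib = sample_calibration(corpus)
print(len(calib))  # 512
```

`random.Random.sample` draws without replacement, so the 512 calibration examples are distinct.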

📋 Usage Example

This model can be deployed using standard frameworks such as vLLM and SGLang, just like other GPTQModel-quantized models.
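For example, once the model is served behind vLLM's OpenAI-compatible API (e.g. `vllm serve Xingyu-Zheng/gemma-4-E2B-it-int4-foem`; exact flags depend on your vLLM version), a client sends standard chat-completion requests. The sketch below only builds the request payload; the URL and sampling parameters are illustrative:

```python
import json

# Hypothetical local endpoint for a vLLM OpenAI-compatible server.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt, max_tokens=256, temperature=0.7):
    """Construct a /v1/chat/completions payload for the quantized model."""
    return {
        "model": "Xingyu-Zheng/gemma-4-E2B-it-int4-foem",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("Explain GPTQ quantization in one sentence.")
print(json.dumps(payload, indent=2))
```

Sending the payload (with `urllib`, `requests`, or the `openai` client) requires a running server and is omitted here.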

Format: Safetensors
Model size: 5B params
Tensor types: BF16 · I32
