Gemma-4-E4B-text-HQQ-INT4

INT4 weight-only quantization of google/gemma-4-E4B-it.

Text-only HQQ-INT4 extraction of Gemma 4 E4B-it. The vision and audio branches are removed; the language_model submodule is preserved 1:1 and quantized with Half-Quadratic Quantization (HQQ). Fits on 8 GB consumer GPUs.

| Property | Value |
|---|---|
| Base model | google/gemma-4-E4B-it |
| Quantization | INT4 weight-only (HQQ) |
| Approx. on-disk size | ~9.2 GB |
| License | Gemma Terms of Use |
| Languages | English |

Load (vLLM)

```shell
vllm serve drawais/Gemma-4-E4B-text-HQQ-INT4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.94
```

Or run offline inference through the Python API:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="drawais/Gemma-4-E4B-text-HQQ-INT4", max_model_len=32768)
print(llm.generate(["Hello!"], SamplingParams(max_tokens=128))[0].outputs[0].text)
```

Footprint

~9.2 GB on disk. Recommended VRAM: the weight footprint plus headroom for the KV cache at your chosen context length and batch size.
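To size that KV-cache headroom, a rough per-sequence upper bound is `2 × layers × kv_heads × head_dim × seq_len × bytes_per_element` (the factor 2 covers the K and V tensors). The sketch below uses illustrative placeholder dimensions, not the checkpoint's actual config — read `num_hidden_layers`, `num_key_value_heads`, and `head_dim` from the model's `config.json` for real numbers.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Upper-bound KV-cache size for one sequence: K and V tensors,
    each of shape (num_layers, num_kv_heads, seq_len, head_dim)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative values only (fp16 cache, full 32k context); substitute
# the real dimensions from the checkpoint's config.json.
size = kv_cache_bytes(num_layers=30, num_kv_heads=8, head_dim=256,
                      seq_len=32768, bytes_per_elem=2)
print(f"{size / 2**30:.2f} GiB")  # → 7.50 GiB
```

Note that vLLM pre-allocates the cache according to `--gpu-memory-utilization`, so lowering `--max-model-len` is the easiest way to shrink this reservation on small GPUs.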

License & attribution

This artifact is a derivative work of google/gemma-4-E4B-it, released by its original authors under the Gemma Terms of Use.

This artifact is distributed under the same license. The full license text is included in LICENSE, and required attribution is in NOTICE.

License text: https://ai.google.dev/gemma/terms

Source model: https://huggingface.co/google/gemma-4-E4B-it
