# Gemma-4-E4B-text-HQQ-INT4
INT4 weight-only quantization of google/gemma-4-E4B-it.

Text-only HQQ-INT4 extraction of Gemma 4 E4B-it: the vision and audio branches are removed, while the `language_model` submodule is preserved 1:1 and quantized with Half-Quadratic Quantization (HQQ). Fits 8 GB consumer GPUs.
| Property | Value |
|---|---|
| Base model | google/gemma-4-E4B-it |
| Quantization | INT4 weight-only |
| Approx. on-disk size | ~9.2 GB |
| License | Gemma Terms of Use |
| Languages | English |
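For intuition, INT4 weight-only quantization stores each weight as a 4-bit integer plus a per-group scale and zero-point. The sketch below is a plain min-max round-to-nearest scheme; HQQ itself additionally refines the scale and zero-point with a half-quadratic solver, which this toy version omits:

```python
import numpy as np

def quantize_int4(w: np.ndarray, group_size: int = 64):
    """Asymmetric 4-bit quantization: one scale/zero-point per group of weights."""
    g = w.reshape(-1, group_size)
    wmin = g.min(axis=1, keepdims=True)
    wmax = g.max(axis=1, keepdims=True)
    scale = np.where(wmax > wmin, (wmax - wmin) / 15.0, 1.0)  # 15 = 2**4 - 1 levels
    zero = np.round(-wmin / scale)
    q = np.clip(np.round(g / scale + zero), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize_int4(q, scale, zero):
    # Reconstruct approximate float weights from the packed representation.
    return (q.astype(np.float32) - zero) * scale

w = np.random.randn(4096).astype(np.float32)
q, scale, zero = quantize_int4(w)
w_hat = dequantize_int4(q, scale, zero)
# Round-to-nearest bounds the per-element error by half a quantization step.
assert np.all(np.abs(w.reshape(-1, 64) - w_hat) <= 0.5 * scale + 1e-5)
```

In a real INT4 checkpoint two 4-bit values are packed per byte; the sketch keeps one value per `uint8` for clarity.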
## Load (vLLM)

```shell
vllm serve drawais/Gemma-4-E4B-text-HQQ-INT4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.94
```

Or, offline with the Python API:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="drawais/Gemma-4-E4B-text-HQQ-INT4", max_model_len=32768)
print(llm.generate(["Hello!"], SamplingParams(max_tokens=128))[0].outputs[0].text)
```
## Footprint

~9.2 GB on disk. Recommended VRAM: the quantized weights plus enough headroom for the KV cache at your target context length and batch size.
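The KV-cache headroom can be estimated from the model's attention configuration. A minimal sketch; the layer and head counts below are illustrative placeholders, not the actual Gemma-4-E4B config:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2, batch: int = 1) -> int:
    """Bytes for keys AND values across all layers (hence the factor of 2)."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * seq_len * batch

# Placeholder values only -- NOT the real Gemma-4-E4B configuration.
est = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32768)
print(f"{est / 2**30:.1f} GiB")  # 4.0 GiB for these placeholder numbers
```

This is why `--gpu-memory-utilization` matters: vLLM pre-allocates the remaining VRAM budget for the cache.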
## License & attribution
This artifact is a derivative work of google/gemma-4-E4B-it,
released by its original authors under the Gemma Terms of Use.
This artifact is distributed under the same license. The full license text is
included in LICENSE, and required attribution is in NOTICE.
- License text: https://ai.google.dev/gemma/terms
- Source model: https://huggingface.co/google/gemma-4-E4B-it