An FP8-quantized version of google/gemma-4-E4B-it (8B parameters, edge-oriented model). Produced and maintained by vrfai.
This model was quantized using NVIDIA ModelOpt with the following configuration:
| Property | Value |
|---|---|
| Base model | google/gemma-4-E4B-it |
| Quantization method | NVIDIA ModelOpt (FP8 E4M3, num_bits: (4, 3)) |
| Weight scheme | Per-channel (axis: 0) |
| Input activation | Per-token dynamic (type: dynamic) |
| Calibration algorithm | max |
| Checkpoint size | 12 GB (vs. 15 GB in BF16) |
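For reference, the sketch below shows roughly how a checkpoint like this can be produced with ModelOpt's post-training quantization API. It is not the exact script used for this model: the calibration prompts, output directory, and the use of the stock `FP8_DEFAULT_CFG` recipe are assumptions.

```python
import torch
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-E4B-it"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder calibration prompts; a real run would use a few hundred
# representative samples so the "max" calibrator sees realistic ranges.
calibration_texts = ["The quick brown fox jumps over the lazy dog."]

def forward_loop(model):
    # ModelOpt calls this to collect activation statistics during calibration.
    for text in calibration_texts:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        with torch.no_grad():
            model(**inputs)

# FP8_DEFAULT_CFG is ModelOpt's stock FP8 (E4M3) recipe; matching the exact
# per-channel weight / dynamic per-token activation scheme in the table
# above may require customizing the config.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)

# Export a unified Hugging Face checkpoint that vLLM can load.
export_hf_checkpoint(model, export_dir="gemma-4-E4B-it-fp8")
```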
You can deploy this model with vLLM using the modelopt quantization backend. Refer to the vLLM documentation for Gemma 4 for advanced serving options.
```bash
vllm serve vrfai/gemma-4-E4B-it-fp8 \
  --quantization modelopt
```
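Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal client-side sketch, assuming the default localhost:8000 endpoint and the openai Python package:

```python
# Query the vLLM server started above via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key by default
response = client.chat.completions.create(
    model="vrfai/gemma-4-E4B-it-fp8",
    messages=[{"role": "user", "content": "Explain FP8 quantization in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```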