# gemma-4-E4B-it-fp8

An FP8-quantized version of google/gemma-4-E4B-it (8B parameters, edge model), produced and maintained by vrfai.

## Quantization Details

This model was quantized using NVIDIA ModelOpt with the following configuration:

| Property | Value |
|---|---|
| Base model | google/gemma-4-E4B-it |
| Quant method | NVIDIA ModelOpt (FP8 E4M3, `num_bits: (4, 3)`) |
| Weight scheme | Per-channel (`axis: 0`) |
| Input activation | Dynamic per-token (`type: dynamic`) |
| Calibration algorithm | max |
| Size | 12 GB (vs. 15 GB in BF16) |
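The weight and activation schemes in the table can be sketched numerically. Below is a minimal NumPy illustration, not ModelOpt code, of max-calibrated per-channel weight scaling and dynamic per-token activation scaling; for simplicity it rounds to integers, whereas real FP8 E4M3 rounds to a non-uniform 8-bit float grid, and all function names are illustrative:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quantize_weight_per_channel(w: np.ndarray):
    """Static per-channel (axis 0) scaling with max calibration:
    one scale per output channel, chosen so that the channel's
    absolute maximum maps to the edge of the FP8 E4M3 range."""
    amax = np.abs(w).max(axis=1, keepdims=True)   # max calibration
    scale = amax / FP8_E4M3_MAX
    q = np.clip(np.round(w / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def quantize_activation_per_token(x: np.ndarray):
    """Dynamic per-token scaling: one scale per row (token),
    recomputed at runtime from the activations themselves."""
    amax = np.abs(x).max(axis=-1, keepdims=True)
    scale = amax / FP8_E4M3_MAX
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize(q, scale):
    return q * scale

w = np.array([[0.5, -2.0, 1.0], [10.0, 4.0, -8.0]])
qw, sw = quantize_weight_per_channel(w)
w_hat = dequantize(qw, sw)  # close to w; error bounded by half a step per channel
```

Per-channel weight scales can be baked into the checkpoint at quantization time, while the per-token activation scales are computed on the fly at inference, which is what "dynamic" means in the table above.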

## Usage

You can deploy this model with vLLM using the `modelopt` quantization backend. See the vLLM documentation for Gemma 4 for advanced serving options.

```shell
vllm serve vrfai/gemma-4-E4B-it-fp8 \
  --quantization modelopt
```
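Once the server is up, vLLM exposes an OpenAI-compatible API. As a minimal sketch, this builds a chat request body you can POST to `http://localhost:8000/v1/chat/completions` (vLLM's default host and port; the prompt is illustrative):

```python
import json

# OpenAI-style chat request for the vLLM server started above.
# The "model" field must match the served model name.
payload = {
    "model": "vrfai/gemma-4-E4B-it-fp8",
    "messages": [
        {"role": "user", "content": "Summarize FP8 quantization in one sentence."}
    ],
    "max_tokens": 64,
    "temperature": 0.7,
}
body = json.dumps(payload)
```

With the server running, send the body with any HTTP client (e.g. `curl` with a `Content-Type: application/json` header) or use an OpenAI-compatible client pointed at the server's base URL.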