This is google/gemma-4-31B-it quantized to NVFP4 with AutoRound. It is compatible with vLLM (tested with v0.19) and was tested on an RTX Pro 6000. Evaluation is still in progress.

Instructions

uv pip install vllm
uv pip install git+https://github.com/huggingface/transformers.git
vllm serve [this model ID] --max-model-len 262144 --reasoning-parser gemma4
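Once the server is up, vLLM exposes an OpenAI-compatible API. Below is a minimal Python sketch of a chat-completion request; the port (vLLM's default, 8000) and the served model name are assumptions, so adjust both to match your `vllm serve` invocation:

```python
import json
import urllib.request

# Assumed endpoint: vLLM's OpenAI-compatible server on its default port 8000.
URL = "http://localhost:8000/v1/chat/completions"

# "model" must match the model ID passed to `vllm serve` (assumed here).
payload = {
    "model": "kaitchup/gemma-4-31B-it-autoround-nvfp4-all",
    "messages": [
        {"role": "user", "content": "Summarize NVFP4 quantization in one sentence."}
    ],
    "max_tokens": 256,
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment to send the request against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at `http://localhost:8000/v1`) works the same way.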

Model tree for kaitchup/gemma-4-31B-it-autoround-nvfp4-all
