Gemma 3 12B IT - NVFP4 Quantized

This is an NVFP4 (4-bit floating point) quantized version of google/gemma-3-12b-it, optimized for NVIDIA GPUs with native FP4 support (Blackwell architecture and newer).

Model Details

| Attribute | Value |
|---|---|
| Base Model | google/gemma-3-12b-it |
| Quantization | NVFP4 (NVIDIA 4-bit floating point) |
| Target Hardware | NVIDIA Blackwell GPUs (B100, B200, GB200) |
| Original Parameters | 12B |

Description

This model provides pre-quantized NVFP4 weights for Gemma 3 12B Instruct, enabling efficient inference on NVIDIA's Blackwell architecture GPUs with native FP4 tensor core support. Loading pre-quantized weights avoids the overhead of runtime quantization.

Why NVFP4?

  • Native hardware support: Blackwell GPUs include dedicated FP4 tensor cores
  • ~4x memory reduction: Compared to FP16/BF16 weights
  • Faster inference: Leverages hardware-accelerated FP4 matrix operations
  • Pre-quantized: No quantization overhead at load time
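The ~4x memory figure can be sanity-checked with back-of-the-envelope arithmetic (weights only; actual usage also needs room for activations, the KV cache, and scale-factor overhead):

```python
# Approximate weight memory for a 12B-parameter model at different precisions.
# Illustrative only: ignores activations, KV cache, and quantization metadata.
PARAMS = 12e9

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight memory in GB for a given bit width."""
    return PARAMS * bits_per_param / 8 / 1e9

bf16_gb = weight_gb(16)  # ~24 GB
fp4_gb = weight_gb(4)    # ~6 GB
print(f"BF16: {bf16_gb:.0f} GB, NVFP4: {fp4_gb:.0f} GB, ratio: {bf16_gb / fp4_gb:.0f}x")
```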

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yepthatsjason/gemma-3-12b-it-nvfp4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)
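A minimal generation example continuing from the loading snippet above (the prompt and sampling settings are illustrative; running it requires a Blackwell GPU and the downloaded weights):

```python
# Continues from the loading snippet above: `model` and `tokenizer` are
# already initialized. The prompt below is just an example.
messages = [{"role": "user", "content": "Explain FP4 quantization in one sentence."}]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```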

Hardware Requirements

  • Required: NVIDIA Blackwell GPU (B100, B200, GB200, or newer with FP4 support)
  • VRAM: ~6GB (significantly reduced from ~24GB for BF16)
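Before loading, you can check whether the local GPU reports a Blackwell-class compute capability. The `>= (10, 0)` threshold is an assumption based on NVIDIA's compute-capability numbering for Blackwell; adjust if your tooling reports differently:

```python
def supports_native_fp4(capability: tuple) -> bool:
    """Blackwell-class GPUs report compute capability 10.x or newer
    (assumed threshold; see NVIDIA's compute-capability tables)."""
    major, minor = capability
    return (major, minor) >= (10, 0)

# Query the actual device when PyTorch with CUDA is available.
try:
    import torch
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(f"Compute capability {cap}: native FP4 = {supports_native_fp4(cap)}")
except ImportError:
    pass  # PyTorch not installed; skip the runtime check
```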

Quantization Details

This model was quantized using NVIDIA's FP4 format, which uses 4 bits per weight with a floating-point representation optimized for neural network inference on Blackwell architecture.
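As a rough illustration of how block-scaled 4-bit quantization works: weights are grouped into small blocks, and each block stores 4-bit values plus one shared scale factor. The E2M1 value grid below matches the magnitudes representable in 4-bit floating point; the block layout and pure-Python code are a simplified sketch, not NVIDIA's actual format or kernels:

```python
# Simplified block-scaled 4-bit quantization sketch (not the real NVFP4 kernel).
# Each block shares one scale; values snap to the nearest signed E2M1 magnitude.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # FP4 (E2M1) magnitudes

def quantize_block(block):
    """Quantize one block: pick a scale so the largest value maps to 6.0,
    then round each scaled value to the nearest grid magnitude."""
    scale = max(abs(x) for x in block) / max(E2M1_GRID) or 1.0
    quantized = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        quantized.append(mag if x >= 0 else -mag)
    return scale, quantized

def dequantize_block(scale, quantized):
    """Reconstruct approximate values by rescaling the 4-bit codes."""
    return [scale * q for q in quantized]

scale, q = quantize_block([0.1, -0.8, 0.45, 1.2])
print(dequantize_block(scale, q))  # close to the inputs, with rounding error
```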

Limitations

  • Requires Blackwell or newer NVIDIA GPUs with native FP4 support
  • May show slight accuracy degradation compared to full-precision model
  • Not compatible with older GPU architectures (e.g., Ampere, Hopper) unless FP4 emulation is available

License

This model inherits the Gemma license from the base model.
