Model Description

An NVFP4-quantized build of GLM-4.5-Air, configured as follows:

  • Weights: NVFP4
  • KV cache: NVFP4
  • Tooling: NVIDIA/Model-Optimizer
  • Deployment: TensorRT-LLM
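
For readers unfamiliar with the format, the following is a minimal pure-Python sketch of NVFP4-style block quantization. It is illustrative only and is not the NVIDIA/Model-Optimizer implementation. It assumes the commonly documented layout: 4-bit E2M1 values grouped in blocks of 16 with one scale per block. The function names (`quantize_block`, `dequantize_block`, `fake_quantize`) are hypothetical, and the sketch simplifies by keeping each block scale as a Python float instead of FP8 (E4M3) and by omitting the per-tensor FP32 scale.

```python
# Magnitudes representable by a 4-bit E2M1 value (sign is stored separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumed NVFP4 block size


def quantize_block(block):
    """Return (scale, quantized values) for one block of up to 16 floats."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest grid value (6.0).
    # Simplification: real NVFP4 stores this scale in FP8 (E4M3).
    scale = amax / 6.0 if amax > 0 else 1.0
    quantized = []
    for x in block:
        # Snap the scaled magnitude to the nearest grid point
        # (ties go to the smaller magnitude here; hardware rounding may differ).
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        quantized.append(-mag if x < 0 else mag)
    return scale, quantized


def dequantize_block(scale, quantized):
    """Reconstruct approximate floats from a block scale and grid values."""
    return [scale * q for q in quantized]


def fake_quantize(values):
    """Quantize then dequantize a flat list, block by block."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, quantized = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(scale, quantized))
    return out
```

Under these assumptions each weight costs 4 bits plus one 8-bit scale shared by 16 weights, or about 4.5 bits per weight before the per-tensor scale. The worst-case rounding error within a block is one sixth of that block's largest magnitude, because the widest gap on the grid (4 to 6) is 2 grid units.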
Safetensors
Model size: 54B params
Tensor types: F32, BF16, F8_E4M3, U8

Model: soundsgoodai/GLM-4.5-Air-NVFP4-KV-cache-NVFP4 (quantized from GLM-4.5-Air)