Model Description

This checkpoint is a quantized build of GLM-4.5-Air with the following setup:

  • Weights: NVFP4
  • KV cache: FP8
  • Tooling: NVIDIA/Model-Optimizer
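
As a rough illustration of what this setup means for memory, the sketch below estimates storage cost. It assumes the usual NVFP4 layout (4-bit values packed two per byte, plus one FP8 scale per block of 16 values) and ignores the small per-tensor FP32 scale. The function names and the example shapes are hypothetical and are not taken from this checkpoint.

```python
def bits_per_weight(block_size: int = 16) -> float:
    """Effective bits per weight: 4-bit value plus an 8-bit scale shared by a block."""
    return 4 + 8 / block_size


def nvfp4_weight_bytes(num_weights: int, block_size: int = 16) -> int:
    """Approximate bytes needed to store `num_weights` weights in NVFP4.

    Assumption: the per-tensor FP32 scale is ignored because it is negligible.
    """
    packed = (num_weights + 1) // 2                          # two 4-bit values per byte
    scales = (num_weights + block_size - 1) // block_size    # one 1-byte FP8 scale per block
    return packed + scales


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int) -> int:
    """Bytes of KV cache for one sequence: keys and values for every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical weight count, not this model's
    print("NVFP4 bytes:", nvfp4_weight_bytes(n))
    print("BF16 bytes: ", 2 * n)
    # Hypothetical attention shape; FP8 uses 1 byte per element, BF16 uses 2.
    print("KV FP8 bytes: ", kv_cache_bytes(4, 8, 128, 4096, 1))
    print("KV BF16 bytes:", kv_cache_bytes(4, 8, 128, 4096, 2))
```

Under these assumptions, NVFP4 weights cost about 4.5 bits each, and an FP8 KV cache takes half the space of a BF16 one for the same sequence length.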

Deploy with TensorRT-LLM
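
A minimal sketch of one way to load the checkpoint with the TensorRT-LLM Python LLM API. It assumes a TensorRT-LLM release that supports this architecture and reads the quantization settings from the checkpoint itself, and a GPU with NVFP4 support. `build_prompts` and `run` are hypothetical helpers, and the sampling values are placeholders, not recommended settings.

```python
MODEL_ID = "soundsgoodai/GLM-4.5-Air-NVFP4-KV-cache-FP8"


def build_prompts(questions):
    """Wrap raw questions in a plain prompt template (hypothetical helper)."""
    return [f"Question: {q}\nAnswer:" for q in questions]


def run(questions, tensor_parallel_size=1, max_tokens=64):
    """Generate one completion per question with TensorRT-LLM.

    Assumption: the installed TensorRT-LLM version can load this checkpoint
    directly from the Hugging Face Hub or from a local path.
    """
    # Imported lazily so the helpers above work without TensorRT-LLM installed.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID, tensor_parallel_size=tensor_parallel_size)
    params = SamplingParams(max_tokens=max_tokens, temperature=0.7)
    outputs = llm.generate(build_prompts(questions), params)
    return [o.outputs[0].text for o in outputs]
```

Calling `run(["What is NVFP4?"])` on a suitable machine would return a list with one generated string. Raise `tensor_parallel_size` if the model does not fit on a single GPU.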

Safetensors: model size 54B params; tensor types F32, BF16, F8_E4M3, U8

Model repository: soundsgoodai/GLM-4.5-Air-NVFP4-KV-cache-FP8