soundsgoodai/GLM-4.5-Air-NVFP4-KV-cache-NVFP4

Text Generation

8-bit precision


Model Description

GLM-4.5-Air quantized with the following setup:

Weights: NVFP4
KV cache: NVFP4
Tooling: NVIDIA/Model-Optimizer
Deployment: TensorRT-LLM
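
To give a feel for what NVFP4 storage means for the weights and KV cache listed above, here is a minimal pure-Python sketch. It assumes the commonly described NVFP4 layout: 4-bit E2M1 values with one 8-bit (E4M3) scale per block of 16 elements. The names `quantize_block` and `bits_per_element` are hypothetical helpers written for illustration and are not part of Model-Optimizer or TensorRT-LLM. As a simplification, the scale is kept as a plain Python float instead of being rounded to E4M3, and the additional per-tensor scale is ignored.

```python
# Magnitudes representable in E2M1 (FP4): 1 sign, 2 exponent, 1 mantissa bit.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: one scale shared by every 16 elements


def quantize_block(values):
    """Fake-quantize one block; returns (scale, dequantized values)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0.0, [0.0] * len(values)
    # Map the largest magnitude in the block onto the largest FP4 value.
    scale = amax / E2M1_GRID[-1]
    out = []
    for v in values:
        # Round the scaled magnitude to the nearest representable FP4 value.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append((mag if v >= 0 else -mag) * scale)
    return scale, out


def bits_per_element(block_size=BLOCK_SIZE, value_bits=4, scale_bits=8):
    """Average storage cost per element, including the shared block scale."""
    return value_bits + scale_bits / block_size


if __name__ == "__main__":
    scale, deq = quantize_block([6.0, -3.0, 0.5, 0.0])
    print(scale, deq)
    print(bits_per_element())
```

Under these assumptions each quantized element costs about 4.5 bits on average, compared with 16 bits for BF16, so quantized tensors are roughly 3.5x smaller. The real checkpoint also keeps some tensors in higher precision, so the overall ratio will differ.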


Safetensors

Model size: 54B params
Tensor type: F32 · BF16 · F8_E4M3 · U8

Model tree for soundsgoodai/GLM-4.5-Air-NVFP4-KV-cache-NVFP4

Base model: zai-org/GLM-4.5-Air
Quantized (60): this model