soundsgoodai/GLM-4.5-Air-NVFP4-KV-cache-FP8

Text Generation

Model Description

This model is GLM-4.5-Air quantized with the following setup:

Weights: NVFP4
KV cache: FP8
Tooling: NVIDIA/Model-Optimizer
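The two formats can be illustrated with a small pure-Python "fake quantization" sketch. It assumes the publicly described NVFP4 layout: FP4 E2M1 values in blocks of 16, one FP8 E4M3 scale per block, and a per-tensor FP32 scale (chosen here as amax / (6 * 448), one common choice). The FP8 KV-cache part assumes a single per-tensor scale that maps the absolute maximum to 448, the largest finite E4M3 value. This is not Model-Optimizer's implementation or API, calibration is omitted, and the function names (`round_to_e4m3`, `nearest_e2m1`, `quantize_nvfp4`, `quantize_kv_fp8`) are hypothetical.

```python
import math

# Representable magnitudes of FP4 E2M1 (the sign is handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E4M3_MAX = 448.0  # largest finite FP8 E4M3 value
BLOCK = 16        # assumed NVFP4 block size


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) = e - 1.
    _, e = math.frexp(a)
    exp = max(e - 1, -6)     # -6 is the smallest normal exponent; below it values are subnormal
    step = 2.0 ** (exp - 3)  # spacing for 3 mantissa bits
    return sign * min(round(a / step) * step, E4M3_MAX)


def nearest_e2m1(v: float) -> float:
    """Round a non-negative value to the nearest E2M1 magnitude (ties go to the smaller one here)."""
    return min(E2M1_GRID, key=lambda g: abs(g - v))


def quantize_nvfp4(weights):
    """Fake-quantize a flat list of weights with two-level NVFP4-style scaling.

    Returns dequantized values so the rounding error can be inspected.
    """
    tensor_amax = max(abs(w) for w in weights)
    if tensor_amax == 0.0:
        return [0.0] * len(weights)
    # Per-tensor FP32 scale chosen so every per-block scale fits in E4M3.
    global_scale = tensor_amax / (6.0 * E4M3_MAX)
    out = []
    for start in range(0, len(weights), BLOCK):
        block = weights[start:start + BLOCK]
        block_amax = max(abs(w) for w in block)
        # Per-block scale, stored as E4M3 relative to the global scale.
        scale = round_to_e4m3(block_amax / 6.0 / global_scale) * global_scale
        for w in block:
            q = 0.0 if scale == 0.0 else nearest_e2m1(min(abs(w) / scale, 6.0))
            out.append(math.copysign(q * scale, w))
    return out


def quantize_kv_fp8(values):
    """Fake-quantize K or V activations to FP8 E4M3 with one per-tensor scale."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) * scale for v in values]


if __name__ == "__main__":
    toy = [0.01 * (i - 16) for i in range(32)]  # two blocks of 16 toy weights
    err = max(abs(a - b) for a, b in zip(toy, quantize_nvfp4(toy)))
    print(f"max NVFP4 abs error: {err:.4f}")
```

Under the block-of-16 assumption, each weight costs 4 bits plus 8 / 16 = 0.5 bits of block scale, about 4.5 bits per weight before the per-tensor scale. For the KV cache, storing K and V in FP8 instead of BF16 halves the bytes per cached element (1 byte instead of 2).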

Intended for deployment with TensorRT-LLM.

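Format: Safetensors

Model size: 54B params (the Hub's count of stored tensor elements; NVFP4 weights are packed two per U8 element, so this figure understates the base model's parameter count)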

Tensor types: F32, BF16, F8_E4M3, U8
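The U8 tensor type is where the 4-bit weights end up: safetensors has no native FP4 dtype, so two 4-bit codes are typically packed into each byte. The sketch below assumes sign-magnitude E2M1 codes (bit 3 is the sign, bits 0 to 2 index the magnitude grid) and low-nibble-first packing; the nibble order Model-Optimizer actually uses is not confirmed here, and the helper names are hypothetical.

```python
# E2M1 magnitudes in code order: code 0b000 -> 0.0, ..., 0b111 -> 6.0.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def encode_e2m1(v: float) -> int:
    """Map a value already on the E2M1 grid to its 4-bit code (sign in bit 3)."""
    sign = 0b1000 if v < 0 else 0
    return sign | E2M1_GRID.index(abs(v))


def decode_e2m1(code: int) -> float:
    """Inverse of encode_e2m1."""
    mag = E2M1_GRID[code & 0b0111]
    return -mag if code & 0b1000 else mag


def pack_fp4(values):
    """Pack pairs of E2M1 values into bytes, low nibble first (assumed order)."""
    if len(values) % 2:
        raise ValueError("need an even number of values")
    codes = [encode_e2m1(v) for v in values]
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_fp4(packed: bytes):
    """Unpack bytes produced by pack_fp4 back into E2M1 values."""
    out = []
    for byte in packed:
        out.append(decode_e2m1(byte & 0x0F))
        out.append(decode_e2m1(byte >> 4))
    return out


if __name__ == "__main__":
    row = [6.0, -0.5, 1.5, 0.0, -4.0, 3.0, 2.0, -6.0]
    packed = pack_fp4(row)
    # 8 FP4 values occupy 4 bytes, so counting elements of the U8 tensor
    # reports half as many entries as the layer has weights.
    print(len(row), len(packed))
    assert unpack_fp4(packed) == row
```

Running it prints `8 4`: counting U8 elements instead of logical weights is why a parameter count taken from the stored tensors comes out lower than the unquantized model's.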

Base model: zai-org/GLM-4.5-Air (this model is a quantized version of it)