Qwen3-8B-NVFP4

NVFP4-quantized Qwen3-8B for NVIDIA Blackwell GPUs (RTX 5090, RTX PRO 4000).
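
NVFP4 stores each value as a 4-bit E2M1 float, and every block of 16 values shares one FP8 (E4M3) scale. The sketch below is a simplified pure-Python quantize-then-dequantize round trip that shows what the format does to a vector of numbers. It makes three assumptions. Ties in the 4-bit rounding go toward zero. The extra per-tensor FP32 scale that NVFP4 also uses is left out. The helper names are hypothetical. It illustrates the number format only and is not how ModelOpt or TensorRT-LLM implement quantization.

```python
import math

# Magnitudes representable by FP4 E2M1 (the sign is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
E4M3_MAX = 448.0   # largest finite FP8 E4M3 value
BLOCK_SIZE = 16    # NVFP4 shares one scale per 16 elements


def round_to_e4m3(x: float) -> float:
    """Round a non-negative float to the nearest FP8 E4M3 value, saturating at 448."""
    if x <= 0.0:
        return 0.0
    if x >= E4M3_MAX:
        return E4M3_MAX
    # E4M3 has 3 mantissa bits; exponents below -6 fall into the subnormal range.
    exp = max(math.floor(math.log2(x)), -6)
    step = 2.0 ** (exp - 3)
    return min(round(x / step) * step, E4M3_MAX)


def round_to_e2m1(x: float) -> float:
    """Round to the nearest signed E2M1 value; ties go to the smaller magnitude."""
    mag = min(abs(x), E2M1_MAX)  # values beyond +/-6 saturate
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def nvfp4_fake_quantize(values: list[float]) -> list[float]:
    """Quantize and dequantize block by block, returning the reconstructed values."""
    out: list[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto E2M1's maximum (6.0),
        # then store that scale itself in FP8 E4M3.
        scale = round_to_e4m3(amax / E2M1_MAX)
        if scale == 0.0:
            out.extend(0.0 for _ in block)
            continue
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out
```

Values that already sit on the E2M1 grid under a scale of 1.0 come back unchanged. Something like `nvfp4_fake_quantize([0.7])` returns `[0.703125]` because the block scale is rounded to the nearest E4M3 value, 15/128, before the element is snapped to 6.0.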

Details

Format: NVFP4 (4-bit floating point) with FP8 KV cache
Tools: TensorRT-LLM 1.2.0, ModelOpt 0.37.0
Calibration: 512 samples from cnn_dailymail
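
An FP8 KV cache stores one byte per cached key or value element, compared with two bytes for BF16. That halves the cache memory needed per token. The arithmetic is sketched below. The layer count, KV-head count and head dimension are assumptions taken from the upstream Qwen/Qwen3-8B configuration, so check them against the checkpoint you build. The estimate also ignores paged-cache allocation overhead and any stored scaling factors.

```python
# Assumed Qwen3-8B attention shape (from the upstream Qwen/Qwen3-8B config;
# verify against the checkpoint you actually build).
NUM_LAYERS = 36
NUM_KV_HEADS = 8   # grouped-query attention: fewer KV heads than query heads
HEAD_DIM = 128


def kv_cache_bytes(num_tokens: int, bytes_per_elem: int) -> int:
    """Raw K+V storage for num_tokens cached tokens, ignoring allocator overhead."""
    # Factor 2: one key vector and one value vector per layer per KV head.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_elem
    return per_token * num_tokens


if __name__ == "__main__":
    tokens = 32_768
    for name, width in (("FP8", 1), ("BF16", 2)):
        gib = kv_cache_bytes(tokens, width) / 2**30
        print(f"{name}: {gib:.2f} GiB for {tokens} tokens")
```

Under these assumptions each token costs 72 KiB of FP8 cache, against 144 KiB in BF16. A 32,768-token budget therefore needs about 2.25 GiB instead of 4.5 GiB.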

Usage

huggingface-cli download glux-cz/Qwen3-8B-NVFP4-Blackwell --local-dir ./checkpoint
trtllm-build --checkpoint_dir ./checkpoint --output_dir ./engine --gemm_plugin nvfp4
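
The same two steps can be scripted from Python. This is a minimal sketch that assumes `huggingface_hub` is installed and `trtllm-build` is on `PATH`. `snapshot_download` is `huggingface_hub`'s function for fetching an entire repository. The helpers `build_command` and `download_and_build` are hypothetical names, and the build flags are copied unchanged from the command above.

```python
import subprocess

REPO_ID = "glux-cz/Qwen3-8B-NVFP4-Blackwell"


def build_command(checkpoint_dir: str, engine_dir: str) -> list[str]:
    """Return the trtllm-build invocation from the Usage section as an argv list."""
    return [
        "trtllm-build",
        "--checkpoint_dir", checkpoint_dir,
        "--output_dir", engine_dir,
        "--gemm_plugin", "nvfp4",
    ]


def download_and_build(checkpoint_dir: str = "./checkpoint",
                       engine_dir: str = "./engine") -> None:
    """Fetch the checkpoint, then build the engine (hypothetical helper)."""
    # Imported lazily so build_command() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Downloads every file in the repository into checkpoint_dir.
    snapshot_download(repo_id=REPO_ID, local_dir=checkpoint_dir)
    # check=True raises subprocess.CalledProcessError if trtllm-build fails.
    subprocess.run(build_command(checkpoint_dir, engine_dir), check=True)
```

Calling `download_and_build()` with no arguments reproduces the two commands above. The command is passed as an argv list rather than a shell string, which avoids quoting problems with unusual paths.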

Model tree

Base model: Qwen/Qwen3-8B (fine-tuned from Qwen/Qwen3-8B-Base)