Qwen3-8B-NVFP4
NVFP4-quantized Qwen3-8B checkpoint for NVIDIA Blackwell GPUs (RTX 5090, RTX PRO 4000).
Details
- Format: NVFP4 (4-bit FP) + FP8 KV cache
- Tools: TensorRT-LLM 1.2.0, ModelOpt 0.37.0
- Calibration: 512 samples from cnn_dailymail
Usage
huggingface-cli download glux-cz/Qwen3-8B-NVFP4-Blackwell --local-dir ./checkpoint
trtllm-build --checkpoint_dir ./checkpoint --output_dir ./engine --gemm_plugin nvfp4
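The two steps above can also be wrapped in a small script that stops early if the download did not produce a usable checkpoint. This is a minimal sketch, not part of the original instructions: the function names `checkpoint_ready` and `build_engine` are hypothetical, and the check assumes the checkpoint directory contains a `config.json` file, which may need adjusting for this repository's actual layout.

```shell
#!/usr/bin/env bash
# Sketch: sanity checks around the download and build steps.
# Assumption: a valid checkpoint directory contains config.json;
# adjust the check if this repository is laid out differently.

# Succeeds only if the directory exists and holds a config.json.
checkpoint_ready() {
    local dir="$1"
    [ -d "$dir" ] && [ -f "$dir/config.json" ]
}

# Download the checkpoint, verify it, then build the engine.
# Uses the same commands as above, with the paths passed as arguments.
build_engine() {
    local ckpt="${1:-./checkpoint}"
    local engine="${2:-./engine}"

    huggingface-cli download glux-cz/Qwen3-8B-NVFP4-Blackwell --local-dir "$ckpt" || return 1
    if ! checkpoint_ready "$ckpt"; then
        echo "error: no config.json found in $ckpt" >&2
        return 1
    fi
    trtllm-build --checkpoint_dir "$ckpt" --output_dir "$engine" --gemm_plugin nvfp4
}
```

After sourcing the script, `build_engine` with no arguments uses `./checkpoint` and `./engine`; `build_engine /data/ckpt /data/engine` uses custom paths instead.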