Why do ppl hate fp8?

#1
by putcn - opened

No one seems to be talking about this model

putcn changed discussion title from Why does ppl hate fp8? to Why do ppl hate fp8?

The awkward size is the main issue. For FP8 you need either 2x RTX 4090 (48 GB total) or 2x RTX 5090 (64 GB total), and both are expensive options for a model at this quality level.

INT4 weights make much more sense here: they run fine on 2x 16 GB cards with no FP8 hardware requirement, and that hardware is far more widely available.

This model just falls into a dead zone. Too big for affordable single-GPU setups, too small to justify the hardware needed for FP8. Neither fish nor fowl.
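Rough numbers, for anyone who wants to sanity-check the sizes: weight memory is roughly params × bits / 8 bytes, before you add KV cache and activations. A quick back-of-envelope sketch (the 32B parameter count below is just an assumed example, not this model's exact size):

```python
# Back-of-envelope VRAM estimate for model weights only (no KV cache, no activations).
# Assumption: a hypothetical ~32B-parameter model; adjust n_params as needed.

def weight_vram_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / (1024 ** 3)

n_params = 32e9  # assumed example parameter count

for label, bits in [("FP16", 16), ("FP8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_vram_gib(n_params, bits):.1f} GiB for weights alone")

# FP16: ~59.6 GiB, FP8: ~29.8 GiB, INT4: ~14.9 GiB
# KV cache and activations come on top of this, which is why FP8 spills past
# a single 24-32 GB card while INT4 can fit on 2x 16 GB cards.
```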

Yeah, it would be amazing if Qwen could provide native 4-bit models instead. OpenAI trained GPT-oss with MXFP4 (quantization-aware training) and it was a game changer for local inference, and Google did the same with Gemma 3 QAT. FP8 models are still huge.
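Until native 4-bit checkpoints show up, one stopgap is on-the-fly 4-bit quantization at load time. A minimal sketch using transformers + bitsandbytes (this is NF4 post-training quantization, not the MXFP4 QAT approach mentioned above, and the model id is a placeholder):

```python
# Minimal sketch: load a checkpoint with on-the-fly 4-bit (NF4) quantization
# via bitsandbytes. Post-training quantization at load time, not QAT/MXFP4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/some-model"  # placeholder: substitute the actual checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # do matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard across whatever GPUs are available
)
```

Quality is usually a bit worse than a QAT checkpoint like Gemma 3 QAT or gpt-oss's MXFP4, but it at least gets the weights down to roughly a quarter of the FP16 footprint on ordinary consumer cards.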
