Why do ppl hate fp8?

#1
by putcn - opened

No one seems to be talking about this model

putcn changed discussion title from Why does ppl hate fp8? to Why do ppl hate fp8?

The awkward size is the main issue. For FP8 you need either 2x RTX 4090 (48 GB total) or 2x RTX 5090 (64 GB total), and both are expensive options for a model at this quality level.

INT4 weights make much more sense here: they run fine on 2x 16 GB cards with no FP8 hardware requirement, and that hardware is far more widely available.

This model just falls into a dead zone. Too big for affordable single-GPU setups, too small to justify the hardware needed for FP8. Neither fish nor fowl.
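Rough numbers, for anyone who wants to sanity-check the sizes: weight memory is roughly params × bits / 8 bytes, before you add KV cache and activations. A quick back-of-envelope sketch (the 32B parameter count below is just an assumed example, not this model's exact size):

```python
# Back-of-envelope VRAM estimate for model weights only (no KV cache, no activations).
# Assumption: a hypothetical ~32B-parameter model; adjust n_params as needed.

def weight_vram_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / (1024 ** 3)

n_params = 32e9  # assumed example parameter count

for label, bits in [("FP16", 16), ("FP8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_vram_gib(n_params, bits):.1f} GiB for weights alone")

# FP16: ~59.6 GiB, FP8: ~29.8 GiB, INT4: ~14.9 GiB
# KV cache and activations come on top of this, which is why FP8 spills past
# a single 24-32 GB card while INT4 can fit on 2x 16 GB cards.
```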

Yeah, it would be amazing if Qwen could provide native 4-bit models instead. OpenAI trained GPT-oss with MXFP4 (quantization-aware training) and it was a game changer for local inference, and Google did the same with Gemma 3 QAT. FP8 models are still huge.
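Until native 4-bit checkpoints show up, one stopgap is on-the-fly 4-bit quantization at load time. A minimal sketch using transformers + bitsandbytes (this is NF4 post-training quantization, not the MXFP4 QAT approach mentioned above, and the model id is a placeholder):

```python
# Minimal sketch: load a checkpoint with on-the-fly 4-bit (NF4) quantization
# via bitsandbytes. Post-training quantization at load time, not QAT/MXFP4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/some-model"  # placeholder: substitute the actual checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # do matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard across whatever GPUs are available
)
```

Quality is usually a bit worse than a QAT checkpoint like Gemma 3 QAT or gpt-oss's MXFP4, but it at least gets the weights down to roughly a quarter of the FP16 footprint on ordinary consumer cards.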
