MXFP8?

#2
by eepos - opened

Thanks for the informative generation speed table!

Are you planning on doing an MXFP8 quant? I'm curious to see how it would do in speed (and quality) comparison.

I didn't upload mxfp8 because

  • It only runs fast on Blackwell GPUs.
  • The fp8 model seems okay to me.

Here's a quick test of mxfp8 on a 5090.

Model ErnieImage prepared for dynamic VRAM loading. 7834MB Staged. 0 patches attached.
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 20/20 [00:10<00:00,  1.93it/s] # fp8
Requested to load ErnieImage
Model ErnieImage prepared for dynamic VRAM loading. 8068MB Staged. 0 patches attached.
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 20/20 [00:10<00:00,  1.84it/s] # mxfp8
Requested to load ErnieImage
Model ErnieImage prepared for dynamic VRAM loading. 15322MB Staged. 0 patches attached.
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 20/20 [00:19<00:00,  1.05it/s] # bf16

mxfp8 is slightly larger and slightly slower than fp8.
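The size gap is roughly what microscaling's bookkeeping predicts: MXFP8 stores one shared 8-bit (E8M0) scale per 32-element block on top of one byte per weight, so the payload grows by about 1/32 ≈ 3% versus plain fp8 with per-tensor scales. A minimal sketch (the exact "Staged MB" accounting in the log is an assumption; this only models weight bytes):

```python
BLOCK = 32  # MX block size: 32 elements share one 8-bit E8M0 scale

def mxfp8_storage_bytes(n_weights: int) -> int:
    """One byte per fp8 element plus one scale byte per 32-element block."""
    return n_weights + n_weights // BLOCK

# Treat the fp8 checkpoint's staged size as a proxy for the weight count.
n = 7834 * 2**20
ratio = mxfp8_storage_bytes(n) / n
print(f"mxfp8 / fp8 size ratio = {ratio:.4f}")  # 33/32 = 1.03125
```

That predicted ~3% overhead lines up with the logs above: 8068 MB / 7834 MB ≈ 1.03.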

ERNIE-Image (sample image: ComfyUI_00224_)

ERNIE-Image-Turbo (sample image: ComfyUI_00225_)

Both mxfp8 and fp8 seem okay to me.

By the way, you can quantize to mxfp8 using comfy-dit-quantizer (it takes only ~1 min):

  • Clone comfy-dit-quantizer.
  • Copy configs/ernie-image-fp8.json to configs/ernie-image-mxfp8.json and change float8_e4m3fn to mxfp8 in the file.
  • Activate ComfyUI's venv.
  • python quantize.py configs/ernie-image-mxfp8.json <BF16 MODEL> <MXFP8 MODEL>
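The config step above can be sketched as a copy plus a one-line substitution (assuming comfy-dit-quantizer is already cloned and ComfyUI's venv is active; the config filename follows the command in the last step):

```shell
# Derive the mxfp8 config from the shipped fp8 one
cp configs/ernie-image-fp8.json configs/ernie-image-mxfp8.json
sed -i 's/float8_e4m3fn/mxfp8/' configs/ernie-image-mxfp8.json
# then: python quantize.py configs/ernie-image-mxfp8.json <BF16 MODEL> <MXFP8 MODEL>
```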

Thanks for the info. Looks like not much to gain with MXFP8 over FP8.

I tried to quantize earlier, but I have ComfyUI portable rather than a venv and ran into some issues. I'll try to troubleshoot with an LLM later.

eepos changed discussion status to closed
