MXFP8?
#2
by eepos - opened
Thanks for the informative generation speed table!
Are you planning on doing an MXFP8 quant? I'm curious to see how it would compare in speed (and quality).
I didn't upload mxfp8 because:
- It only runs fast on Blackwell GPUs.
- The fp8 model seems okay to me.
Here's a quick test of mxfp8 on a 5090:
```
Model ErnieImage prepared for dynamic VRAM loading. 7834MB Staged. 0 patches attached.
100%|██████████████████████████████████████| 20/20 [00:10<00:00, 1.93it/s] # fp8
Requested to load ErnieImage
Model ErnieImage prepared for dynamic VRAM loading. 8068MB Staged. 0 patches attached.
100%|██████████████████████████████████████| 20/20 [00:10<00:00, 1.84it/s] # mxfp8
Requested to load ErnieImage
Model ErnieImage prepared for dynamic VRAM loading. 15322MB Staged. 0 patches attached.
100%|██████████████████████████████████████| 20/20 [00:19<00:00, 1.05it/s] # bf16
```
mxfp8 is slightly bigger and slightly slower than fp8.
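The size gap above is roughly what block scaling predicts. As a back-of-the-envelope sketch (assuming MXFP8 stores one shared 8-bit scale per 32-element block, per the OCP MX format, on top of the 8-bit elements), the overhead over plain fp8 is about 8/(32·8) ≈ 3%:

```python
# Rough arithmetic sketch, not a measurement. Assumption: MXFP8 adds
# one 8-bit shared scale per 32-element block (OCP MX format), while
# plain fp8 (float8_e4m3fn) carries negligible per-tensor scale overhead.
BLOCK = 32                    # MX block size (elements per shared scale)
fp8_mb = 7834                 # staged size from the fp8 log above
overhead = 8 / (BLOCK * 8)    # extra scale bits per 8-bit weight = ~3.1%
mxfp8_mb_est = fp8_mb * (1 + overhead)
print(f"estimated mxfp8 size: {mxfp8_mb_est:.0f}MB")
# → ~8079MB, close to the 8068MB staged in the mxfp8 log
```

So the extra ~230MB is essentially the per-block scales, not a different element width.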
ERNIE-Image
ERNIE-Image-Turbo
Both mxfp8 and fp8 seem okay to me.
By the way, you can quantize to mxfp8 using comfy-dit-quantizer (it takes only ~1 min):
- Clone comfy-dit-quantizer.
- Copy `configs/ernie-image-fp8.json` to `configs/ernie-image-mxfp8.json` and change `float8_e4m3fn` to `mxfp8` in the file.
- Activate ComfyUI's venv.
- Run `python quantize.py configs/ernie-image-mxfp8.json <BF16 MODEL> <MXFP8 MODEL>`
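The config-editing step is just a string swap, so it can be scripted. A minimal sketch (the JSON body below is a dummy stand-in; in the real repo you'd read the actual `configs/ernie-image-fp8.json`, whose schema I'm not assuming anything about beyond the dtype string mentioned above):

```python
# Sketch of the "copy and edit the config" step: derive the mxfp8
# config from the fp8 one by swapping the dtype string, as described
# in the post. The config content here is a dummy placeholder.
import json
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# Stand-in for configs/ernie-image-fp8.json (real schema may differ).
fp8_cfg = workdir / "ernie-image-fp8.json"
fp8_cfg.write_text(json.dumps({"dtype": "float8_e4m3fn"}))

# Copy to the mxfp8 name and replace the dtype string.
mx_cfg = workdir / "ernie-image-mxfp8.json"
mx_cfg.write_text(fp8_cfg.read_text().replace("float8_e4m3fn", "mxfp8"))

print(json.loads(mx_cfg.read_text()))  # {'dtype': 'mxfp8'}
```

Then pass the new config path to `quantize.py` as in the command above.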
Thanks for the info. Looks like there's not much to gain with MXFP8 over FP8.
I tried to quantize earlier, but I have ComfyUI portable, not a venv, and ran into some issues. I'll try to troubleshoot with an LLM later.
eepos changed discussion status to closed

