Quantized models of ERNIE-Image / ERNIE-Image-Turbo

  • nvfp4 (4.78GB)
  • fp8e4m3 (8.22GB)
  • int8rowwise (8.22GB) - requires the ComfyUI-INT8-Fast custom node

Generation Speed

ERNIE-Image-Turbo

| GPU      | Quantization | Speed (it/s) | Time (s) | vs BF16 |
|----------|--------------|--------------|----------|---------|
| RTX 5090 | bf16         | 2.09         | 4.87     | 100%    |
| RTX 5090 | fp8e4m3      | 3.69         | 3.32     | 147%    |
| RTX 5090 | int8rowwise  | 4.31         | 3.05     | 160%    |
| RTX 5090 | nvfp4        | 5.09         | 2.72     | 179%    |
| RTX 3090 | bf16         | 0.88         | 12.42    | 100%    |
| RTX 3090 | fp8e4m3      | 0.84         | 12.73    | 98%     |
| RTX 3090 | int8rowwise  | 1.66         | 7.04     | 176%    |
| RTX 3090 | nvfp4        | 0.83         | 12.71    | 98%     |
| RTX 3060 | bf16         | 0.26         | 43.02    | 100%    |
| RTX 3060 | fp8e4m3      | 0.39         | 28.66    | 150%    |
| RTX 3060 | int8rowwise  | 0.82         | 14.43    | 298%    |
| RTX 3060 | nvfp4        | 0.39         | 28.72    | 150%    |

ERNIE-Image

| GPU      | Quantization | Speed (it/s) | Time (s) | vs BF16 |
|----------|--------------|--------------|----------|---------|
| RTX 5090 | bf16         | 1.08         | 20.08    | 100%    |
| RTX 5090 | fp8e4m3      | 1.97         | 11.67    | 172%    |
| RTX 5090 | int8rowwise  | 2.14         | 10.89    | 184%    |
| RTX 5090 | nvfp4        | 2.56         | 9.35     | 215%    |
| RTX 3090 | bf16         | 0.40         | 53.33    | 100%    |
| RTX 3090 | fp8e4m3      | 0.39         | 54.71    | 97%     |
| RTX 3090 | int8rowwise  | 0.79         | 28.08    | 190%    |
| RTX 3090 | nvfp4        | 0.38         | 55.20    | 97%     |
| RTX 3060 | bf16         | 0.11         | 201.41   | 100%    |
| RTX 3060 | fp8e4m3      | 0.17         | 130.48   | 154%    |
| RTX 3060 | int8rowwise  | 0.35         | 62.42    | 323%    |
| RTX 3060 | nvfp4        | 0.17         | 130.87   | 154%    |
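
In both tables the "vs BF16" column matches the ratio of total generation times (bf16 time divided by quantized time), not the ratio of the it/s figures; e.g. on the RTX 5090 with ERNIE-Image-Turbo, 4.87 s / 3.32 s ≈ 147%. A minimal sketch of that calculation (the `vs_bf16` helper is just an illustration, not part of any tool here):

```python
# "vs BF16" as the ratio of total generation times (bf16 / quantized).
def vs_bf16(bf16_time_s: float, quant_time_s: float) -> str:
    return f"{bf16_time_s / quant_time_s:.0%}"

# RTX 5090, ERNIE-Image-Turbo: bf16 4.87 s vs fp8e4m3 3.32 s
print(vs_bf16(4.87, 3.32))  # 147%
```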

Samples

ERNIE-Image-Turbo

(Comparison images: ernie-image-turbo-comp1, ernie-image-turbo-comp2, ernie-image-turbo-comp3, erine-image-turbo-comp4)

ERNIE-Image

(Comparison images: ernie-image-comp1, ernie-image-comp2, ernie-image-comp3, erine-image-comp4)

How to reproduce

Quantize the model with https://github.com/bedovyy/comfy-dit-quantizer using the config JSON below.

{
  "block_names": ["layers"],
  "rules": [
    { "policy": "keep", "match": ["adaLN", "self_attention.norm"] },
    { "policy": "float8_e4m3fn", "match": ["mlp", "self_attention.to"] }
  ]
}
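
The config reads as a per-layer policy over the transformer "layers" blocks: modules whose names contain adaLN or self_attention.norm keep their original precision, while mlp and self_attention.to* projections are stored as float8_e4m3fn. The sketch below only illustrates that matching logic under those assumptions; it is not the comfy-dit-quantizer implementation, and quantize_dit / KEEP / QUANT are hypothetical names.

```python
# Illustrative sketch only -- not the comfy-dit-quantizer code.
# Inside the DiT "layers" blocks: keep adaLN / attention-norm weights as-is,
# store mlp and attention projection weights as float8_e4m3fn (PyTorch >= 2.1).
import torch
import torch.nn as nn

KEEP = ["adaLN", "self_attention.norm"]    # "policy": "keep"
QUANT = ["mlp", "self_attention.to"]       # "policy": "float8_e4m3fn"

def quantize_dit(model: nn.Module) -> None:
    for name, module in model.named_modules():
        if "layers" not in name or not isinstance(module, nn.Linear):
            continue                        # only linears inside the transformer blocks
        if any(pat in name for pat in KEEP):
            continue                        # sensitive layers keep their dtype
        if any(pat in name for pat in QUANT):
            # fp8 storage only; weights must be upcast again before the matmul at runtime
            module.weight.data = module.weight.data.to(torch.float8_e4m3fn)
```

Keeping the adaLN modulation and attention norms in higher precision is a common precaution when quantizing DiT models, since these small layers scale activations and tend to be sensitive to rounding error.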