Quantized models of ERNIE-Image / ERNIE-Image-Turbo

  • nvfp4 (4.78GB)
  • fp8e4m3 (8.22GB)
  • int8rowwise (8.22GB) - requires the ComfyUI-INT8-Fast custom node

Generation Speed

ERNIE-Image-Turbo

| GPU      | Quantization | Speed (it/s) | Time (s) | vs BF16 |
|----------|--------------|--------------|----------|---------|
| RTX 5090 | bf16         | 2.09         | 4.87     | 100%    |
| RTX 5090 | fp8e4m3      | 3.69         | 3.32     | 147%    |
| RTX 5090 | int8rowwise  | 4.31         | 3.05     | 160%    |
| RTX 5090 | nvfp4        | 5.09         | 2.72     | 179%    |
| RTX 3090 | bf16         | 0.88         | 12.42    | 100%    |
| RTX 3090 | fp8e4m3      | 0.84         | 12.73    | 98%     |
| RTX 3090 | int8rowwise  | 1.66         | 7.04     | 176%    |
| RTX 3090 | nvfp4        | 0.83         | 12.71    | 98%     |
| RTX 3060 | bf16         | 0.26         | 43.02    | 100%    |
| RTX 3060 | fp8e4m3      | 0.39         | 28.66    | 150%    |
| RTX 3060 | int8rowwise  | 0.82         | 14.43    | 298%    |
| RTX 3060 | nvfp4        | 0.39         | 28.72    | 150%    |

ERNIE-Image

| GPU      | Quantization | Speed (it/s) | Time (s) | vs BF16 |
|----------|--------------|--------------|----------|---------|
| RTX 5090 | bf16         | 1.08         | 20.08    | 100%    |
| RTX 5090 | fp8e4m3      | 1.97         | 11.67    | 172%    |
| RTX 5090 | int8rowwise  | 2.14         | 10.89    | 184%    |
| RTX 5090 | nvfp4        | 2.56         | 9.35     | 215%    |
| RTX 3090 | bf16         | 0.40         | 53.33    | 100%    |
| RTX 3090 | fp8e4m3      | 0.39         | 54.71    | 97%     |
| RTX 3090 | int8rowwise  | 0.79         | 28.08    | 190%    |
| RTX 3090 | nvfp4        | 0.38         | 55.20    | 97%     |
| RTX 3060 | bf16         | 0.11         | 201.41   | 100%    |
| RTX 3060 | fp8e4m3      | 0.17         | 130.48   | 154%    |
| RTX 3060 | int8rowwise  | 0.35         | 62.42    | 323%    |
| RTX 3060 | nvfp4        | 0.17         | 130.87   | 154%    |
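
In both tables the "vs BF16" column matches the ratio of total generation times (bf16 time divided by quantized time), not the ratio of the it/s figures; e.g. on the RTX 5090 with ERNIE-Image-Turbo, 4.87 s / 3.32 s ≈ 147%. A minimal sketch of that calculation (the `vs_bf16` helper is just an illustration, not part of any tool here):

```python
# "vs BF16" as the ratio of total generation times (bf16 / quantized).
def vs_bf16(bf16_time_s: float, quant_time_s: float) -> str:
    return f"{bf16_time_s / quant_time_s:.0%}"

# RTX 5090, ERNIE-Image-Turbo: bf16 4.87 s vs fp8e4m3 3.32 s
print(vs_bf16(4.87, 3.32))  # 147%
```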

Samples

ERNIE-Image-Turbo

(Comparison images: ernie-image-turbo-comp1, ernie-image-turbo-comp2, ernie-image-turbo-comp3, erine-image-turbo-comp4)

ERNIE-Image

(Comparison images: ernie-image-comp1, ernie-image-comp2, ernie-image-comp3, erine-image-comp4)

How to reproduce

Quantize the model with https://github.com/bedovyy/comfy-dit-quantizer using the config JSON below.

{
  "block_names": ["layers"],
  "rules": [
    { "policy": "keep", "match": ["adaLN", "self_attention.norm"] },
    { "policy": "float8_e4m3fn", "match": ["mlp", "self_attention.to"] }
  ]
}
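
The config reads as a per-layer policy over the transformer "layers" blocks: modules whose names contain adaLN or self_attention.norm keep their original precision, while mlp and self_attention.to* projections are stored as float8_e4m3fn. The sketch below only illustrates that matching logic under those assumptions; it is not the comfy-dit-quantizer implementation, and quantize_dit / KEEP / QUANT are hypothetical names.

```python
# Illustrative sketch only -- not the comfy-dit-quantizer code.
# Inside the DiT "layers" blocks: keep adaLN / attention-norm weights as-is,
# store mlp and attention projection weights as float8_e4m3fn (PyTorch >= 2.1).
import torch
import torch.nn as nn

KEEP = ["adaLN", "self_attention.norm"]    # "policy": "keep"
QUANT = ["mlp", "self_attention.to"]       # "policy": "float8_e4m3fn"

def quantize_dit(model: nn.Module) -> None:
    for name, module in model.named_modules():
        if "layers" not in name or not isinstance(module, nn.Linear):
            continue                        # only linears inside the transformer blocks
        if any(pat in name for pat in KEEP):
            continue                        # sensitive layers keep their dtype
        if any(pat in name for pat in QUANT):
            # fp8 storage only; weights must be upcast again before the matmul at runtime
            module.weight.data = module.weight.data.to(torch.float8_e4m3fn)
```

Keeping the adaLN modulation and attention norms in higher precision is a common precaution when quantizing DiT models, since these small layers scale activations and tend to be sensitive to rounding error.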