ERNIE-Image-Turbo-MLX

Pre-converted MLX weights for baidu/ERNIE-Image-Turbo. Runs on Apple Silicon via mlx-ernie-image.

What's included

| File | Size | Component |
|---|---|---|
| `dit.npz` | 16.1 GB | DiT (8B, 36 layers), pre-transposed for MLX |
| `vae.npz` | 100 MB | FLUX.2 VAE decoder, pre-transposed for MLX |
| `bn_stats.npz` | tiny | Batch-norm running stats for latent denormalization |
| `config.json` | tiny | DiT architecture config |
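The batch-norm stats are used to map the DiT's normalized output latents back to the VAE's expected scale before decoding. A minimal sketch of that denormalization, assuming per-channel `mean`/`var` arrays (the actual key names inside `bn_stats.npz` are an assumption here):

```python
import numpy as np

def denormalize_latents(latents, mean, var, eps=1e-5):
    # Sketch: invert batch-norm style normalization per channel.
    # latents: (B, H, W, C) in MLX's NHWC layout; mean/var: (C,).
    # Key names and layout are assumptions, not the repo's actual API.
    return latents * np.sqrt(var + eps) + mean

# Toy example with zero latents and unit stats.
latents = np.zeros((1, 4, 4, 16), dtype=np.float32)
mean = np.ones(16, dtype=np.float32)
var = np.ones(16, dtype=np.float32)
out = denormalize_latents(latents, mean, var)
```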

Conv2d weights are pre-transposed from PyTorch NCHW to MLX NHWC format. No conversion needed at runtime.
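The layout change above is a one-time axis permutation. A sketch of what that conversion looks like (the helper name here is illustrative, not the repo's actual converter): PyTorch stores Conv2d weights as `(out_ch, in_ch, kH, kW)`, while MLX expects `(out_ch, kH, kW, in_ch)`.

```python
import numpy as np

def to_mlx_conv_weight(w_torch: np.ndarray) -> np.ndarray:
    # PyTorch (out_ch, in_ch, kH, kW) -> MLX (out_ch, kH, kW, in_ch).
    return w_torch.transpose(0, 2, 3, 1)

# A typical 3x3 conv weight: 64 output channels, 16 input channels.
w = np.zeros((64, 16, 3, 3), dtype=np.float32)
w_mlx = to_mlx_conv_weight(w)
print(w_mlx.shape)  # (64, 3, 3, 16)
```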

Usage

```python
from ernie_image import ErnieImagePipeline, TextEncoder

te = TextEncoder.from_pretrained()
pipe = ErnieImagePipeline.from_pretrained("treadon/ERNIE-Image-Turbo-MLX")

emb = te.encode("A vibrant manga comic about a cat and a dragon")
img = pipe.generate(text_embeddings=emb)
img.save("output.png")
```

Benchmarks (1024x1024, 8 steps, M4 Pro 64GB)

MLX vs PyTorch/MPS

| Pipeline | Total | Per step |
|---|---|---|
| PyTorch/MPS (diffusers) | 137.0 s | 17.1 s/step |
| MLX (this repo) | 134.2 s | 16.0 s/step |

Breakdown

| Component | Time |
|---|---|
| Text encode (PyTorch) | 0.1 s |
| Denoise (MLX) | 128 s |
| VAE decode (MLX) | 6 s |
| Total | ~134 s |

Code

github.com/treadon/mlx-ernie-image

Follow @treadon on X for more ML experiments

Base Model

baidu/ERNIE-Image-Turbo: 8B DiT + Mistral-3 text encoder + FLUX.2 VAE. Apache 2.0.
