ERNIE-Image-Turbo-MLX
Pre-converted MLX weights for baidu/ERNIE-Image-Turbo. Runs on Apple Silicon via mlx-ernie-image.
What's included
| File | Size | Component |
|---|---|---|
| dit.npz | 16.1 GB | DiT (8B, 36 layers), pre-transposed for MLX |
| vae.npz | 100 MB | FLUX.2 VAE decoder, pre-transposed for MLX |
| bn_stats.npz | tiny | Batch norm running stats for latent denormalization |
| config.json | tiny | DiT architecture config |
Conv2d weights are pre-transposed from PyTorch NCHW to MLX NHWC format. No conversion needed at runtime.
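For reference, the layout change amounts to a single transpose per conv kernel. A minimal sketch, assuming NumPy arrays; the function name is illustrative and not part of this repo's API:

```python
# Sketch of the layout change applied offline to each conv kernel.
# PyTorch stores Conv2d weights as (out, in, kH, kW); MLX expects (out, kH, kW, in).
import numpy as np

def to_mlx_conv_layout(w_torch: np.ndarray) -> np.ndarray:
    # Move the input-channel axis to the end and make the result contiguous.
    return np.ascontiguousarray(w_torch.transpose(0, 2, 3, 1))
```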
Usage
```python
from ernie_image import ErnieImagePipeline, TextEncoder

# Text encoder (Mistral-3) runs in PyTorch; the DiT and VAE run in MLX.
te = TextEncoder.from_pretrained()
pipe = ErnieImagePipeline.from_pretrained("treadon/ERNIE-Image-Turbo-MLX")

emb = te.encode("A vibrant manga comic about a cat and a dragon")
img = pipe.generate(text_embeddings=emb)
img.save("output.png")
```
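The encoder and pipeline are loaded once and can be reused across prompts. A small sketch using only the calls shown above; the prompt strings are just examples:

```python
# Reuse the already-loaded encoder and pipeline for several prompts.
prompts = [
    "A watercolor fox in a bamboo forest",
    "A retro-futuristic city skyline at dusk",
]
for i, prompt in enumerate(prompts):
    emb = te.encode(prompt)
    img = pipe.generate(text_embeddings=emb)
    img.save(f"output_{i}.png")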
Benchmarks (1024x1024, 8 steps, M4 Pro 64GB)
MLX vs PyTorch/MPS
| Pipeline | Total | Per Step |
|---|---|---|
| PyTorch/MPS (diffusers) | 137.0s | 17.1s/step |
| MLX (this repo) | 134.2s | 16.0s/step |
Breakdown
| Component | Time |
|---|---|
| Text encode (PyTorch) | 0.1s |
| Denoise (MLX) | 128s |
| VAE decode (MLX) | 6s |
| Total | ~134s |
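For context on these numbers: the 16.0s/step figure in the comparison above matches the denoise loop alone (128s / 8 steps), with text encoding and VAE decode sitting outside that loop, and the end-to-end MLX run comes in roughly 2% faster than the PyTorch/MPS baseline (134.2s vs 137.0s).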
Code
github.com/treadon/mlx-ernie-image
Follow @treadon on X for more ML experiments
Base Model
baidu/ERNIE-Image-Turbo: 8B DiT + Mistral-3 text encoder + FLUX.2 VAE. Apache 2.0.