Z-Image-Turbo ONNX (Browser-Oriented Sharded Packaging)

Browser-oriented ONNX packaging of Tongyi-MAI/Z-Image-Turbo, an S3-DiT image generation model. This bundle ships the tokenizer files alongside ONNX models for the text encoder, VAE, and scheduler, and replaces the monolithic transformer with a sharded transformer layout intended for constrained browser and mobile WebGPU environments.

The packaging work here also leaned on the WebNN team's public Z-Image ONNX artifacts as a practical reference point for browser inference: webnn/Z-Image-Turbo.

What Changed

  • The WebNN-style transformer packaging was first reordered into a cleaner, more contiguous execution layout. In practice this acts like a "defragmented" transformer graph.
  • That reordered transformer was then split into multiple ONNX shards.
  • The goal was to reduce peak per-session model size for browser runtimes, especially on tighter mobile GPU memory budgets.

This is still the same underlying Z-Image-Turbo model family and ONNX-based browser inference stack. The main packaging differences are the reordered monolithic transformer and the sharded transformer layout built from it.
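At inference time the packaged components fit together as a standard few-step diffusion loop: the text encoder produces conditioning embeddings once, the transformer and scheduler alternate per step, and the VAE decoder runs last. The sketch below uses mock session objects standing in for onnxruntime-web `InferenceSession`s, so the call order is illustrative only; real inputs and outputs are tensors, and the exact input/output names of these ONNX graphs are not documented here.

```javascript
// Illustrative wiring of the denoising loop for the packaged components.
// Each "session" is a mock in place of an onnxruntime-web InferenceSession;
// strings stand in for tensors purely to show the data flow.
const mock = (name) => ({ run: (x) => `${name}(${x})` });

const textEncoder = mock("text_encoder");
const transformer = mock("transformer"); // monolithic or sharded form
const scheduler   = mock("scheduler_step");
const vaeDecoder  = mock("vae_decoder");

function generate(prompt, steps) {
  const cond = textEncoder.run(prompt);          // prompt -> conditioning embeddings
  let latents = "noise";                         // initial latent sample
  for (let i = 0; i < steps; i++) {
    const pred = transformer.run(`${latents},${cond}`); // denoising prediction
    latents = scheduler.run(pred);               // scheduler update -> next latents
  }
  return vaeDecoder.run(latents);                // final latents -> pixels
}

const out = generate("a cat", 2);
```

With the sharded layout, the single `transformer.run` call above would instead execute the shards in manifest order, feeding each shard's outputs to the next.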

Repository Layout

  • tokenizer/: tokenizer files used to turn prompt text into model inputs
  • onnx/text_encoder_model_q4f16.onnx + .onnx_data: text encoder that converts tokenized prompts into conditioning embeddings
  • onnx/vae_decoder_model_f16.onnx: VAE decoder that turns final latents into pixels
  • onnx/vae_pre_process_model_f16.onnx: VAE helper used in the decode path
  • onnx/scheduler_step_model_f16.onnx: scheduler update model used between denoising steps to produce the next latent sample
  • onnx/transformer_model_q4f16.onnx + .onnx_data: reordered / defragmented monolithic transformer kept as the base form used to build the shards
  • onnx/transformer_model_q4f16_shard*.onnx + matching .onnx_data: the main transformer split from the reordered monolithic form into multiple shards to reduce peak session size
  • onnx/transformer_shards.json: manifest describing the shard set
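The schema of `transformer_shards.json` is not documented here, but assuming a minimal shape that lists shard filenames in execution order, resolving shard URLs before creating per-shard sessions could look like the following sketch (the `shards` and `file` field names are assumptions, not the repo's actual schema):

```javascript
// Resolve shard URLs from a hypothetical transformer_shards.json manifest.
// "shards" and "file" are assumed field names; the real schema may differ.
function shardUrls(baseUrl, manifest) {
  return manifest.shards.map((s) => new URL(s.file, baseUrl).href);
}

// Example with a made-up two-shard manifest:
const manifest = {
  shards: [
    { file: "transformer_model_q4f16_shard0.onnx" },
    { file: "transformer_model_q4f16_shard1.onnx" },
  ],
};
const urls = shardUrls("https://example.com/onnx/", manifest);
```

Each resolved URL would then back one onnxruntime-web session, created and released in sequence so that only one shard (plus its `.onnx_data` file) needs to be resident at a time.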

Notes

  • Base model license and usage terms come from the upstream Tongyi-MAI/Z-Image-Turbo release.
  • This repo focuses on browser-friendly packaging rather than training or architecture changes.