# Stable Diffusion v1.5 converted to LiteRT
This repository contains a LiteRT/TFLite export of the Hugging Face model `stable-diffusion-v1-5/stable-diffusion-v1-5`.
## Base variants
- `fp32/`: reference float export used by `android-gpu` and `ios-coreml`
- `int8/`: mixed-precision bundle with an fp32 text encoder (fallback), a UNet quantized with PT2E dynamic int8, and an fp32 VAE decoder (fallback)
## Deployment profiles
- `android-qnn-npu`: LiteRT Qualcomm AI Engine Direct (QNN) (android, preferred accelerator=NPU)
- `android-gpu`: LiteRT GPU delegate (android, preferred accelerator=GPU)
- `android-cpu`: LiteRT CPU/XNNPACK (android, preferred accelerator=CPU)
- `ios-coreml`: LiteRT Core ML delegate (ios, preferred accelerator=CORE_ML)
Profiles are emitted in `conversion_manifest.json` as manifest-level mappings onto the exported base variants. This avoids duplicating large model binaries while still letting each runtime pick backend-specific artifacts.
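As a rough illustration of how a runtime might resolve a profile to its base variant, the sketch below assumes a manifest shaped like `{"profiles": {"<profile>": {"base_variant": "<variant>"}}}`. The key names `profiles` and `base_variant` and the helper names are hypothetical; check the actual `conversion_manifest.json` for the real field names before relying on this.

```python
import json
from pathlib import Path

# Submodel file names listed in this README for each exported base variant.
SUBMODEL_FILES = ("text_encoder.tflite", "unet.tflite", "vae_decoder.tflite")


def load_manifest(path="conversion_manifest.json"):
    """Read the manifest from disk as a plain dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resolve_profile(manifest, profile, root="."):
    """Map a deployment profile name to the submodel paths of its base variant.

    ASSUMPTION: the manifest has a top-level "profiles" mapping whose entries
    carry a "base_variant" key. Both key names are hypothetical.
    """
    profiles = manifest.get("profiles", {})
    if profile not in profiles:
        raise KeyError(f"unknown deployment profile: {profile!r}")
    variant = profiles[profile]["base_variant"]
    return {name: Path(root) / variant / name for name in SUBMODEL_FILES}
```

Under that assumed schema, `resolve_profile(load_manifest(), "android-gpu")` would return three paths under `fp32/`, since this README states that `android-gpu` uses the fp32 export.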
## Files per exported base variant
- `text_encoder.tflite`
- `unet.tflite`
- `vae_decoder.tflite`
## Shared assets
- `tokenizer/`
- `scheduler/`
- `configs/`
- `configs/text_encoder_runtime_config.json`
- `conversion_manifest.json`
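A quick way to sanity-check a local download against the layout listed above is sketched here. The function name `missing_entries` is illustrative, and the check only tests that paths exist; it does not validate file contents.

```python
from pathlib import Path

# Per-variant submodels and shared assets, as listed in this README.
SUBMODEL_FILES = ("text_encoder.tflite", "unet.tflite", "vae_decoder.tflite")
SHARED_ASSETS = (
    "tokenizer",
    "scheduler",
    "configs",
    "configs/text_encoder_runtime_config.json",
    "conversion_manifest.json",
)


def missing_entries(root, variants=("fp32", "int8")):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    expected = [f"{variant}/{name}" for variant in variants for name in SUBMODEL_FILES]
    expected += list(SHARED_ASSETS)
    return [rel for rel in expected if not (root / rel).exists()]
```

Because the fp32 bundle may be skipped during conversion, passing `variants=("int8",)` avoids false alarms when only the int8 bundle is present.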
## Notes
- Stable Diffusion v1.5 is a multi-stage pipeline, so this export is split into submodels.
- The conversion notebook first tries to export the text encoder with INT32 token ids, which improves compatibility with the GPU and Core ML delegates. It records the actual exported input dtype per variant and per deployment profile.
- The fp32 bundle is optional debug output. On CPU runtimes it is skipped by default, because converting the fp32 UNet there tends to kill the notebook kernel.
- `android-qnn-npu` is a LiteRT/QNN-oriented deployment profile, not a Qualcomm AOT context binary.
- Both exported base variants are smoke-tested by reloading the serialized LiteRT models and executing inference.
- The preview images in `preview/` are decoder smoke tests, not final text-to-image samples.
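Because the exported text encoder's token-id dtype can differ between variants, calling code may want to cast token ids to whatever the model actually expects rather than hard-coding INT32. The sketch below is one way to do that with NumPy. It assumes a batch size of 1 and that `input_detail` is one entry from the interpreter's `get_input_details()` list; the helper name `prepare_token_ids` is hypothetical.

```python
import numpy as np


def prepare_token_ids(token_ids, input_detail):
    """Cast token ids to the integer dtype reported for the model input.

    `input_detail` is expected to be a dict with a "dtype" key, in the style
    of an entry from `Interpreter.get_input_details()`.
    ASSUMPTION: batch size is 1, so the ids are reshaped to (1, sequence_length).
    """
    dtype = np.dtype(input_detail["dtype"])
    if dtype not in (np.dtype(np.int32), np.dtype(np.int64)):
        raise TypeError(f"unexpected token id dtype: {dtype}")
    return np.asarray(token_ids, dtype=dtype).reshape(1, -1)
```

With a real model, `input_detail` would typically come from something like `interpreter.get_input_details()[0]`. Which input index carries the token ids is an assumption to verify against the exported model.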