# Stable Diffusion v1.5 converted to LiteRT

This repository contains a LiteRT/TFLite export of the Hugging Face model `stable-diffusion-v1-5/stable-diffusion-v1-5`.

## Base variants

- `fp32/`: reference float export used by `android-gpu` and `ios-coreml`
- `int8/`: mixed bundle with fp32 text encoder fallback, PT2E dynamic int8 UNet, and fp32 VAE fallback

## Deployment profiles

- `android-qnn-npu`: LiteRT Qualcomm AI Engine Direct (QNN); platform Android, preferred accelerator `NPU`
- `android-gpu`: LiteRT GPU delegate; platform Android, preferred accelerator `GPU`
- `android-cpu`: LiteRT CPU/XNNPACK; platform Android, preferred accelerator `CPU`
- `ios-coreml`: LiteRT Core ML delegate; platform iOS, preferred accelerator `CORE_ML`

Profiles are emitted in `conversion_manifest.json` as manifest-level mappings onto the exported base variants. This avoids duplicating large model binaries while still letting each runtime pick backend-specific artifacts.
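As a rough illustration of that mapping, the sketch below resolves a profile name to the model files of its base variant. The manifest layout shown here (`variants`, `profiles`, `variant`, `path`, `accelerator`) is a hypothetical schema made up for the example; inspect the actual `conversion_manifest.json` for the real key names before relying on it.

```python
import json

# Hypothetical manifest layout: these key names are assumptions for
# illustration, not the schema written by the export notebook.
EXAMPLE_MANIFEST = json.loads("""
{
  "variants": {
    "fp32": {"path": "fp32"},
    "int8": {"path": "int8"}
  },
  "profiles": {
    "android-gpu": {"variant": "fp32", "accelerator": "GPU"},
    "ios-coreml": {"variant": "fp32", "accelerator": "CORE_ML"}
  }
}
""")

SUBMODELS = ("text_encoder", "unet", "vae_decoder")


def resolve_profile(manifest, profile_name):
    """Return the accelerator and per-submodel file paths for one profile."""
    profile = manifest["profiles"][profile_name]      # KeyError if unknown
    variant = manifest["variants"][profile["variant"]]
    base = variant["path"]
    resolved = {"accelerator": profile["accelerator"]}
    for name in SUBMODELS:
        # Several profiles can point at the same files, so nothing is copied.
        resolved[name] = f"{base}/{name}.tflite"
    return resolved


if __name__ == "__main__":
    print(resolve_profile(EXAMPLE_MANIFEST, "android-gpu"))
```

Because both example profiles resolve to the same `fp32/` files, adding a profile only adds a small manifest entry rather than another copy of the binaries.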

## Files per exported base variant

- `text_encoder.tflite`
- `unet.tflite`
- `vae_decoder.tflite`
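
Before loading a variant, it can help to confirm that all three submodels are present. The following is a minimal sketch using only the standard library; the helper name `missing_submodels` is made up for this example and is not part of the repository.

```python
from pathlib import Path

# The three files listed above for each exported base variant.
REQUIRED_FILES = ("text_encoder.tflite", "unet.tflite", "vae_decoder.tflite")


def missing_submodels(variant_dir):
    """Return the required file names that are absent from variant_dir."""
    root = Path(variant_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    for variant in ("fp32", "int8"):
        missing = missing_submodels(variant)
        status = "complete" if not missing else f"missing {missing}"
        print(f"{variant}: {status}")
```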

## Shared assets

- `tokenizer/`
- `scheduler/`
- `configs/`
- `configs/text_encoder_runtime_config.json`
- `conversion_manifest.json`

## Notes

- Stable Diffusion v1.5 is a multi-stage pipeline, so this export is split into submodels.
- The notebook first tries to export the text encoder with INT32 token ids, which the GPU and Core ML delegates handle better. It then records the input dtype that was actually exported, per variant and per deployment profile.
- The fp32 bundle is optional debug output. When the notebook runs on a CPU runtime, it is skipped by default because fp32 UNet conversion can crash the kernel.
- `android-qnn-npu` is a LiteRT/QNN-oriented deployment profile, not a Qualcomm AOT context binary.
- Both exported base variants are smoke-tested by reloading the serialized LiteRT models and executing inference.
- The preview images in `preview/` are decoder smoke tests, not final text-to-image samples.
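
Since the token id dtype is recorded rather than fixed, a runtime should read it instead of hard-coding INT32. The sketch below shows one way to do that check in plain Python. The key `input_dtype_per_variant` and the values in `EXAMPLE_CONFIG` are assumptions for illustration only; the real field names and values live in `configs/text_encoder_runtime_config.json` and `conversion_manifest.json`.

```python
# Value ranges for the two integer dtypes a token id input may use.
INT_RANGES = {
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}

# Hypothetical config content: key name and values are illustrative,
# not the ones recorded by the export notebook.
EXAMPLE_CONFIG = {"input_dtype_per_variant": {"fp32": "int32", "int8": "int64"}}


def recorded_token_dtype(runtime_config, variant):
    """Look up the token id dtype recorded for a variant."""
    dtype_name = runtime_config["input_dtype_per_variant"][variant]
    if dtype_name not in INT_RANGES:
        raise ValueError(f"unsupported token id dtype: {dtype_name}")
    return dtype_name


def validate_token_ids(token_ids, dtype_name):
    """Raise ValueError if any token id does not fit in dtype_name."""
    low, high = INT_RANGES[dtype_name]
    for token_id in token_ids:
        if not low <= token_id <= high:
            raise ValueError(f"token id {token_id} does not fit in {dtype_name}")
    return dtype_name


if __name__ == "__main__":
    dtype_name = recorded_token_dtype(EXAMPLE_CONFIG, "fp32")
    # 49406 and 49407 are the CLIP tokenizer's start and end token ids.
    validate_token_ids([49406, 49407], dtype_name)
    print(f"feed the text encoder token ids as {dtype_name}")
```

The returned dtype name would then drive the cast of the tokenizer output before it is passed to `text_encoder.tflite`.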