	# Stable Diffusion v1.5 converted to LiteRT

	This repository contains a LiteRT/TFLite export of the Hugging Face model `stable-diffusion-v1-5/stable-diffusion-v1-5`.

	## Base variants

	- `fp32/`: reference float export used by `android-gpu` and `ios-coreml`
	- `int8/`: mixed bundle with fp32 text encoder fallback, PT2E dynamic int8 UNet, and fp32 VAE fallback

	## Deployment profiles

	- `android-qnn-npu`: LiteRT Qualcomm AI Engine Direct (QNN) (android, preferred accelerator=NPU)
	- `android-gpu`: LiteRT GPU delegate (android, preferred accelerator=GPU)
	- `android-cpu`: LiteRT CPU/XNNPACK (android, preferred accelerator=CPU)
	- `ios-coreml`: LiteRT Core ML delegate (ios, preferred accelerator=CORE_ML)

	Profiles are emitted in `conversion_manifest.json` as manifest-level mappings onto the exported base variants. This avoids duplicating large model binaries while still letting each runtime pick backend-specific artifacts.
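
As a rough illustration of how a runtime might consume these mappings, the sketch below resolves a profile name to the submodel paths of its base variant. The manifest layout used here (the `profiles`, `base_variant`, and `preferred_accelerator` keys) is an assumption made for illustration; the real schema of `conversion_manifest.json` may differ. Only the two profile-to-variant mappings stated in this README are included.

```python
import json
from pathlib import PurePosixPath

# Hypothetical manifest excerpt: the key names are assumptions, not the real schema.
EXAMPLE_MANIFEST = json.loads("""
{
  "profiles": {
    "android-gpu": {"base_variant": "fp32", "preferred_accelerator": "GPU"},
    "ios-coreml":  {"base_variant": "fp32", "preferred_accelerator": "CORE_ML"}
  }
}
""")

# Submodel file names listed under "Files per exported base variant".
SUBMODELS = ("text_encoder.tflite", "unet.tflite", "vae_decoder.tflite")


def resolve_profile(manifest: dict, profile: str) -> dict:
    """Map a deployment profile onto the files of its base variant."""
    try:
        entry = manifest["profiles"][profile]
    except KeyError:
        raise KeyError(f"unknown deployment profile: {profile!r}") from None
    variant = entry["base_variant"]
    # Repository-relative paths, so POSIX separators are used on every platform.
    return {name: str(PurePosixPath(variant) / name) for name in SUBMODELS}


paths = resolve_profile(EXAMPLE_MANIFEST, "android-gpu")
print(paths["unet.tflite"])  # fp32/unet.tflite
```

Because both profiles point at the same `fp32/` directory, they resolve to identical files, which is how the manifest avoids shipping duplicate binaries.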

	## Files per exported base variant

	- `text_encoder.tflite`
	- `unet.tflite`
	- `vae_decoder.tflite`

	## Shared assets

	- `tokenizer/`
	- `scheduler/`
	- `configs/`
	- `configs/text_encoder_runtime_config.json`
	- `conversion_manifest.json`
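
A quick way to check a local copy against the layout listed above is sketched below. It assumes the variant directories and shared assets sit directly under the repository root, as the lists suggest; `missing_files` is a hypothetical helper, not something shipped in this repository.

```python
from pathlib import Path

# File and directory names taken from the lists in this README.
SUBMODELS = ("text_encoder.tflite", "unet.tflite", "vae_decoder.tflite")
SHARED = (
    "tokenizer",
    "scheduler",
    "configs/text_encoder_runtime_config.json",
    "conversion_manifest.json",
)


def missing_files(root: Path, variant: str) -> list[str]:
    """Return the expected repository paths that are absent under `root`."""
    expected = [f"{variant}/{name}" for name in SUBMODELS] + list(SHARED)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Check the int8 bundle in the current directory and report any gaps.
    for rel in missing_files(Path("."), "int8"):
        print(f"missing: {rel}")
```

Since the fp32 bundle is described as optional, a check like this is best run per variant rather than requiring both directories.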

	## Notes

	- Stable Diffusion v1.5 is a multi-stage pipeline, so this export is split into submodels.
	- The notebook first tries to export the text encoder with INT32 token ids for better GPU/Core ML delegate compatibility. It then records the actual exported input dtype per variant and per deployment profile.
	- The fp32 bundle is optional debug output; on CPU-only notebook runtimes it is skipped by default to avoid kernel crashes during fp32 UNet conversion.
	- `android-qnn-npu` is a LiteRT/QNN-oriented deployment profile, not a Qualcomm AOT context binary.
	- Both exported base variants are smoke-tested by reloading the serialized LiteRT models and executing inference.
	- The preview images in `preview/` are decoder smoke tests, not final text-to-image samples.
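
Because the token id dtype is recorded rather than fixed, a consumer needs to serialize ids in whichever width was actually exported. The sketch below shows one way to do that with the standard library only. The `input_dtype` key is a hypothetical name, since the real layout of `configs/text_encoder_runtime_config.json` is not documented here, and the token ids are illustrative values rather than real tokenizer output.

```python
import struct

# struct format characters for the two integer widths a token-id input may use.
_FORMATS = {"int32": "i", "int64": "q"}


def pack_token_ids(token_ids: list[int], dtype_name: str) -> bytes:
    """Serialize token ids as little-endian integers of the recorded dtype."""
    try:
        fmt = _FORMATS[dtype_name]
    except KeyError:
        raise ValueError(f"unsupported token id dtype: {dtype_name!r}") from None
    return struct.pack(f"<{len(token_ids)}{fmt}", *token_ids)


# Hypothetical config entry; the real key name may differ.
runtime_config = {"input_dtype": "int32"}
buffer = pack_token_ids([49406, 320, 1125, 49407], runtime_config["input_dtype"])
print(len(buffer))  # 16
```

Reading the dtype from the recorded config instead of hard-coding it keeps the same calling code usable if a variant ends up exported with a different token id width.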