# LiquidDiffusion

**A novel attention-free image generation model based on Liquid Neural Networks**

## What is this?

LiquidDiffusion is a **first-of-its-kind** image generation model that replaces attention with **Parallel CfC (Closed-form Continuous-time) blocks** from Liquid Neural Network research. To our knowledge, no existing paper combines LNNs with image generation; this project fills that gap.
### Key Properties

- ✅ **Zero attention layers** – fully convolutional + liquid time-gating
- ✅ **Fully parallelizable** – no ODE solvers, no sequential scanning, no recurrence
- ✅ **Latent-space training** – uses a pretrained SD-VAE (stabilityai/sd-vae-ft-mse, 83.7M params, frozen)
- ✅ **Fits 16 GB VRAM** – the tiny config runs 256px at batch size 8 on a T4 GPU
- ✅ **Simple training** – Rectified Flow (MSE velocity prediction, no noise schedule)
- ✅ **6 verified datasets** – all tested and working, with streaming support
## Quick Start (Colab)

1. Open `LiquidDiffusion_Training.ipynb` in Colab
2. Select a GPU runtime (T4)
3. Pick a dataset from the dropdown (default: huggan/AFHQv2 – animal faces)
4. Run all cells – training starts, and samples are generated every 500 steps
## Architecture

```
Pixel Image (3×256×256)
  → [Frozen SD-VAE Encode]   → Latent (4×32×32)
  → [LiquidDiffusion U-Net]  → Velocity prediction (4×32×32)
  → [Frozen SD-VAE Decode]   → Generated Image (3×256×256)
```
Each **LiquidDiffusionBlock** contains:
1. **AdaLN** – timestep conditioning via learned scale/shift
2. **ParallelCfCBlock** – the core liquid neural network layer (CfC Eq. 10)
3. **MultiScaleSpatialMix** – 3×3 + 5×5 + 7×7 depthwise convs + global pooling (replaces attention; see the sketch after this list)
4. **FeedForward** – channel mixing via 1×1 conv
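
Since the spatial-mix layer is what stands in for attention, here is a minimal PyTorch sketch of one way to realize it. The overall shape (parallel depthwise convs at three kernel sizes plus a pooled global-context branch, summed and projected by a 1×1 conv) follows the description above, but the exact wiring is an assumption for illustration, not the repo's implementation (see `liquid_diffusion/model.py` for that).

```python
import torch
import torch.nn as nn

class MultiScaleSpatialMix(nn.Module):
    """Sketch of the attention replacement; branch wiring is an assumption."""
    def __init__(self, dim):
        super().__init__()
        self.dw3 = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # local mixing
        self.dw5 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw7 = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)
        self.pool_proj = nn.Conv2d(dim, dim, 1)  # global-context branch
        self.out_proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        # Global average pooling gives a (B, C, 1, 1) context vector that
        # broadcasts over the spatial grid when added to the conv branches.
        g = self.pool_proj(x.mean(dim=(2, 3), keepdim=True))
        y = self.dw3(x) + self.dw5(x) + self.dw7(x) + g
        return self.out_proj(y)
```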

### The ParallelCfC Block

```python
# CfC Eq. 10 adapted for images (pseudocode; f, g, h are learned feature
# branches, time_a / time_b project the timestep embedding t_emb):
gate   = sigmoid(time_a(t_emb) * f(features) - time_b(t_emb))  # liquid time-gating
out    = gate * g(features) + (1 - gate) * h(features)         # CfC interpolation
alpha  = exp(-lam * abs(t_emb))                                # liquid relaxation
output = alpha * features + (1 - alpha) * out                  # time-aware residual
```
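
For concreteness, here is the same gating written as a runnable layer on (B, C, H, W) feature maps. The 1×1-conv branch heads, the linear time projections, and the learnable per-channel λ are all assumptions made for this sketch:

```python
import torch
import torch.nn as nn

class ParallelCfCBlock(nn.Module):
    """Runnable sketch of CfC Eq. 10 gating; branch shapes are assumptions."""
    def __init__(self, dim, t_dim):
        super().__init__()
        self.f = nn.Conv2d(dim, dim, 1)          # feature branches f, g, h
        self.g = nn.Conv2d(dim, dim, 1)
        self.h = nn.Conv2d(dim, dim, 1)
        self.time_a = nn.Linear(t_dim, dim)      # timestep projections
        self.time_b = nn.Linear(t_dim, dim)
        self.time_tau = nn.Linear(t_dim, dim)
        self.log_lam = nn.Parameter(torch.zeros(dim))  # lambda > 0 via exp

    def forward(self, x, t_emb):
        # Project the timestep embedding to per-channel signals, then
        # broadcast over the spatial grid.
        a = self.time_a(t_emb)[:, :, None, None]
        b = self.time_b(t_emb)[:, :, None, None]
        tau = self.time_tau(t_emb)[:, :, None, None]
        gate = torch.sigmoid(a * self.f(x) - b)           # liquid time-gating
        out = gate * self.g(x) + (1 - gate) * self.h(x)   # CfC interpolation
        lam = self.log_lam.exp()[None, :, None, None]
        alpha = torch.exp(-lam * tau.abs())               # liquid relaxation
        return alpha * x + (1 - alpha) * out              # time-aware residual
```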

## Verified Datasets

All tested and working, with streaming support (see the loading sketch below the table):

| Dataset | Images | Description | Native Resolution |
|---------|--------|-------------|-------------------|
| `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
| `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
| `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
| `huggan/anime-faces` | 63K | Anime faces | 64×64 |
| `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |
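
A minimal way to stream any of these with the Hugging Face `datasets` library; the `image` column name holds for most of the entries above, but check each dataset card:

```python
from datasets import load_dataset

# Streaming avoids downloading the full dataset up front.
ds = load_dataset("huggan/AFHQv2", split="train", streaming=True)

for example in ds.take(4):   # peek at the first few samples
    img = example["image"]   # PIL.Image
    print(img.size, img.mode)
```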

## VAE

Uses **stabilityai/sd-vae-ft-mse** (83.7M params, frozen during training):
- 4 latent channels, 8× spatial downscale
- PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
- ~160 MB VRAM in fp16
- Scaling factor: 0.18215
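
For reference, a sketch of the latent round-trip using `diffusers`. The model id and the 0.18215 scaling factor come from the list above; the helper names are ours:

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
).to("cuda")
vae.requires_grad_(False).eval()   # frozen throughout training

SCALE = 0.18215                    # SD latent scaling factor

@torch.no_grad()
def to_latents(images):            # images: (B, 3, 256, 256) in [-1, 1]
    return vae.encode(images).latent_dist.sample() * SCALE  # (B, 4, 32, 32)

@torch.no_grad()
def to_images(latents):
    return vae.decode(latents / SCALE).sample
```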

## Model Configs

| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|--------|--------|---------------------|------------|
| tiny   | ~23M   | ~6 GB               | ~12 GB     |
| small  | ~69M   | ~10 GB              | ~20 GB     |
| base   | ~154M  | ~16 GB              | ~30 GB     |

## Training

**Objective**: Rectified Flow – a simple MSE loss on velocity (runnable sketch below):
```python
x_t = (1 - t) * x0 + t * noise         # linear interpolation
v_target = noise - x0                  # constant velocity
loss = MSE(model(x_t, t), v_target)    # that's it!
```
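
The same objective as a self-contained PyTorch training step; that `model(x_t, t)` takes a (B,)-shaped timestep vector is an assumption about the interface:

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x0):
    """One training step on a batch of VAE latents x0: (B, 4, 32, 32)."""
    t = torch.rand(x0.size(0), device=x0.device)   # t ~ U(0, 1)
    t_ = t.view(-1, 1, 1, 1)                       # broadcast over C, H, W
    noise = torch.randn_like(x0)
    x_t = (1 - t_) * x0 + t_ * noise               # linear interpolation
    v_target = noise - x0                          # constant velocity
    return F.mse_loss(model(x_t, t), v_target)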

**Sampling**: Euler ODE integration, 25–50 steps (sketched below)
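
Since x_t = (1 − t)·x0 + t·noise, the learned velocity is dx/dt = noise − x0, so sampling integrates from t = 1 (pure noise) down to t = 0 (data). A minimal Euler loop, again assuming the `model(x, t)` interface:

```python
import torch

@torch.no_grad()
def sample(model, shape, steps=50, device="cuda"):
    """Euler integration of dx/dt = v(x, t) from t=1 (noise) to t=0 (data)."""
    x = torch.randn(shape, device=device)        # x_1 ~ N(0, I)
    ts = torch.linspace(1.0, 0.0, steps + 1, device=device)
    for i in range(steps):
        t = ts[i].expand(shape[0])               # per-sample timestep
        v = model(x, t)                          # predicted velocity
        x = x - v * (ts[i] - ts[i + 1])          # Euler step toward t=0
    return x                                     # decode with the VAE
```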

## References

| Paper | Contribution |
|-------|--------------|
| [CfC Networks (Nature MI 2022)](https://arxiv.org/abs/2106.13898) | CfC Eq. 10, parallelizable closed form |
| [LTC Networks (AAAI 2021)](https://arxiv.org/abs/2006.04439) | Liquid time-constant ODE |
| [LiquidTAD (2024)](https://arxiv.org/abs/2604.18274) | Parallel liquid relaxation |
| [USM (CVPR 2025)](https://arxiv.org/abs/2504.13499) | U-Net + SSM for diffusion |
| [DiffuSSM (2023)](https://arxiv.org/abs/2311.18257) | SSM replaces attention in diffusion |
| [Rectified Flow (ICLR 2023)](https://arxiv.org/abs/2209.03003) | Simple velocity training |

## Files

```
├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py                     # Full model architecture
│   └── trainer.py                   # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb   # Complete Colab notebook
├── test_model.py
└── README.md
```

## License

MIT