# 🌊 LiquidDiffusion
**A novel attention-free image generation model based on Liquid Neural Networks**
## What is this?
LiquidDiffusion is a **first-of-its-kind** image generation model that replaces attention with **Parallel CfC (Closed-form Continuous-depth) blocks** from Liquid Neural Network research. To our knowledge, no existing paper combines LNNs with image generation — this model fills that gap.
### Key Properties
- ✅ **Zero attention layers** — fully convolutional + liquid time-gating
- ✅ **Fully parallelizable** — no ODE solvers, no sequential scanning, no recurrence
- ✅ **Latent space training** — uses pretrained SD-VAE (stabilityai/sd-vae-ft-mse, 83.7M, frozen)
- ✅ **Fits 16GB VRAM** — tiny config runs 256px at batch=8 on T4 GPU
- ✅ **Simple training** — Rectified Flow (MSE velocity prediction, no noise schedule)
- ✅ **6 verified datasets** — all tested and working with streaming support
## Quick Start (Colab)
1. Open `LiquidDiffusion_Training.ipynb` in Colab
2. Select GPU runtime (T4)
3. Pick a dataset from the dropdown (default: huggan/AFHQv2 — animal faces)
4. Run all cells → training starts; samples are generated every 500 steps
## Architecture
```
Pixel Image (3×256×256)
→ [Frozen SD-VAE Encode] → Latent (4×32×32)
→ [LiquidDiffusion U-Net] → Velocity prediction (4×32×32)
→ [Frozen SD-VAE Decode] → Generated Image (3×256×256)
```
Each **LiquidDiffusionBlock** contains:
1. **AdaLN** — timestep conditioning via learned scale/shift
2. **ParallelCfCBlock** — the core liquid neural network layer (CfC Eq.10)
3. **MultiScaleSpatialMix** — 3×3 + 5×5 + 7×7 depthwise convs + global pooling (replaces attention)
4. **FeedForward** — channel mixing via 1×1 conv
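The attention replacement (item 3) can be sketched in PyTorch. This is an assumed reconstruction from the description above, not the repo's exact module: three depthwise convs at different kernel sizes plus a globally pooled context, fused by a 1×1 conv.

```python
import torch
import torch.nn as nn

class MultiScaleSpatialMix(nn.Module):
    """Sketch: parallel 3x3/5x5/7x7 depthwise convs + global average
    pooling, fused by a 1x1 conv — no attention anywhere."""
    def __init__(self, dim: int):
        super().__init__()
        self.dw = nn.ModuleList(
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim) for k in (3, 5, 7)
        )
        self.pool = nn.AdaptiveAvgPool2d(1)        # global context
        self.fuse = nn.Conv2d(4 * dim, dim, 1)     # channel fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.pool(x).expand_as(x)              # broadcast global stats
        return self.fuse(torch.cat([conv(x) for conv in self.dw] + [g], dim=1))
```

Depthwise convs keep the parameter count near-linear in channels, which is why three kernel sizes in parallel still fit the tiny config's budget.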
### The ParallelCfC Block
```python
# CfC Eq.10 adapted for images:
gate = σ(time_a(t_emb) · f(features) - time_b(t_emb))   # liquid time-gating
out = gate · g(features) + (1 - gate) · h(features)     # CfC interpolation
α = exp(-λ · |t_emb|)                                   # liquid relaxation
output = α · input + (1 - α) · out                      # time-aware residual
```
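The pseudocode above can be made concrete in PyTorch. In this assumed sketch, `f`, `g`, `h` are 1×1 convs, `time_a`/`time_b` are linear maps from the timestep embedding, and `λ` is a learned per-channel rate; the exact parameterization in the repo may differ.

```python
import torch
import torch.nn as nn

class ParallelCfCBlock(nn.Module):
    """Sketch of CfC Eq.10 on (B, C, H, W) features (assumed shapes)."""
    def __init__(self, dim: int, t_dim: int):
        super().__init__()
        self.f = nn.Conv2d(dim, dim, 1)
        self.g = nn.Conv2d(dim, dim, 1)
        self.h = nn.Conv2d(dim, dim, 1)
        self.time_a = nn.Linear(t_dim, dim)
        self.time_b = nn.Linear(t_dim, dim)
        self.lam = nn.Parameter(torch.ones(dim))   # per-channel relaxation rate

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        a = self.time_a(t_emb)[:, :, None, None]           # (B, C, 1, 1)
        b = self.time_b(t_emb)[:, :, None, None]
        gate = torch.sigmoid(a * self.f(x) - b)            # liquid time-gating
        out = gate * self.g(x) + (1 - gate) * self.h(x)    # CfC interpolation
        alpha = torch.exp(                                  # liquid relaxation
            -self.lam[None, :, None, None]
            * t_emb.abs().mean(1)[:, None, None, None]
        )
        return alpha * x + (1 - alpha) * out               # time-aware residual
```

Every operation is pointwise or convolutional, so the whole block runs in one parallel pass — no recurrence over timesteps.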
## Verified Datasets
All tested and working (with streaming support):
| Dataset | Images | Description | Native Resolution |
|---------|--------|-------------|-------------------|
| `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
| `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
| `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
| `huggan/anime-faces` | 63K | Anime faces | 64×64 |
| `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |
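Since native resolutions vary, images presumably need a center-crop and resize to 256px plus normalization to [-1, 1] before VAE encoding. A minimal NumPy sketch of that transform (an assumption for illustration, not the repo's exact preprocessing, which likely uses torchvision):

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 256) -> np.ndarray:
    """Center-crop to square, nearest-neighbour resize to `size`,
    scale uint8 [0, 255] -> float [-1, 1], and return CHW layout."""
    h, w = img.shape[:2]
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    img = img[top:top + s, left:left + s]              # center crop
    idx = (np.arange(size) * s / size).astype(int)     # nearest resize
    img = img[idx][:, idx]
    out = img.astype(np.float32) / 127.5 - 1.0         # [-1, 1]
    return out.transpose(2, 0, 1)                      # HWC -> CHW
```

With streaming enabled, this transform would be applied per sample as batches arrive, so no dataset ever needs to fit on disk.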
## VAE
Uses **stabilityai/sd-vae-ft-mse** (83.7M params, frozen during training):
- 4 latent channels, 8× spatial downscale
- PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
- ~160MB VRAM in fp16
- Scaling factor: 0.18215
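The 0.18215 scaling factor is applied after encoding and undone before decoding, so the model always sees unit-ish-variance latents. A sketch of that convention with a stand-in encoder/decoder (the real ones come from the frozen SD-VAE; the helper names here are hypothetical):

```python
import torch

SCALE = 0.18215  # SD-VAE latent scaling factor

def to_latent(vae_encode, pixels: torch.Tensor) -> torch.Tensor:
    """pixels (B, 3, 256, 256) in [-1, 1] -> scaled latents (B, 4, 32, 32)."""
    return vae_encode(pixels) * SCALE

def to_pixels(vae_decode, latents: torch.Tensor) -> torch.Tensor:
    """Undo the scaling before handing latents back to the VAE decoder."""
    return vae_decode(latents / SCALE)
```

Keeping the VAE frozen means gradients never flow through these calls, so both can run under `torch.no_grad()` in fp16 to stay within the ~160MB budget.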
## Model Configs
| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|--------|--------|---------------------|------------|
| tiny | ~23M | ~6 GB | ~12 GB |
| small | ~69M | ~10 GB | ~20 GB |
| base | ~154M | ~16 GB | ~30 GB |
## Training
**Objective**: Rectified Flow — simple MSE on velocity
```python
x_t = (1 - t) · x0 + t · noise        # linear interpolation
v_target = noise - x0                 # constant velocity
loss = MSE(model(x_t, t), v_target)   # that's it!
```
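The objective above drops straight into a standard training step. This is a minimal sketch assuming latents of shape (B, 4, 32, 32) and a `model(x, t)` signature; the repo's trainer adds logging, EMA, etc.

```python
import torch

def train_step(model, opt, x0: torch.Tensor) -> float:
    """One rectified-flow training step on a batch of clean latents x0."""
    t = torch.rand(x0.shape[0], device=x0.device)           # uniform t in [0, 1)
    tb = t[:, None, None, None]                             # broadcast to (B,1,1,1)
    noise = torch.randn_like(x0)
    x_t = (1 - tb) * x0 + tb * noise                        # linear interpolation
    v_target = noise - x0                                   # constant velocity
    loss = torch.nn.functional.mse_loss(model(x_t, t), v_target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```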
**Sampling**: Euler ODE integration, 25-50 steps
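Since the model predicts the constant velocity `noise - x0`, sampling is just Euler integration of that field from pure noise at t=1 back to data at t=0. A minimal sketch, assuming the same `model(x, t)` signature:

```python
import torch

@torch.no_grad()
def sample(model, shape, steps: int = 50, device: str = "cpu") -> torch.Tensor:
    """Euler-integrate the learned velocity from noise (t=1) to data (t=0)."""
    x = torch.randn(shape, device=device)        # start from pure noise
    dt = 1.0 / steps
    for i in range(steps, 0, -1):
        t = torch.full((shape[0],), i / steps, device=device)
        x = x - dt * model(x, t)                 # one Euler step toward t=0
    return x                                     # latents; decode with the VAE
```

With a straight velocity field, 25-50 steps suffice because the Euler error per step is small; more steps trade compute for slightly cleaner samples.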
## References
| Paper | Contribution |
|-------|-------------|
| [CfC Networks (Nature MI 2022)](https://arxiv.org/abs/2106.13898) | CfC Eq.10, parallelizable closed-form |
| [LTC Networks (AAAI 2021)](https://arxiv.org/abs/2006.04439) | Liquid time-constant ODE |
| [LiquidTAD (2024)](https://arxiv.org/abs/2604.18274) | Parallel liquid relaxation |
| [USM (CVPR 2025)](https://arxiv.org/abs/2504.13499) | U-Net + SSM for diffusion |
| [DiffuSSM (2023)](https://arxiv.org/abs/2311.18257) | SSM replaces attention in diffusion |
| [Rectified Flow (ICLR 2023)](https://arxiv.org/abs/2209.03003) | Simple velocity training |
## Files
```
├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py                       # Full model architecture
│   └── trainer.py                     # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb     # Complete Colab notebook
├── test_model.py
└── README.md
```
## License
MIT