# LiquidDiffusion
**A novel attention-free image generation model based on Liquid Neural Networks**
## What is this?
LiquidDiffusion is a **first-of-its-kind** image generation model that replaces attention with **Parallel CfC (Closed-form Continuous-depth) blocks** from Liquid Neural Network research. No existing paper combines LNNs with image generation; this project fills that gap.
### Key Properties
- ✅ **Zero attention layers** – fully convolutional + liquid time-gating
- ✅ **Fully parallelizable** – no ODE solvers, no sequential scanning, no recurrence
- ✅ **Latent space training** – uses pretrained SD-VAE (stabilityai/sd-vae-ft-mse, 83.7M frozen)
- ✅ **Fits 16GB VRAM** – tiny config runs 256px at batch=8 on a T4 GPU
- ✅ **Simple training** – Rectified Flow (MSE velocity prediction, no noise schedule)
- ✅ **6 verified datasets** – all tested and working with streaming support
## Quick Start (Colab)
1. Open `LiquidDiffusion_Training.ipynb` in Colab
2. Select GPU runtime (T4)
3. Pick a dataset from the dropdown (default: `huggan/AFHQv2`, animal faces)
4. Run all cells; training starts and samples are generated every 500 steps
## Architecture
```
Pixel Image (3×256×256)
  → [Frozen SD-VAE Encode] → Latent (4×32×32)
  → [LiquidDiffusion U-Net] → Velocity prediction (4×32×32)
  → [Frozen SD-VAE Decode] → Generated Image (3×256×256)
```
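A shape-only sketch of this pipeline may help make the data flow concrete. The conv layers below are stand-ins for the real frozen SD-VAE and the LiquidDiffusion U-Net; only the tensor shapes (and the 8× spatial downscale) match the description above.

```python
import torch
import torch.nn as nn

B = 2
pixels = torch.randn(B, 3, 256, 256)

# Stand-in encoder/decoder with the SD-VAE's 4 latent channels and 8x downscale.
encode = nn.Conv2d(3, 4, kernel_size=8, stride=8)            # -> (B, 4, 32, 32)
decode = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)   # -> (B, 3, 256, 256)

latent = encode(pixels)
assert latent.shape == (B, 4, 32, 32)

# The U-Net (stand-in) predicts a velocity with the same shape as the latent.
velocity = nn.Conv2d(4, 4, 3, padding=1)(latent)
assert velocity.shape == latent.shape

image = decode(latent)
assert image.shape == (B, 3, 256, 256)
```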
Each **LiquidDiffusionBlock** contains:
1. **AdaLN** – timestep conditioning via learned scale/shift
2. **ParallelCfCBlock** – the core liquid neural network layer (CfC Eq.10)
3. **MultiScaleSpatialMix** – 3×3 + 5×5 + 7×7 depthwise conv + global pooling (replaces attention)
4. **FeedForward** – channel mixing via 1×1 conv
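The four components can be sketched as a single module. This is an illustrative reconstruction, not the repository's `model.py`: the ParallelCfC step (component 2) is elided here, and the class and layer names are assumptions based on the list above.

```python
import torch
import torch.nn as nn

class MultiScaleSpatialMix(nn.Module):
    """3x3 + 5x5 + 7x7 depthwise convs plus a broadcast global-pooling
    branch, summed as the attention replacement (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim) for k in (3, 5, 7)
        )
        self.pool_proj = nn.Conv2d(dim, dim, 1)  # 1x1 proj of the global context

    def forward(self, x):
        out = sum(conv(x) for conv in self.convs)
        # Global average pool -> (B, C, 1, 1), broadcast over all positions.
        return out + self.pool_proj(x.mean(dim=(2, 3), keepdim=True))

class LiquidDiffusionBlock(nn.Module):
    """AdaLN conditioning -> spatial mix -> 1x1 feed-forward,
    each followed by a residual connection (sketch)."""
    def __init__(self, dim, t_dim):
        super().__init__()
        self.norm = nn.GroupNorm(1, dim, affine=False)
        self.ada = nn.Linear(t_dim, 2 * dim)  # learned scale/shift from t_emb
        self.mix = MultiScaleSpatialMix(dim)
        self.ff = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(),
                                nn.Conv2d(4 * dim, dim, 1))

    def forward(self, x, t_emb):
        scale, shift = self.ada(t_emb)[:, :, None, None].chunk(2, dim=1)
        h = self.norm(x) * (1 + scale) + shift  # AdaLN
        x = x + self.mix(h)
        return x + self.ff(x)
```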
### The ParallelCfC Block
```python
# CfC Eq.10 adapted for images:
gate = σ(time_a(t_emb) · f(features) - time_b(t_emb))   # liquid time-gating
out = gate · g(features) + (1 - gate) · h(features)     # CfC interpolation
α = exp(-λ · |t_emb|)                                   # liquid relaxation
output = α · input + (1 - α) · out                      # time-aware residual
```
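A runnable version of that pseudocode might look like the module below. It is a sketch under assumptions: `f`, `g`, `h` are taken to be 1×1 convs, `time_a`/`time_b` linear projections of the timestep embedding, and the scalar `|t_emb|` in the relaxation term is reduced to a per-sample mean absolute value; the repository's layer may differ.

```python
import torch
import torch.nn as nn

class ParallelCfCBlock(nn.Module):
    """Sketch of CfC Eq.10 gating for image feature maps."""
    def __init__(self, dim, t_dim):
        super().__init__()
        self.f = nn.Conv2d(dim, dim, 1)
        self.g = nn.Conv2d(dim, dim, 1)
        self.h = nn.Conv2d(dim, dim, 1)
        self.time_a = nn.Linear(t_dim, dim)
        self.time_b = nn.Linear(t_dim, dim)
        self.lam = nn.Parameter(torch.ones(dim))  # per-channel relaxation rate λ

    def forward(self, x, t_emb):
        a = self.time_a(t_emb)[:, :, None, None]
        b = self.time_b(t_emb)[:, :, None, None]
        gate = torch.sigmoid(a * self.f(x) - b)           # liquid time-gating
        out = gate * self.g(x) + (1 - gate) * self.h(x)   # CfC interpolation
        # Liquid relaxation: α = exp(-λ·|t_emb|), with |t_emb| reduced to a
        # per-sample mean absolute value (an assumption of this sketch).
        t_mag = t_emb.abs().mean(dim=1, keepdim=True)[:, :, None, None]
        alpha = torch.exp(-self.lam[None, :, None, None] * t_mag)
        return alpha * x + (1 - alpha) * out              # time-aware residual
```

Because `gate` and `alpha` depend only on pointwise ops and convs (no recurrence over time steps), the whole block evaluates in a single parallel pass, which is what makes the CfC formulation attractive here.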
## Verified Datasets
All tested and working (with streaming support):
| Dataset | Images | Description | Native Resolution |
|---------|--------|-------------|-------------------|
| `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
| `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
| `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
| `huggan/anime-faces` | 63K | Anime faces | 64×64 |
| `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |
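Since these datasets stream at varied native resolutions, each image has to be brought to a common training size. A minimal pure-torch transform for that (resize short side, center-crop, map `[0, 1]` pixels to `[-1, 1]` as the SD-VAE expects); the notebook's actual pipeline may differ:

```python
import torch
import torch.nn.functional as F

def preprocess(img: torch.Tensor, size: int = 256) -> torch.Tensor:
    """img: (3, H, W) float tensor in [0, 1] -> (3, size, size) in [-1, 1]."""
    _, h, w = img.shape
    scale = size / min(h, w)                  # resize the short side to `size`
    nh, nw = round(h * scale), round(w * scale)
    img = F.interpolate(img[None], size=(nh, nw), mode="bilinear",
                        align_corners=False)[0]
    top, left = (nh - size) // 2, (nw - size) // 2
    img = img[:, top:top + size, left:left + size]  # center crop
    return img * 2 - 1                        # [0, 1] -> [-1, 1]
```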
## VAE
Uses **stabilityai/sd-vae-ft-mse** (83.7M params, frozen during training):
- 4 latent channels, 8× spatial downscale
- PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
- ~160MB VRAM in fp16
- Scaling factor: 0.18215
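The scaling factor is the standard SD-VAE convention: raw encoder latents are multiplied by 0.18215 so the diffusion model sees roughly unit-variance inputs, and the scaling is undone before decoding. A minimal sketch of both directions (helper names are illustrative, not the repository's API):

```python
import torch

SCALE = 0.18215  # stabilityai/sd-vae-ft-mse latent scaling factor

def to_model_space(vae_latent: torch.Tensor) -> torch.Tensor:
    """Scale raw VAE latents to roughly unit variance before diffusion."""
    return vae_latent * SCALE

def to_vae_space(model_latent: torch.Tensor) -> torch.Tensor:
    """Undo the scaling before handing latents back to the VAE decoder."""
    return model_latent / SCALE

z = torch.randn(1, 4, 32, 32)
assert torch.allclose(to_vae_space(to_model_space(z)), z)
```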
## Model Configs
| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|--------|--------|---------------------|------------|
| tiny | ~23M | ~6 GB | ~12 GB |
| small | ~69M | ~10 GB | ~20 GB |
| base | ~154M | ~16 GB | ~30 GB |
## Training
**Objective**: Rectified Flow – simple MSE on velocity
```python
x_t = (1 - t) * x0 + t * noise         # linear interpolation
v_target = noise - x0                  # constant velocity
loss = MSE(model(x_t, t), v_target)    # that's it!
```
**Sampling**: Euler ODE integration, 25-50 steps
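Since `x_t = (1 - t)·x0 + t·noise`, the velocity `dx/dt = noise - x0` is constant along each path, so sampling is just Euler integration of the ODE from `t = 1` (pure noise) down to `t = 0` (data). A sketch, assuming `model(x, t)` returns the predicted velocity:

```python
import torch

@torch.no_grad()
def sample(model, shape, steps=25, device="cpu"):
    """Euler integration of the rectified-flow ODE dx/dt = v(x, t)."""
    x = torch.randn(shape, device=device)                 # x_1 = pure noise
    ts = torch.linspace(1.0, 0.0, steps + 1, device=device)
    for i in range(steps):
        t = ts[i].expand(shape[0])    # same t for the whole batch
        v = model(x, t)               # predicted velocity (noise - x0)
        dt = ts[i + 1] - ts[i]        # negative step, moving toward t = 0
        x = x + v * dt                # Euler update
    return x
```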
## References
| Paper | Contribution |
|-------|-------------|
| [CfC Networks (Nature MI 2022)](https://arxiv.org/abs/2106.13898) | CfC Eq.10, parallelizable closed-form |
| [LTC Networks (AAAI 2021)](https://arxiv.org/abs/2006.04439) | Liquid time-constant ODE |
| [LiquidTAD (2024)](https://arxiv.org/abs/2604.18274) | Parallel liquid relaxation |
| [USM (CVPR 2025)](https://arxiv.org/abs/2504.13499) | U-Net + SSM for diffusion |
| [DiffuSSM (2023)](https://arxiv.org/abs/2311.18257) | SSM replaces attention in diffusion |
| [Rectified Flow (ICLR 2023)](https://arxiv.org/abs/2209.03003) | Simple velocity training |
## Files
```
├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py                      # Full model architecture
│   └── trainer.py                    # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb    # Complete Colab notebook
├── test_model.py
└── README.md
```
## License
MIT