# LiquidDiffusion

**A novel attention-free image generation model based on Liquid Neural Networks**

## What is this?

LiquidDiffusion is a **first-of-its-kind** image generation model that replaces attention with **Parallel CfC (Closed-form Continuous-time) blocks** from Liquid Neural Network research. To our knowledge, no existing paper combines LNNs with image generation; this project fills that gap.
### Key Properties

- ✅ **Zero attention layers** – fully convolutional + liquid time-gating
- ✅ **Fully parallelizable** – no ODE solvers, no sequential scanning, no recurrence
- ✅ **Latent-space training** – uses a pretrained SD-VAE (stabilityai/sd-vae-ft-mse, 83.7M params, frozen)
- ✅ **Fits 16 GB VRAM** – the tiny config runs 256px at batch size 8 on a T4 GPU
- ✅ **Simple training** – Rectified Flow (MSE velocity prediction, no noise schedule)
- ✅ **6 verified datasets** – all tested and working, with streaming support
## Quick Start (Colab)

1. Open `LiquidDiffusion_Training.ipynb` in Colab
2. Select a GPU runtime (T4)
3. Pick a dataset from the dropdown (default: huggan/AFHQv2 – animal faces)
4. Run all cells – training starts, and samples are generated every 500 steps
## Architecture

```
Pixel Image (3×256×256)
  → [Frozen SD-VAE Encode]   → Latent (4×32×32)
  → [LiquidDiffusion U-Net]  → Velocity prediction (4×32×32)
  → [Frozen SD-VAE Decode]   → Generated Image (3×256×256)
```
Each **LiquidDiffusionBlock** contains:
1. **AdaLN** – timestep conditioning via learned scale/shift
2. **ParallelCfCBlock** – the core liquid neural network layer (CfC Eq. 10)
3. **MultiScaleSpatialMix** – 3×3 + 5×5 + 7×7 depthwise convs + global pooling (replaces attention; see the sketch after this list)
4. **FeedForward** – channel mixing via 1×1 conv
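
Since the spatial-mix layer is what stands in for attention, here is a minimal PyTorch sketch of one way to realize it. The overall shape (parallel depthwise convs at three kernel sizes plus a pooled global-context branch, summed and projected by a 1×1 conv) follows the description above, but the exact wiring is an assumption for illustration, not the repo's implementation (see `liquid_diffusion/model.py` for that).

```python
import torch
import torch.nn as nn

class MultiScaleSpatialMix(nn.Module):
    """Sketch of the attention replacement; branch wiring is an assumption."""
    def __init__(self, dim):
        super().__init__()
        self.dw3 = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # local mixing
        self.dw5 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw7 = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)
        self.pool_proj = nn.Conv2d(dim, dim, 1)  # global-context branch
        self.out_proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        # Global average pooling gives a (B, C, 1, 1) context vector that
        # broadcasts over the spatial grid when added to the conv branches.
        g = self.pool_proj(x.mean(dim=(2, 3), keepdim=True))
        y = self.dw3(x) + self.dw5(x) + self.dw7(x) + g
        return self.out_proj(y)
```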

### The ParallelCfC Block

```python
# CfC Eq. 10 adapted for images (pseudocode; f, g, h are learned feature
# branches, time_a / time_b project the timestep embedding t_emb):
gate   = sigmoid(time_a(t_emb) * f(features) - time_b(t_emb))  # liquid time-gating
out    = gate * g(features) + (1 - gate) * h(features)         # CfC interpolation
alpha  = exp(-lam * abs(t_emb))                                # liquid relaxation
output = alpha * features + (1 - alpha) * out                  # time-aware residual
```
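
For concreteness, here is the same gating written as a runnable layer on (B, C, H, W) feature maps. The 1×1-conv branch heads, the linear time projections, and the learnable per-channel λ are all assumptions made for this sketch:

```python
import torch
import torch.nn as nn

class ParallelCfCBlock(nn.Module):
    """Runnable sketch of CfC Eq. 10 gating; branch shapes are assumptions."""
    def __init__(self, dim, t_dim):
        super().__init__()
        self.f = nn.Conv2d(dim, dim, 1)          # feature branches f, g, h
        self.g = nn.Conv2d(dim, dim, 1)
        self.h = nn.Conv2d(dim, dim, 1)
        self.time_a = nn.Linear(t_dim, dim)      # timestep projections
        self.time_b = nn.Linear(t_dim, dim)
        self.time_tau = nn.Linear(t_dim, dim)
        self.log_lam = nn.Parameter(torch.zeros(dim))  # lambda > 0 via exp

    def forward(self, x, t_emb):
        # Project the timestep embedding to per-channel signals, then
        # broadcast over the spatial grid.
        a = self.time_a(t_emb)[:, :, None, None]
        b = self.time_b(t_emb)[:, :, None, None]
        tau = self.time_tau(t_emb)[:, :, None, None]
        gate = torch.sigmoid(a * self.f(x) - b)           # liquid time-gating
        out = gate * self.g(x) + (1 - gate) * self.h(x)   # CfC interpolation
        lam = self.log_lam.exp()[None, :, None, None]
        alpha = torch.exp(-lam * tau.abs())               # liquid relaxation
        return alpha * x + (1 - alpha) * out              # time-aware residual
```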

## Verified Datasets

All tested and working, with streaming support (see the loading sketch below the table):

| Dataset | Images | Description | Native Resolution |
|---------|--------|-------------|-------------------|
| `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
| `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
| `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
| `huggan/anime-faces` | 63K | Anime faces | 64×64 |
| `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |
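
A minimal way to stream any of these with the Hugging Face `datasets` library; the `image` column name holds for most of the entries above, but check each dataset card:

```python
from datasets import load_dataset

# Streaming avoids downloading the full dataset up front.
ds = load_dataset("huggan/AFHQv2", split="train", streaming=True)

for example in ds.take(4):   # peek at the first few samples
    img = example["image"]   # PIL.Image
    print(img.size, img.mode)
```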

## VAE

Uses **stabilityai/sd-vae-ft-mse** (83.7M params, frozen during training):
- 4 latent channels, 8× spatial downscale
- PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
- ~160 MB VRAM in fp16
- Scaling factor: 0.18215
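
For reference, a sketch of the latent round-trip using `diffusers`. The model id and the 0.18215 scaling factor come from the list above; the helper names are ours:

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
).to("cuda")
vae.requires_grad_(False).eval()   # frozen throughout training

SCALE = 0.18215                    # SD latent scaling factor

@torch.no_grad()
def to_latents(images):            # images: (B, 3, 256, 256) in [-1, 1]
    return vae.encode(images).latent_dist.sample() * SCALE  # (B, 4, 32, 32)

@torch.no_grad()
def to_images(latents):
    return vae.decode(latents / SCALE).sample
```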

## Model Configs

| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|--------|--------|---------------------|------------|
| tiny   | ~23M   | ~6 GB               | ~12 GB     |
| small  | ~69M   | ~10 GB              | ~20 GB     |
| base   | ~154M  | ~16 GB              | ~30 GB     |

## Training

**Objective**: Rectified Flow – a simple MSE loss on velocity (runnable sketch below):
```python
x_t = (1 - t) * x0 + t * noise         # linear interpolation
v_target = noise - x0                  # constant velocity
loss = MSE(model(x_t, t), v_target)    # that's it!
```
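
The same objective as a self-contained PyTorch training step; that `model(x_t, t)` takes a (B,)-shaped timestep vector is an assumption about the interface:

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x0):
    """One training step on a batch of VAE latents x0: (B, 4, 32, 32)."""
    t = torch.rand(x0.size(0), device=x0.device)   # t ~ U(0, 1)
    t_ = t.view(-1, 1, 1, 1)                       # broadcast over C, H, W
    noise = torch.randn_like(x0)
    x_t = (1 - t_) * x0 + t_ * noise               # linear interpolation
    v_target = noise - x0                          # constant velocity
    return F.mse_loss(model(x_t, t), v_target)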

**Sampling**: Euler ODE integration, 25–50 steps (sketched below)
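
Since x_t = (1 − t)·x0 + t·noise, the learned velocity is dx/dt = noise − x0, so sampling integrates from t = 1 (pure noise) down to t = 0 (data). A minimal Euler loop, again assuming the `model(x, t)` interface:

```python
import torch

@torch.no_grad()
def sample(model, shape, steps=50, device="cuda"):
    """Euler integration of dx/dt = v(x, t) from t=1 (noise) to t=0 (data)."""
    x = torch.randn(shape, device=device)        # x_1 ~ N(0, I)
    ts = torch.linspace(1.0, 0.0, steps + 1, device=device)
    for i in range(steps):
        t = ts[i].expand(shape[0])               # per-sample timestep
        v = model(x, t)                          # predicted velocity
        x = x - v * (ts[i] - ts[i + 1])          # Euler step toward t=0
    return x                                     # decode with the VAE
```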

## References

| Paper | Contribution |
|-------|--------------|
| [CfC Networks (Nature MI 2022)](https://arxiv.org/abs/2106.13898) | CfC Eq. 10, parallelizable closed form |
| [LTC Networks (AAAI 2021)](https://arxiv.org/abs/2006.04439) | Liquid time-constant ODE |
| [LiquidTAD (2024)](https://arxiv.org/abs/2604.18274) | Parallel liquid relaxation |
| [USM (CVPR 2025)](https://arxiv.org/abs/2504.13499) | U-Net + SSM for diffusion |
| [DiffuSSM (2023)](https://arxiv.org/abs/2311.18257) | SSM replaces attention in diffusion |
| [Rectified Flow (ICLR 2023)](https://arxiv.org/abs/2209.03003) | Simple velocity training |

## Files

```
├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py                     # Full model architecture
│   └── trainer.py                   # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb   # Complete Colab notebook
├── test_model.py
└── README.md
```

## License

MIT